Building a Google Docs Text to Speech (TTS) Agent
How to turn Google Docs into speech.
Disclaimers:
- At the time of this writing, I am employed by Google Cloud. However, the thoughts expressed here are my own and do not represent my employer.
- The code provided here is sample code for educational purposes only. Please write your own production code.
Introduction
Marketing professionals are constantly creating content, from website copy and email campaigns to video scripts and social media posts. But what if you could easily convert that written content into audio? Imagine being able to create audio versions of your blog posts for wider accessibility, generate voiceovers for your promotional videos without hiring voice actors, or even proof-listen to your ad copy for tone and impact. This is where a Text-to-Speech (TTS) solution integrated with your content creation workflow can be a game-changer.
Today, we're going to build a Python tool that does exactly that. We'll create a script that securely connects to the Google Docs API, extracts the text from a document, and then uses Google Cloud's Text-to-Speech API, specifically the Chirp 3 HD voices, to generate a high-quality audio file.
What We're Building
The end goal is a Python script that you can run from your terminal, providing it with a Google Doc URL. The script will output a link to an audio file (.wav) of your document's content, ready for you to listen to.
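Here's a high-level look at the architecture: the script reads the service account credentials from Secret Manager, fetches the document through the Google Docs API, sends the extracted text to the Text-to-Speech API, and the resulting .wav file is written to a Cloud Storage bucket.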
Prerequisites
Before we dive into the code, make sure you have the following set up. This is the most crucial part of the setup process!
| Item | Description | Purpose | 
|---|---|---|
| Google Cloud Account | An active GCP account with billing enabled. | Required for using Secret Manager, Docs API, TTS API, and Cloud Storage. | 
| A Google Doc | A document you want to convert to speech. | The source material for our script. | 
| Python Environment | Python 3.11+ with pip installed. | To run our script and install libraries. You can use Google Cloud Shell Editor if you do not have a local Python environment. | 
| Google Cloud SDK | The gcloud CLI tool installed and authenticated. | To interact with GCP services from your terminal. | 
You'll also need to install the necessary Python libraries:
pip install google-api-python-client google-auth-httplib2 google-auth-oauthlib google-cloud-secret-manager google-cloud-texttospeech
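
If you'd like to confirm the install worked before going further, a quick check like the one below can help. This is only a sketch: the module names in `REQUIRED_MODULES` are the import names I'd expect the packages above to provide, and the helper `missing_modules` is a hypothetical name, not part of any Google library.

```python
from importlib.util import find_spec

# Import names (not pip package names) that the script in this post relies on.
# Assumption: these are the top-level modules provided by the packages above.
REQUIRED_MODULES = [
    "googleapiclient",
    "google.auth",
    "google.cloud.secretmanager",
    "google.cloud.texttospeech",
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            if find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised when a parent package (e.g. "google") is absent.
            missing.append(name)
    return missing


if __name__ == "__main__":
    gaps = missing_modules(REQUIRED_MODULES)
    print("All set!" if not gaps else f"Missing modules: {gaps}")
```

If anything shows up as missing, re-run the pip command in the same Python environment you plan to run the script from.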
Step 1: Creating and Securing a Service Account
To allow our script to interact with Google Cloud services on our behalf, we need to create a Service Account. This is like a robot user with specific, limited permissions.
- Enable the APIs: Make sure the APIs this script relies on are enabled in your Google Cloud project. Going by the services listed in the prerequisites, that means the Google Docs API, the Secret Manager API, the Cloud Text-to-Speech API, and Cloud Storage.
- Create the Service Account:
  - In the Google Cloud Console, navigate to IAM & Admin > Service Accounts.
  - Click + CREATE SERVICE ACCOUNT.
  - Give it a name (e.g., doc-to-speech-service) and a description.
  - Click Done.
- Create and Download a Key:
  - Find your newly created service account in the list, click the three-dot menu under "Actions", and select Manage keys.
  - Click ADD KEY > Create new key.
  - Choose JSON and click CREATE. A JSON file will be downloaded to your computer. Treat this file like a password! Do not commit it to Git.
- Store the Key in Secret Manager:
  - Navigate to Security > Secret Manager in the Cloud Console.
  - Click + CREATE SECRET.
  - Give the secret a name, for example, doc-to-speech-credentials. Copy this name, as we will need it for an environment variable later.
  - Under "Secret value", upload the JSON key file you just downloaded.
  - Click Create secret.
- Share your Google Doc: Finally, take the client_email from the downloaded JSON key file and share your Google Doc with that email address, giving it "Viewer" access.
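
If you don't want to open the key file by hand to find that address, a few lines of Python can pull it out. This is a minimal sketch: `email_from_key_json` is a hypothetical helper name, and it assumes the key file is the standard JSON download containing a `client_email` field.

```python
import json


def email_from_key_json(key_json: str) -> str:
    """Return the client_email field from the text of a service account key."""
    key = json.loads(key_json)
    try:
        return key["client_email"]
    except KeyError:
        raise ValueError("No client_email field; is this a service account key?")
```

To use it, read the downloaded file and pass its contents in, for example `email_from_key_json(open("path/to/key.json").read())`, where the path is wherever your browser saved the key. Print the result, copy it into the Google Docs share dialog, and then delete the local key file once it is safely stored in Secret Manager.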
Step 2: Building the Google Docs Client (google_doc.py)
Now for the fun part: the code! We'll split our logic into two main classes: GoogleDocsService to handle authentication and API connections, and GoogleDoc to represent the document itself.
GoogleDocsService Class
This class is our gateway to Google's APIs. It fetches our secure credentials from Secret Manager and initializes the Docs API client.
import os
import json
import google.auth
from google.cloud import secretmanager
from googleapiclient.discovery import build
from google.oauth2 import service_account
class GoogleDocsService:
    """Handles authentication and service building for Google Docs API."""
    def __init__(self, **kwargs):
        self.credentials, self.project_id = google.auth.default()
        self._service_account = None
        self._docs = None
        self.scopes = ['https://www.googleapis.com/auth/documents.readonly']
    def get_secret(self, secret_id: str, version_id: str = "latest") -> dict | str:
        """Gets a secret from Secret Manager, parsed as JSON when possible."""
        client = secretmanager.SecretManagerServiceClient()
        name = f"projects/{self.project_id}/secrets/{secret_id}/versions/{version_id}"
        response = client.access_secret_version(request={"name": name})
        data = response.payload.data.decode("UTF-8")
        try:
            return json.loads(data)
        except json.JSONDecodeError:
            return data
    @property
    def service_account_creds(self) -> dict:
        """Lazy-loads the service account credentials from Secret Manager."""
        if not self._service_account:
            secret_id = os.getenv('SERVICE_ACCOUNT_SECRET_ID')
            if not secret_id:
                raise ValueError('SERVICE_ACCOUNT_SECRET_ID env var is not set.')
            self._service_account = self.get_secret(secret_id=secret_id)
        return self._service_account
    @property
    def docs(self):
        """Builds and returns an authenticated Google Docs service resource."""
        if not self._docs:
            creds = service_account.Credentials.from_service_account_info(
                self.service_account_creds, scopes=self.scopes
            )
            self._docs = build('docs', 'v1', credentials=creds)
        return self._docs
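
To make the secret handling a bit more concrete, here is a standalone sketch of the two pure-Python pieces inside get_secret: building the secret version's resource name and decoding the payload. The function names are hypothetical, and the final check is my own addition (not part of the class above) that assumes a service account key is a JSON object with `type`, `client_email`, and `private_key` fields.

```python
import json


def secret_version_name(project_id, secret_id, version_id="latest"):
    """Build the resource name passed to access_secret_version."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version_id}"


def decode_secret_payload(payload: bytes):
    """Decode secret bytes, returning parsed JSON when possible, else the text."""
    text = payload.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def looks_like_service_account_key(value) -> bool:
    """Heuristic check (an assumption) that a secret holds a service account key."""
    return (
        isinstance(value, dict)
        and value.get("type") == "service_account"
        and "client_email" in value
        and "private_key" in value
    )
```

A check like the last one lets you fail early with a clear message if the secret turns out to hold something other than the JSON key, instead of getting a less obvious error when the credentials are built.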
GoogleDoc Class
This class takes our authenticated service and a document ID. Its main job is to fetch the document's content and parse the raw JSON into clean, readable text.
class GoogleDoc:
    """Represents a single Google Document and its content."""
    def __init__(self, google_docs_service: GoogleDocsService, file_id_or_uri: str):
        self.google_docs_service = google_docs_service
        self.file_id_or_uri = file_id_or_uri
        self._document = None
    @property
    def id(self) -> str:
        """Extracts the document ID from a full URI or returns the ID itself."""
        if '/d/' in self.file_id_or_uri:
            return self.file_id_or_uri.split('/d/')[1].split('/')[0]
        return self.file_id_or_uri
    @property
    def document(self) -> dict:
        """Fetches the full document resource from the API."""
        if not self._document:
            self._document = self.google_docs_service.docs.documents().get(documentId=self.id).execute()
        return self._document
    @property
    def text(self) -> str:
        """Parses the document content to extract plain text."""
        doc_content = self.document.get('body', {}).get('content', [])
        doc_text = ""
        for value in doc_content:
            if 'paragraph' in value:
                elements = value.get('paragraph', {}).get('elements', [])
                for elem in elements:
                    if 'textRun' in elem:
                        doc_text += elem.get('textRun', {}).get('content', '')
        return doc_text
The text property is where the magic happens. It navigates the nested structure of a Google Doc's JSON representation to find and concatenate all textRun content, effectively stripping out all formatting and giving us the raw text.
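
To see that parsing in isolation, here is a small standalone sketch that runs against a hand-written sample instead of a live document. It is a variation on the text property, not a copy of it: `extract_text` is a hypothetical helper that also descends into table cells, which the property above skips. The sample dictionary is made up for illustration and only mimics the shape of a Docs API response.

```python
def extract_text(content):
    """Concatenate textRun content from a list of Docs structural elements."""
    parts = []
    for element in content:
        if "paragraph" in element:
            for elem in element["paragraph"].get("elements", []):
                parts.append(elem.get("textRun", {}).get("content", ""))
        elif "table" in element:
            # Each table cell holds its own list of structural elements,
            # so we recurse into it.
            for row in element["table"].get("tableRows", []):
                for cell in row.get("tableCells", []):
                    parts.append(extract_text(cell.get("content", [])))
        # Other element types (e.g. section breaks) carry no text and are skipped.
    return "".join(parts)


# A made-up document body, shaped like the 'body' -> 'content' list.
sample_content = [
    {"sectionBreak": {}},
    {"paragraph": {"elements": [
        {"textRun": {"content": "Hello, "}},
        {"textRun": {"content": "world.\n"}},
    ]}},
    {"table": {"tableRows": [
        {"tableCells": [
            {"content": [
                {"paragraph": {"elements": [
                    {"textRun": {"content": "Cell text\n"}},
                ]}},
            ]},
        ]},
    ]}},
]

print(repr(extract_text(sample_content)))  # 'Hello, world.\nCell text\n'
```

Whether you want table text read aloud is a judgment call; for a script-style document it may be better to leave tables out, as the original property does.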
Step 3: Adding Text-to-Speech
Now, let's add the final piece to our GoogleDoc class: a method to send the extracted text to the Google Cloud TTS API.
We'll use the SynthesizeLongAudio method, which is perfect for documents as it can handle large amounts of text and conveniently saves the output directly to a Google Cloud Storage bucket.
First, you'll need to create a GCS bucket in your project if you don't have one already.
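
It can also be worth checking how big your text actually is before sending it off. The sketch below picks a method name based on the UTF-8 size of the text. The two limits are assumptions based on my recollection of the published quotas (roughly 5,000 bytes for a standard synthesize request and about 1 MB for long audio), so please verify them against the current Cloud TTS documentation. `choose_synthesis_method` is a hypothetical helper.

```python
# Assumed limits; verify against the current Cloud TTS quotas before relying on them.
STANDARD_LIMIT_BYTES = 5_000
LONG_AUDIO_LIMIT_BYTES = 1_000_000


def choose_synthesis_method(text: str) -> str:
    """Return the name of the client method that fits the text's UTF-8 size."""
    size = len(text.encode("utf-8"))  # the limits are in bytes, not characters
    if size == 0:
        raise ValueError("The document has no text to synthesize.")
    if size <= STANDARD_LIMIT_BYTES:
        return "synthesize_speech"
    if size <= LONG_AUDIO_LIMIT_BYTES:
        return "synthesize_long_audio"
    raise ValueError(f"Text is {size} bytes; split it into smaller chunks first.")
```

In this post we always use the long audio method, which keeps the code simple. A check like this mostly helps you catch empty or oversized documents before the API does.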
Let's add the doc_tts method to our GoogleDoc class:
# Add these imports to the top of your file
from uuid import uuid4
from google.cloud import texttospeech
# Add this method inside the GoogleDoc class
def doc_tts(self, gcs_bucket_name: str, project_id: str):
    """Converts the document's text to speech and saves it to GCS."""
    
    blob_name = f'chirp-audio/{self.id}_{uuid4().hex}.wav'
    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()
    # The text we extracted earlier
    synthesis_input = texttospeech.SynthesisInput(text=self.text)
    # Voice selection: Using the new high-definition Chirp voices
    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US",
        name="en-US-Chirp3-HD-Charon" # A great, versatile Chirp voice
    )
    # Audio output configuration
    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16
    )
    # Set the GCS output path
    output_gcs_uri = f'gs://{gcs_bucket_name}/{blob_name}'
    request = texttospeech.SynthesizeLongAudioRequest(
        parent=f"projects/{project_id}/locations/us-central1", # or your preferred location
        input=synthesis_input,
        voice=voice,
        audio_config=audio_config,
        output_gcs_uri=output_gcs_uri
    )
    print("Synthesizing audio... this may take a moment.")
    operation = client.synthesize_long_audio(request=request)
    result = operation.result(timeout=600) # Timeout in seconds
    
    storage_url = f'https://storage.cloud.google.com/{gcs_bucket_name}/{blob_name}'
    print("Synthesis complete!")
    return {
        'status': 'success',
        'detail': 'Audio generated successfully and stored in Cloud Storage.',
        'url': storage_url
    }
A Note on Voices: We're using en-US-Chirp3-HD-Charon, one of Google's Chirp 3 HD voices, which are among its newest and most natural-sounding. You can explore all available voices, including Standard and WaveNet options, in the official Cloud TTS documentation.
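
If you swap in a different voice, remember that language_code should match it. One way to keep the two in sync is to derive the code from the voice name. This is a sketch under an assumption: that voice names start with a language-region prefix, as the one used in this post does. `language_code_from_voice` is a hypothetical helper, not part of the TTS client library.

```python
def language_code_from_voice(voice_name: str) -> str:
    """Derive a language code such as 'en-US' from a voice name."""
    parts = voice_name.split("-")
    if len(parts) < 3:
        raise ValueError(f"Unexpected voice name format: {voice_name!r}")
    # Assumption: the first two dash-separated parts are language and region.
    return "-".join(parts[:2])


print(language_code_from_voice("en-US-Chirp3-HD-Charon"))  # en-US
```

You could then pass the result as language_code when building VoiceSelectionParams, so changing the voice name is a one-line edit.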
Conclusion: Putting It All Together
We now have all the components. Let's create a main block to run our script.
# Add this to the bottom of your google_doc.py file
if __name__ == '__main__':
    # --- CONFIGURATION ---
    # Prefer real environment variables; the values below are only fallbacks.
    os.environ.setdefault('SERVICE_ACCOUNT_SECRET_ID', 'doc-to-speech-credentials') # The name of your secret
    GCS_BUCKET = os.getenv('GCS_BUCKET', 'your-gcs-bucket-name') # Your GCS bucket name
    
    # Get the Google Doc ID or URL from the user
    doc_id_or_url = input("Enter the Google Doc URL or ID: ")
    try:
        # 1. Initialize the service
        docs_service = GoogleDocsService()
        # 2. Create the GoogleDoc object
        gdoc = GoogleDoc(
            google_docs_service=docs_service, 
            file_id_or_uri=doc_id_or_url
        )
        
        print(f"Reading text from document ID: {gdoc.id}")
        # print(f"Extracted Text: {gdoc.text[:200]}...") # Uncomment to preview text
        # 3. Generate the audio
        result = gdoc.doc_tts(
            gcs_bucket_name=GCS_BUCKET, 
            project_id=docs_service.project_id
        )
        print("\n--- Success! ---")
        print(f"Listen to your document here: {result['url']}")
    except Exception as e:
        print(f"\nAn error occurred: {e}")
To run your script:
- Make sure your GCS bucket is public if you want to share the links, or that you're logged into a Google account with read access to the bucket.
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable if you're running this outside of a Google Cloud environment.
- Run the script: python google_doc.py
- Paste your Google Doc URL when prompted, and watch it go!
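
If you'd rather pass the document on the command line than paste it at a prompt, argparse is one option. This is a sketch rather than part of the script above: the `--bucket` and `--secret-id` flags and the `GCS_BUCKET` environment variable are names I made up for illustration, while `SERVICE_ACCOUNT_SECRET_ID` is the variable the script already reads.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse command-line options for the doc-to-speech script."""
    parser = argparse.ArgumentParser(
        description="Convert a Google Doc to speech."
    )
    parser.add_argument("doc", help="Google Doc URL or ID")
    parser.add_argument(
        "--bucket",
        default=os.getenv("GCS_BUCKET"),  # hypothetical env var
        help="Cloud Storage bucket for the audio output",
    )
    parser.add_argument(
        "--secret-id",
        default=os.getenv("SERVICE_ACCOUNT_SECRET_ID", "doc-to-speech-credentials"),
        help="Secret Manager secret holding the service account key",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Would convert {args.doc} into bucket {args.bucket}")
```

With something like this in place, a run could look like `python google_doc.py <doc-url> --bucket your-gcs-bucket-name`, and the parsed values would replace the input() call and the hard-coded configuration.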
You've just built a pipeline that converts written documents into high-quality audio. From here, you could expand this into a web application with Flask, create a Cloud Function that triggers on a new document, or add support for more languages and voices. Happy listening!
