You Might Not Know: Gemini 2.5 Adds Free Multi-Speaker Text-to-Speech (TTS)

Google's Gemini 2.5 has added a very practical feature: multi-speaker text-to-speech, available for free in Google AI Studio. The feature is provided by the gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts models.

Important Notes:

  1. International Internet Access: To access Google AI services, you'll need unrestricted internet access (setting this up is left to you). Without it, you won't be able to complete any of the steps below.
  2. Google Account: You'll need a free Google account. If you don't have one, you can sign up on the official Google website. Registration can typically be completed with a Chinese mobile number.

I. Accessing the Gemini Speech Generation Page

You can access Gemini's text-to-speech feature page in either of the following ways:

  1. Direct Access: Open the link https://aistudio.google.com/generate-speech in your browser.
  2. Via AI Studio Homepage: If you are already logged into Google AI Studio, you can also find the entry point to the speech generation feature on the homepage, as shown in the image below.

     Accessing the speech generation feature via the AI Studio homepage

If the page doesn't open, or if you see a message like "Not supported in your current region" (this often happens when using network nodes in regions like Hong Kong), please try switching your network proxy/VPN to another country or region (e.g., USA, Singapore).

If you see this page, it means your current network node's region is not supported. Please try switching.

Once successfully opened, you will see the speech generation interface as follows:

This is the correct interface for Gemini's free text-to-speech.

II. Interface Overview and Mode Switching

Don't worry, even though the interface is in English, it's very easy to use. We'll explain it step by step below.

Gemini's speech generation tool automatically detects the language of your input text and currently supports up to 24 languages (although Chinese is not officially listed in the documentation, it is actually supported).

By default, you will enter the Multi-speaker audio interface:

The default interface is for multi-speaker audio.

If you only need a single voice for dubbing, you can click Single-speaker audio on the right side of the interface to switch to Single-speaker audio mode. The single-speaker mode interface is simpler:

Click 'Single-speaker audio' on the right to switch to single-speaker dubbing mode.

III. Multi-speaker Dubbing: Practical Steps

We will focus on the more feature-rich multi-speaker dubbing, which currently supports up to 2 speakers.

1. Prepare and Paste the Text for Dubbing

In the Raw structure text box on the left side of the interface, type or paste the text content you want to dub.

Key Points:

  • Line Breaks: It's recommended that each line is not too long, ideally ending at natural sentence breaks.
  • Specify Speaker: At the beginning of each line, use the format SpeakerX: (English colon) to specify which character should read that line. For example:

      Speaker1: It's a beautiful day, sunny and breezy.
      Speaker2: Yes, it is. How about we go for a walk in the park?
  • Gemini will assign different voices to lines marked with different speakers. It currently supports up to two speakers (e.g., you can define "Speaker1" and "Speaker2").
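
If you prepare scripts outside the browser, the formatting rules above can be checked before pasting. The following is a minimal sketch, not part of AI Studio itself: the function name parse_script and the rule that a label is everything before the first English colon are assumptions made for illustration.

```python
import re

# Assumption: a speaker label is everything before the first ASCII colon on a line.
LINE_PATTERN = re.compile(r"^([^:]+):\s*(\S.*)$")
MAX_SPEAKERS = 2  # the limit this article states for multi-speaker mode


def parse_script(script):
    """Split a script into (speaker, text) turns, or raise ValueError."""
    turns = []
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} lacks a 'SpeakerX:' prefix: {line!r}")
        # Keep the label verbatim, because case and spaces must match later.
        turns.append((match.group(1), match.group(2).rstrip()))
    # Collect distinct labels in order of first appearance.
    speakers = list(dict.fromkeys(speaker for speaker, _ in turns))
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"found {len(speakers)} speakers {speakers}; "
            f"at most {MAX_SPEAKERS} are supported"
        )
    return turns
```

A line that uses a full-width colon, or a script with three different labels, raises ValueError. One known limitation of this sketch: an unlabeled line that happens to contain a colon (for example a time such as 5:30) is treated as if the text before the colon were a speaker label.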

2. Configure Speaker Roles (Voice settings)

In the Voice settings area on the right side of the interface, you need to configure each speaker:

  • Set Speaker Name (Name): As shown in the image below, the name entered in the Name input box must exactly match the speaker identifier at the beginning of each line in the left-hand text (e.g., "Speaker1", "Speaker2"). Case, numbers, and even spaces must match.

    Ensure the speaker name set here exactly matches the name referenced in the text.

  • Select Voice Actor (Voice): In the Voice dropdown menu below Name, you can select a specific voice actor for the currently selected speaker. Click the play button next to each voice to preview it and choose the one you like best.

    Click the play button to preview and select a suitable voice actor.
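
Because the name in Voice settings must match the label in the text exactly, a mismatch in case or spacing is easy to miss. As a rough illustration, the hypothetical helper below (unmatched_speakers is an invented name, and it runs on your own machine, not inside AI Studio) compares the labels used in a script with the names you plan to enter and suggests a near match when only case or whitespace differs.

```python
def unmatched_speakers(script_labels, configured_names):
    """Map each script label with no exact match to a near match, or None."""
    configured = set(configured_names)

    def normalize(name):
        # Ignore case and all whitespace when looking for a near match only.
        return "".join(name.split()).casefold()

    near_matches = {normalize(name): name for name in configured_names}
    problems = {}
    for label in dict.fromkeys(script_labels):  # de-duplicate, keep order
        if label in configured:
            continue  # exact match: case, digits, and spaces all agree
        problems[label] = near_matches.get(normalize(label))
    return problems


# Example: "speaker 2" in the text does not exactly match "Speaker2" in the settings.
print(unmatched_speakers(["Speaker1", "speaker 2"], ["Speaker1", "Speaker2"]))
```

An empty result means every label in the text has an exactly matching name; a value of None means no similar name was found at all.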

3. (Optional) Set Speaking Style (Style instructions)

If you want the dubbing to have a specific emotion or tone (e.g., happy, angry, sad), you can enter style prompts in the Style instructions text box. After filling them in, these prompts will automatically apply to the entire dubbing project, affecting the overall style of all speakers.

Enter English style prompts here, such as "happy", "excited".

Tip: The text preview area on the right also displays the content from your left editing area in real time. You can modify, delete, or add lines directly in this area, which is very convenient.

The right preview area allows direct text editing, synchronized with the left editing area.

4. Generate and Download the Audio

After completing all the above settings, click the blue Run button in the bottom right corner of the interface. Gemini will then start processing your text and generating speech. If everything goes well, after a short wait, an audio player with the generated speech will appear below. You can play it online to preview, and once you are satisfied with the result, click the download button to save it to your computer.

Click Run to start generation; you can play or download the audio upon success.

IV. Potential Issues and Solutions

Currently, Gemini has relatively strict API call rate limits. When processing a large number of text lines, especially in dual-speaker mode, you might encounter generation failures (particularly with Chinese text) and see an error message similar to the one below:

This type of error message is usually related to high request frequency or text processing complexity.

If you encounter this issue, you can try the following methods:

  • Switch to Single-speaker Mode: If multi-speaker is not essential, switching to Single-speaker audio mode usually improves the success rate.
  • Try Again Later: The simplest method is to wait a few minutes or longer and then try again.
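
If you drive the same models from a script instead of the web page, the "try again later" advice can be automated. The sketch below is only an illustration under stated assumptions: generate stands for whatever function you write to request the audio, the starting delay of 60 seconds is a guess and not a documented limit, and real code should catch the specific rate-limit error raised by your client library, not every exception.

```python
import time


def retry_with_backoff(generate, attempts=4, first_delay=60.0, sleep=time.sleep):
    """Call generate() until it succeeds, doubling the wait after each failure."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return generate()
        except Exception:  # assumption: narrow this to your client's rate-limit error
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2  # wait longer before the next try
```

The sleep parameter exists so that the waiting can be replaced with a stub when trying the helper out, without actually pausing for minutes.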