
Gemini 2.5 Introduces Multi-Speaker Text-to-Speech (TTS), Available for Free

Google's Gemini 2.5 has a new and incredibly useful feature: multi-speaker text-to-speech! You can use it for free on Google AI Studio. This capability is driven by the gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts models.

Important Notes:

  1. Internet Access (with VPN if needed): Google AI services must be reachable from your network. In regions where they are blocked, you need a VPN or proxy, and you have to set it up yourself. This is a prerequisite for using most overseas AI tools.
  2. Google Account: You need a free Google account. If you don't have one yet, you can register on the Google website. Usually, a local phone number is sufficient for registration.

I. Accessing the Gemini Text-to-Speech Webpage

You can access Gemini's text-to-speech feature through either of the following ways:

  1. Direct Access: Open the link https://aistudio.google.com/generate-speech in your browser.
  2. Via the AI Studio Homepage: If you are already logged into Google AI Studio, you can also reach the voice generation feature from the homepage, as shown in the image below.

     Accessing the voice generation feature via the AI Studio homepage

If the page fails to open, or if you see a message like "Not supported in your current region" (which often happens when using a Hong Kong network node), try switching your network proxy node to another country or region (such as the United States or Singapore).

If you see this page, it means the region of your current network node is not supported. Try switching.

Once successfully opened, you will see the following voice generation interface:

This is the correct interface for Gemini's free text-to-speech.

II. Interface Overview and Mode Switching

Don't worry, although the interface is in English, it's very easy to use. We will explain it step by step below.

Gemini's voice generation tool automatically detects the language of the text you enter. It currently supports 24 languages (Chinese is not explicitly listed in the documentation, but it works in practice).

By default, you will enter the Multi-speaker audio interface:

The default interface is the multi-speaker audio interface.

If you only need a single voice, click Single-speaker audio on the right side of the interface to switch to single-speaker mode. The single-speaker interface is simpler:

Click Single-speaker audio on the right to switch to single-speaker audio mode.

III. Multi-Speaker Text-to-Speech Practical Steps

We will focus on the more feature-rich multi-speaker text-to-speech, which currently supports only 2 speakers.

1. Prepare and Paste Text for Speech

In the Raw structure text box on the left side of the interface, enter or paste the text content you want to be voiced. Key points:

  • Line Breaks: Keep each line fairly short, and break lines where a sentence pauses naturally.
  • Specify Speaker: Start each line with SpeakerX: (using an English, half-width colon) to specify which character reads that line. For example:

      Speaker1: Today is a beautiful day, the sun is shining and the breeze is gentle.
      Speaker2: Yes, how about we go for a walk in the park?

  • Speaker Limit: Gemini assigns a different voice to each speaker label. Currently, at most two speakers are supported (for example, "Speaker1" and "Speaker2").
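If your dialogue already exists as structured data, the line format described above can be produced with a few lines of Python before pasting it into the text box. This is only a minimal sketch: the function name format_script and the (speaker, text) input shape are hypothetical, chosen for illustration, and the two-speaker limit is taken from this article rather than checked against the service.

```python
def format_script(turns, max_speakers=2):
    """Render (speaker, text) pairs as 'Speaker: text', one turn per line."""
    speakers = []
    lines = []
    for speaker, text in turns:
        if speaker not in speakers:
            speakers.append(speaker)
        if len(speakers) > max_speakers:
            raise ValueError(
                f"at most {max_speakers} speakers are supported, got {speakers}"
            )
        # Collapse internal line breaks and repeated spaces so that
        # each turn stays on a single line.
        lines.append(f"{speaker}: {' '.join(text.split())}")
    return "\n".join(lines)


turns = [
    ("Speaker1", "Today is a beautiful day, the sun is shining\nand the breeze is gentle."),
    ("Speaker2", "Yes, how about we go for a walk in the park?"),
]
script = format_script(turns)
print(script)
```

The printed text can be pasted directly into the Raw structure box. Adding a third speaker name raises a ValueError instead of silently producing text the tool may reject.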

2. Configure Speaker Roles (Voice settings)

In the Voice settings area on the right side of the interface, you need to configure each speaker:

  • Set Speaker Name (Name): As shown in the figure below, the name entered in the Name input box must be exactly the same as the speaker identifier at the beginning of each line in the text on the left (such as "Speaker1", "Speaker2"). Case, numbers, and even spaces must match.

    Make sure the speaker name set here matches the name referenced in the text exactly.

  • Select Voice: In the Voice drop-down menu below Name, you can select a specific voice for the currently selected speaker. Click the play button next to each voice to preview its timbre and choose the one you like best.

    Click the play button to preview each voice and select a suitable one.
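Because the speaker names must match exactly, a mismatch such as "speaker2" or "Speaker 1" is easy to miss by eye. The following sketch shows one way to check a script against the names you plan to enter in Voice settings. The helper find_unmatched_speakers is hypothetical, and it assumes that everything before the first colon on a line is the speaker label.

```python
def find_unmatched_speakers(script, configured_names):
    """Return labels used in the script that match no configured name exactly."""
    unmatched = []
    for line in script.splitlines():
        label, sep, _ = line.partition(":")
        if not sep:
            continue  # no colon on this line, so it carries no speaker label
        # Exact comparison: case, digits, and spaces all have to match.
        if label not in configured_names and label not in unmatched:
            unmatched.append(label)
    return unmatched


script = "Speaker1: Hello there.\nspeaker2: Hi!\nSpeaker 1: Nice weather."
print(find_unmatched_speakers(script, ["Speaker1", "Speaker2"]))
# → ['speaker2', 'Speaker 1']
```

An empty list means every labelled line refers to a configured speaker.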

3. (Optional) Set Voice Style (Style instructions)

If you want the voiceover to have a specific emotion or tone (e.g., happy, angry, sad), you can enter style prompts in the Style instructions text box. These prompts will be automatically applied to the entire voiceover project, affecting the overall style of all speakers.

Enter English style prompts here, such as "happy", "excited".

Tip: The text preview area on the right mirrors the content of the left editing area in real time. You can also modify, delete, or add lines directly in the preview, which is very convenient.

You can edit the text directly in the preview area on the right; it stays synchronized with the editing area on the left.

4. Generate and Download Voiceover

After completing all the settings above, click the blue Run button in the lower right corner of the interface. Gemini will start processing your text and generating speech. If everything goes well, an audio player with the generated audio appears below after a short while, and you can play it directly in the browser. Once you are satisfied with the result, click the download button to save it to your computer.

Click Run to start generating, and you can play or download the audio after success.

IV. Possible Problems and Solutions

Currently, Gemini has relatively strict limits on API call frequency. When you process a large number of text lines, especially in multi-speaker mode, generation may fail (particularly when the text is in Chinese), and you will see an error message similar to the one below:

This error message is usually related to high request frequency or text processing complexity.

If you encounter this problem, you can try the following methods:

  • Switch to Single-Speaker Mode: If multi-speaker is not essential, switching to Single-speaker audio mode usually improves the success rate.
  • Try Again Later: The simplest method is to wait a few minutes or longer before trying again.
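If you script the generation rather than clicking Run by hand, the "try again later" advice can be automated with a retry loop that waits longer after each failure. The sketch below is an illustration under stated assumptions, not Gemini's actual client: generate is a hypothetical callable standing in for whatever function produces the audio, RuntimeError stands in for whatever error that function raises, and the 60-second starting delay is an arbitrary choice.

```python
import time


def generate_with_retry(generate, text, attempts=4, base_delay=60.0, sleep=time.sleep):
    """Call generate(text); on failure, wait and retry, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return generate(text)
        except RuntimeError:  # placeholder for the real client's error type
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay *= 2


# Demonstration with a fake generator that fails twice, then succeeds.
# The waits are recorded in a list instead of actually sleeping.
calls = {"count": 0}
waits = []


def flaky_generate(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate-limit failure")
    return b"audio-bytes"


audio = generate_with_retry(flaky_generate, "Speaker1: Hello.", sleep=waits.append)
print(calls["count"], waits)  # → 3 [60.0, 120.0]
```

Passing sleep as a parameter keeps the demonstration instant; in real use the default time.sleep applies, so the script pauses for one minute, then two, before each new attempt.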