You Might Not Know: Gemini 2.5 Adds Free Multi-Speaker Text-to-Speech (TTS)
Google's Gemini 2.5 has added a very practical feature: multi-speaker text-to-speech, available for free in Google AI Studio. The functionality is provided by the gemini-2.5-flash-preview-tts and gemini-2.5-pro-preview-tts models.
Important Notes:
- International Internet Access: To access Google AI services, you'll need unrestricted internet access (please figure out how to achieve this yourself). This is fundamental for using international AI tools; otherwise, you won't be able to proceed with subsequent steps.
- Google Account: You'll need a free Google account. If you don't have one, you can sign up on the official Google website. Registration can typically be completed with a Chinese mobile number.
I. Accessing the Gemini Speech Generation Page
You can access Gemini's text-to-speech feature page in either of the following ways:
- Direct Access: Open the link https://aistudio.google.com/generate-speech in your browser.
- Via AI Studio Homepage: If you are already logged into Google AI Studio, you can also find the speech generation feature entrance as shown in the image below.
If the page doesn't open, or if you see a message like "Not supported in your current region" (this often happens when using network nodes in regions like Hong Kong), please try switching your network proxy/VPN to another country or region (e.g., USA, Singapore).
Once successfully opened, you will see the speech generation interface as follows:
II. Interface Overview and Mode Switching
Don't worry, even though the interface is in English, it's very easy to use. We'll explain it step by step below.
Gemini's speech generation tool automatically detects the language of your input text and currently supports up to 24 languages (although Chinese is not officially listed in the documentation, it is actually supported).
By default, you will enter the Multi-speaker audio interface:
If you only need a single voice for dubbing, you can click Single-speaker audio on the right side of the interface to switch to single-speaker mode. The single-speaker mode interface is simpler:
III. Multi-speaker Dubbing: Practical Steps
We will focus on the more feature-rich multi-speaker dubbing, which currently supports up to 2 speakers.
1. Prepare and Paste the Text for Dubbing
In the Raw structure text box on the left side of the interface, type or paste the text content you want to dub. Key Points:
- Line Breaks: It's recommended that each line is not too long, ideally ending at natural sentence breaks.
- Specify Speaker: At the beginning of each line, use the format SpeakerX: (with an English colon) to specify which character should read that line. For example:
Speaker1: It's a beautiful day, sunny and breezy.
Speaker2: Yes, it is. How about we go for a walk in the park?
- Gemini will assign different voices to lines marked with different speakers. It currently supports up to two speakers (e.g., you can define "Speaker1" and "Speaker2").
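Before pasting a long script, it can help to check it against the format rules above. The following is a minimal sketch in plain Python, not part of AI Studio: the function name parse_script, the regular expression, and the error messages are all illustrative assumptions. It rejects lines that lack an English-colon speaker prefix (including lines that use the full-width colon "：") and scripts with more than two speakers.

```python
import re

# A label is everything before the first ASCII colon. Full-width colons are
# excluded from the label so that "Speaker1：text" is reported as malformed.
LINE_PATTERN = re.compile(r"^(?P<speaker>[^:：]+):\s*(?P<text>\S.*)$")
MAX_SPEAKERS = 2  # limit stated in this article


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) pairs, rejecting malformed lines."""
    pairs = []
    for number, line in enumerate(script.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are skipped
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {number} is not in 'SpeakerX: text' form: {line!r}")
        pairs.append((match["speaker"], match["text"].strip()))
    speakers = {speaker for speaker, _ in pairs}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"found {len(speakers)} speakers, at most {MAX_SPEAKERS} are supported"
        )
    return pairs
```

Calling parse_script on the two-line example above returns two pairs, one for Speaker1 and one for Speaker2; a script that introduces a third label raises ValueError.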
2. Configure Speaker Roles (Voice settings)
In the Voice settings area on the right side of the interface, you need to configure each speaker:
Set Speaker Name (Name): As shown in the image below, the name entered in the Name input box must exactly match the speaker identifier at the beginning of each line in the left-hand text (e.g., "Speaker1", "Speaker2"). Case, numbers, and even spaces must match.
Select Voice Actor (Voice): In the Voice dropdown menu below Name, you can select a specific voice actor for the currently selected speaker. Click the play button next to each role to preview their voice and choose the one you like best.
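Because the Name field must match the script labels character for character, a quick local check can catch mismatches before you click Run. This is only a sketch under assumptions: unmatched_labels is a hypothetical helper, and it treats everything before the first English colon on a line as the label, which is a simplification of whatever AI Studio does internally.

```python
def unmatched_labels(script: str, configured_names: list[str]) -> set[str]:
    """Return script labels that have no exactly matching configured name."""
    labels = set()
    for line in script.splitlines():
        label, separator, _ = line.partition(":")
        if separator and label:
            labels.add(label)  # kept verbatim: case and spaces are significant
    return labels - set(configured_names)


# A lowercase "speaker2" does not match the configured name "Speaker2".
problems = unmatched_labels(
    "Speaker1: Hello.\nspeaker2: Hi there.",
    ["Speaker1", "Speaker2"],
)
print(problems)
```

An empty result means every label in the script has a matching entry in Voice settings.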
3. (Optional) Set Speaking Style (Style instructions)
If you want the dubbing to have a specific emotion or tone (e.g., happy, angry, sad), you can enter style prompts in the Style instructions text box. Once filled in, these prompts apply to the entire dubbing project, affecting the overall style of all speakers.
Tip: The text preview area on the right also displays the content from your left editing area in real-time, and you can directly modify, delete, or add lines in this area, which is very convenient.
4. Generate and Download the Audio
After completing all the above settings, click the blue Run button in the bottom right corner of the interface. Gemini will then start processing your text and generating speech. If everything goes well, after a short wait, an audio player with the generated speech will appear below. You can play it online to preview, and once you are satisfied with the result, click the download button to save it to your computer.
IV. Potential Issues and Solutions
Currently, Gemini has relatively strict API call rate limits. When processing a large number of text lines, especially in dual-speaker mode, you might encounter generation failures (particularly with Chinese text) and see an error message similar to the one below:
If you encounter this issue, you can try the following methods:
- Switch to Single-speaker Mode: If multi-speaker is not essential, switching to Single-speaker audio mode usually improves the success rate.
- Try Again Later: The simplest method is to wait a few minutes or longer and then try again.
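If you call the same models from your own script rather than the AI Studio page, the "try again later" advice can be automated with a retry loop that waits longer after each failure. The sketch below is generic and makes several assumptions: generate is a hypothetical zero-argument function standing in for whatever call produces the audio, the default delays are arbitrary starting points, and every exception is treated as retryable because this article does not specify the exact error type.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    generate: Callable[[], T],
    attempts: int = 4,
    first_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call generate() until it succeeds, doubling the wait after each failure."""
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return generate()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)  # wait before the next try
            delay *= 2
    raise ValueError("attempts must be at least 1")
```

The sleep parameter exists so the waiting can be replaced in tests; in normal use the default time.sleep is enough. With the defaults, the waits are 60, 120, and 240 seconds before the fourth and final attempt.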