How to Use the Original Voice Tone for Dubbing

When dubbing, you normally pick one fixed voice, such as "yunxi," "xiaoyi," or "narrator Xiaoshuai," and use it for the entire video. When a video has several speakers, a single voice is a poor fit. It is better to give each speaker a distinct voice, ideally the speaker's own voice from the original video. For example, if Bajie is speaking in the original video, the English dub should still sound like Bajie. This requires the original voice cloning feature.

Currently, the software supports three dubbing channels for original voice cloning: clone-voice, CosyVoice, and F5-TTS.

Principle: To dub a given segment (e.g., 00:00:03 --> 00:00:08), the software first extracts the original audio for that segment and pairs it with the segment's original text and its translated target text. These are sent to the dubbing channel, which synthesizes speech for the target text using the voice in the original audio as a reference.
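
The extraction step can be pictured with a small sketch. It is only an illustration, not the software's actual code: it assumes ffmpeg is installed, and the function names and file names (video.mp4, ref.wav) are hypothetical. It converts the subtitle timestamps to seconds and builds an ffmpeg command that cuts the matching audio, which could then serve as the reference clip.

```python
import re


def srt_time_to_seconds(stamp: str) -> float:
    """Convert a timestamp such as 00:00:03 or 00:00:03,500 to seconds."""
    match = re.fullmatch(r"(\d+):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, millis = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    if millis:
        # Pad "5" to "500" so a short fraction is read as milliseconds.
        total += int(millis.ljust(3, "0")) / 1000
    return float(total)


def build_extract_command(source: str, start: str, end: str, output: str) -> list[str]:
    """Build an ffmpeg command that cuts the audio between start and end."""
    start_s = srt_time_to_seconds(start)
    end_s = srt_time_to_seconds(end)
    if end_s <= start_s:
        raise ValueError("segment end must be later than its start")
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-ss", f"{start_s:.3f}",          # seek to the segment start
        "-t", f"{end_s - start_s:.3f}",   # keep only the segment duration
        "-vn",                            # drop the video stream
        "-ac", "1",                       # mono is enough for a reference clip
        output,
    ]
```

For the segment in the example above, build_extract_command("video.mp4", "00:00:03", "00:00:08", "ref.wav") produces a command that cuts five seconds of audio starting at the three-second mark; passing the list to subprocess.run would execute it.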

Using the clone-voice Dubbing Channel

First install the clone-voice project from https://github.com/jianchang512/clone-voice and read the instructions on the project homepage carefully. You can deploy clone-voice from source. On Windows, you can instead download the integrated package from Releases (https://github.com/jianchang512/clone-voice/releases), linked on the right side of the project page. After downloading and extracting it, double-click app.exe to start the service.

Once the service reports a successful startup, open Menu > TTS Settings > Original Voice Cloning clone-voice in the video translation software and enter the default API address, http://127.0.0.1:9988, in the HTTP address field. Run the test; if it reports no problems, the channel is ready to use.

Using the CosyVoice Dubbing Channel

Similarly, you need to install the CosyVoice project. For installation instructions, see https://pyvideotrans.com/cosyvoice.html.

Alternatively, you can use third-party integrated packages, but these do not support voice cloning and only allow specifying a fixed voice tone.

After installation, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project directory, in the same folder as webui.py.

Then start api.py. In the video translation software, open Menu > TTS Settings > CosyVoice and enter the API address in the API address field. The default address is http://127.0.0.1:9233.

Using the F5-TTS Dubbing Channel

You need to install the F5-TTS project. For detailed installation instructions, see https://pyvideotrans.com/f5tts.html.

You can install it using the source code or, for Windows, use the integrated package. After installation, double-click run-api.bat to start the API service. Then enter the default address http://127.0.0.1:5010 into the video translation software under Menu > TTS Settings > F5-TTS API Address.
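
If a channel does not respond, it can help to confirm that its service is actually listening before changing any settings. The sketch below is an assumption-laden convenience, not part of the software: it only checks that a TCP connection can be opened to the default addresses quoted in this guide, and does not verify that the API itself works. The function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit

# Default addresses quoted in this guide; edit them if you changed the ports.
DEFAULT_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def host_and_port(address: str) -> tuple[str, int]:
    """Split an address such as http://127.0.0.1:9988 into host and port."""
    parts = urlsplit(address)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"address needs a host and a port: {address!r}")
    return parts.hostname, parts.port


def is_listening(address: str, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at the address."""
    host, port = host_and_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not listening".
        return False


def check_all(addresses: dict[str, str] = DEFAULT_ADDRESSES) -> dict[str, bool]:
    """Check every configured channel and return a name -> status mapping."""
    return {name: is_listening(url) for name, url in addresses.items()}
```

Calling check_all() returns a mapping from channel name to True or False. A False result usually means the service has not been started or is using a different port than the one entered in TTS Settings.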

Select "clone" in the Main Interface to Enable Voice Cloning Dubbing

Note: While clone-voice supports over ten languages, F5-TTS and CosyVoice only support Chinese and English voice cloning.


Multi-Role Dubbing

When translating a single video at a time, wait until subtitle translation finishes and the pause button appears, then click it. In the subtitle area on the right, you can then assign a dubbing role to each subtitle individually, which gives you multi-role dubbing.

In the main interface, you also need to select a default dubbing role. Any subtitle without an individual setting uses this default role.
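
The fallback rule can be expressed as a short sketch. This is not the software's internal data model, only an illustration of the behavior described above: the function name and the idea of keying overrides by subtitle number are assumptions.

```python
def resolve_roles(subtitle_count: int, default_role: str,
                  overrides: dict[int, str]) -> dict[int, str]:
    """Return the dubbing role for each subtitle, numbered from 1.

    Subtitles listed in `overrides` use their individual role;
    every other subtitle falls back to `default_role`.
    """
    for line_no in overrides:
        if not 1 <= line_no <= subtitle_count:
            raise ValueError(f"no subtitle numbered {line_no}")
    return {
        line_no: overrides.get(line_no, default_role)
        for line_no in range(1, subtitle_count + 1)
    }
```

For example, resolve_roles(3, "yunxi", {2: "xiaoyi"}) assigns "xiaoyi" to the second subtitle and leaves the first and third on the default role "yunxi".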
