How to Dub with the Original Video's Voices

In dubbing, you usually pick one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" ("Narrator Xiaoshuai"), and use it for the entire video. When a video has multiple speakers, a single voice is often not ideal. It is better to give each speaker their own voice, ideally one that matches that speaker's voice in the original video. For example, if Bajie is speaking in the original video, the English dub should still sound like Bajie. This requires original voice cloning.

Currently, the software supports three dubbing channels that can clone the original voice: clone-voice, CosyVoice, and F5-TTS.

Principle: When a segment is dubbed (for example, 00:00:03 --> 00:00:08), the audio for that time range is first cut from the original audio. This clip, its original-language text, and the translated target text are then sent to the dubbing channel, which generates speech for the target text using the voice in the original clip as a reference.
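The cutting step of this principle can be sketched with Python's standard library. This is a minimal illustration, not the software's actual code: it assumes the original audio has already been extracted to an uncompressed PCM WAV file and that segment times come from SRT-style lines like the one above. It converts the time range to milliseconds and copies that span into a separate clip, which would serve as the reference audio. How the clip and the two texts are then sent depends on the dubbing channel and is not shown.

```python
import io
import re
import wave

# One SRT-style timestamp: HH:MM:SS with optional ",mmm" (or ".mmm") milliseconds.
_STAMP = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?")


def parse_srt_range(line: str) -> tuple[int, int]:
    """Turn '00:00:03 --> 00:00:08' (or '00:00:03,500 --> ...') into (start_ms, end_ms)."""
    stamps = _STAMP.findall(line)
    if len(stamps) != 2:
        raise ValueError(f"expected two timestamps in {line!r}")
    result = []
    for hh, mm, ss, frac in stamps:
        # Right-pad the fraction so ",5" means 500 ms, like a decimal fraction.
        ms = int(frac.ljust(3, "0")) if frac else 0
        result.append(((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + ms)
    start_ms, end_ms = result
    if end_ms < start_ms:
        raise ValueError("end time precedes start time")
    return start_ms, end_ms


def cut_wav(src, dst, start_ms: int, end_ms: int) -> None:
    """Copy the [start_ms, end_ms) span of a PCM WAV (path or binary file) to dst.

    Assumption: the source is plain PCM WAV, which the stdlib wave module can read.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        total = reader.getnframes()
        # Convert milliseconds to frame indices, clamped to the file length.
        first = min(start_ms * rate // 1000, total)
        last = min(end_ms * rate // 1000, total)
        reader.setpos(first)
        frames = reader.readframes(last - first)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # same channels, sample width and frame rate
        writer.writeframes(frames)  # frame count in the header is fixed on close


def cut_wav_bytes(data: bytes, start_ms: int, end_ms: int) -> bytes:
    """In-memory variant of cut_wav, convenient for quick tests."""
    out = io.BytesIO()
    cut_wav(io.BytesIO(data), out, start_ms, end_ms)
    return out.getvalue()
```

For example, `cut_wav("original.wav", "segment.wav", *parse_srt_range("00:00:03 --> 00:00:08"))` would write a 5-second reference clip; both file names are placeholders.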

Using the clone-voice Dubbing Channel

You need to install the clone-voice project from https://github.com/jianchang512/clone-voice. Read the instructions on the project homepage carefully. You can deploy clone-voice from source code. On Windows, you can instead download the integrated package from the Releases page (https://github.com/jianchang512/clone-voice/releases), linked in the right-hand sidebar of the project page, unzip it, and double-click app.exe to start it.

Once it has started successfully, enter the default API address http://127.0.0.1:9988 in the video translation software under Menu - TTS Settings - Original Voice Cloning clone-voice HTTP address. Run the test there, and once it passes you can start using the channel.

Using the CosyVoice Dubbing Channel

You also need to install the CosyVoice project. See https://pyvideotrans.com/cosyvoice.html for the installation tutorial.

You can also use a third-party integrated package, but third-party packages do not support voice cloning and can only use a fixed reference audio.

After installing it according to the tutorial, download api.py from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project directory, alongside webui.py.

Then start api.py and enter its address in the video translation software under Menu - TTS Settings - CosyVoice API address. The default address is http://127.0.0.1:9233.

Using the F5-TTS Dubbing Channel

You need to install the F5-TTS project. See https://pyvideotrans.com/f5tts.html for a detailed installation tutorial.

You can install it from source code or, on Windows, with an integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 in the video translation software under Menu - TTS Settings - F5-TTS API address.
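The software's own test is the primary check. If that test fails, a quick way to see whether a service is running at all is to try a TCP connection to its address. The sketch below is standard-library Python written for this guide, not part of any of these projects, and it uses the three default addresses given above. It only confirms that something is listening on the port; it does not call any API endpoint, because this guide does not document them.

```python
import socket
from urllib.parse import urlsplit

# Default addresses quoted in this guide; edit them if you started a service on another port.
DEFAULT_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def split_address(url: str) -> tuple[str, int]:
    """Return (host, port) for an http(s) URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in {url!r}")
    if parts.port is not None:
        return parts.hostname, parts.port
    return parts.hostname, 443 if parts.scheme == "https" else 80


def port_is_open(url: str, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections at the URL's host and port.

    This only shows that a process is listening; it does not prove the API works.
    """
    host, port = split_address(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    for name, url in DEFAULT_ADDRESSES.items():
        status = "listening" if port_is_open(url) else "not reachable"
        print(f"{name:12} {url:24} {status}")
```

If a channel shows "not reachable", the service has probably not started or is using a different port than the one entered in TTS Settings.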

To dub with cloned voices, select clone as the dubbing role in the main interface.

Note that clone-voice supports more than ten languages, whereas F5-TTS and CosyVoice support voice cloning only for Chinese and English.

Multi-Role Dubbing

When only one video is selected for translation, a pause button appears after subtitle translation finishes. Click it, and in the subtitle area on the right you can then assign a dubbing role to each subtitle individually, which gives you multi-role dubbing.

You still need to select a default dubbing role in the main interface. Every subtitle that is not given its own role uses this default.
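The default-role rule can be expressed as a small function. This is a hypothetical model of the behavior described above, not the software's implementation; subtitles are assumed to be numbered from 0, and the role names are only examples taken from this guide.

```python
def assign_roles(subtitle_count: int, overrides: dict[int, str], default_role: str) -> list[str]:
    """Return one dubbing role per subtitle: the per-subtitle choice if set, else the default.

    Hypothetical model: `overrides` maps 0-based subtitle indices to roles set individually.
    """
    if not default_role:
        raise ValueError("a default dubbing role must be selected")
    for index in overrides:
        if not 0 <= index < subtitle_count:
            raise IndexError(f"subtitle {index} does not exist")
    return [overrides.get(i, default_role) for i in range(subtitle_count)]
```

For example, `assign_roles(4, {1: "clone", 3: "xiaoyi"}, "yunxi")` returns `["yunxi", "clone", "yunxi", "xiaoyi"]`: only subtitles 1 and 3 use their own roles, and the rest fall back to the default.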