
How to Dub with Original Video Tone

In dubbing, we typically choose one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (Explanation Little Handsome), and use it for the entire dub. With multiple speakers, however, a single voice is rarely ideal. It works better to give each speaker a distinct voice, preferably one that matches that speaker's voice in the original video. For example, if Pigsy speaks in the original video and the translated English dub should still sound like Pigsy, you need the original voice cloning function.

Currently, the software supports three dubbing channels to achieve original voice cloning: clone-voice, CosyVoice, and F5-TTS.

Principle: When dubbing a segment (e.g., 00:00:03 --> 00:00:08), the software first cuts that segment out of the original audio and retrieves the corresponding original text and translated target text. It then sends this data to the dubbing channel, which generates speech for the target text using the original audio as the voice reference.
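
The cutting step in this principle can be sketched in Python. This is a minimal illustration, not the software's actual code: the helper names to_seconds and build_cut_command and the file names are hypothetical, and it assumes ffmpeg is used to extract the reference clip. The sketch only builds the command list; nothing runs until you pass it to subprocess.run.

```python
import re

# Matches SRT-style timestamps such as 00:00:03 or 00:00:03,500.
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?")


def to_seconds(stamp: str) -> float:
    """Convert an HH:MM:SS[,mmm] timestamp to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not a valid timestamp: {stamp!r}")
    hours, minutes, seconds, frac = match.groups()
    # Pad the fractional part so that "5" means 500 ms, not 5 ms.
    millis = int(frac.ljust(3, "0")) if frac else 0
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + millis / 1000


def build_cut_command(source_audio: str, timing: str, clip_path: str) -> list[str]:
    """Build an ffmpeg command that extracts one subtitle segment.

    `timing` is a subtitle timing line such as "00:00:03 --> 00:00:08".
    The file paths are placeholders chosen for this example.
    """
    start_text, end_text = timing.split("-->")
    start, end = to_seconds(start_text), to_seconds(end_text)
    if end <= start:
        raise ValueError("segment end must be after its start")
    return [
        "ffmpeg", "-y",
        "-i", source_audio,
        "-ss", f"{start:.3f}",        # where the segment starts
        "-t", f"{end - start:.3f}",   # how long the segment lasts
        "-vn",                        # audio only
        clip_path,
    ]
```

For the example segment, build_cut_command("original.wav", "00:00:03 --> 00:00:08", "ref.wav") produces a command that starts at 3 seconds and keeps 5 seconds of audio. The resulting clip is what would be sent to the dubbing channel as the voice reference, together with the original and translated text.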

Using the clone-voice Dubbing Channel

You need to install the clone-voice project (https://github.com/jianchang512/clone-voice). Open the project homepage and read the instructions carefully. You can deploy clone-voice from source. On Windows, you can instead download the integrated package from Releases (https://github.com/jianchang512/clone-voice/releases), on the right side of the project page, unzip it, and double-click app.exe to start it.

Once it reports a successful startup, enter the default API address http://127.0.0.1:9988 in the video translation software under Menu--TTS Settings--Original Voice Cloning clone-voice http address. After a test confirms that it works, you can start using it.


Using the CosyVoice Dubbing Channel

You also need to install the CosyVoice project. For installation instructions, see https://pyvideotrans.com/cosyvoice.html

You can also use a third-party integrated package, but such packages do not support voice cloning; they only let you specify a fixed audio file.

After installing according to the tutorial, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project, in the same directory as the webui.py file.


Then start api.py and enter the API address in the video translation software under Menu--TTS Settings--CosyVoice API address. The default address is http://127.0.0.1:9233.


Using the F5-TTS Dubbing Channel

You need to install the F5-TTS project. See https://pyvideotrans.com/f5tts.html for detailed installation instructions.

You can install it from source or, on Windows, use an integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 in the video translation software under Menu--TTS Settings--F5-TTS API address.
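
Before entering an address in the TTS settings, it can help to confirm that the corresponding service is actually listening. The sketch below is a generic TCP connection check under stated assumptions: it uses only the default addresses quoted in this guide, the names DEFAULT_ADDRESSES, host_port, and is_listening are hypothetical, and it does not call any endpoint of clone-voice, CosyVoice, or F5-TTS. A successful connection only shows that something is listening on the port, not that the API itself is working.

```python
import socket
from urllib.parse import urlsplit

# Default addresses as given in this guide; change them if you started
# a service on a different host or port.
DEFAULT_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def host_port(url: str) -> tuple[str, int]:
    """Split an http(s) URL into (host, port), filling in the scheme default."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, is_listening(DEFAULT_ADDRESSES["F5-TTS"]) returns True only while something is accepting connections on port 5010 of the local machine. If it returns False, start the service first, or check whether it is running on a different port.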


In the main interface, select clone as the dubbing role to dub with the cloned voice.

Note that clone-voice supports more than a dozen languages, whereas F5-TTS and CosyVoice support cloning only in Chinese and English.



Multi-Character Dubbing

When only one video is selected for translation, wait until subtitle translation finishes and the pause button appears, then click pause. In the subtitle area on the right, you can then assign a dubbing role to each subtitle individually, which achieves multi-character dubbing.

You must also select a default dubbing role in the main interface. Any subtitle without an individually assigned role uses the default role.
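
The fallback rule can be pictured with a short Python sketch. This only illustrates the behavior described here and is not the software's internal code: the function name resolve_roles and the 1-based subtitle numbering are assumptions, and the role names are taken from the examples earlier in this guide.

```python
def resolve_roles(subtitle_count, default_role, overrides=None):
    """Return the dubbing role for each subtitle, in order.

    `overrides` maps a 1-based subtitle number to the role assigned to it
    individually; any subtitle not listed falls back to `default_role`.
    """
    overrides = overrides or {}
    return [
        overrides.get(number, default_role)
        for number in range(1, subtitle_count + 1)
    ]


# Four subtitles: the second and fourth get their own roles,
# the rest use the default role.
roles = resolve_roles(4, "yunxi", {2: "xiaoyi", 4: "clone"})
print(roles)
```

Here subtitles 1 and 3 are dubbed with the default role "yunxi", subtitle 2 with "xiaoyi", and subtitle 4 with the cloned original voice.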
