How to Use the Original Video's Voice for Dubbing

When dubbing, we usually pick one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (Narrator Xiaoshuai), and use it for the entire video. With multiple speakers, however, a single voice is a poor fit. It is better to give each speaker their own voice, ideally one that matches that speaker's voice in the original video. For example, if Bajie (Pigsy) speaks in the original video, his lines should still sound like Bajie after being translated into English. This is what the original voice cloning feature is for.

Currently, the software supports three dubbing channels for original voice cloning: clone-voice, CosyVoice, and F5-TTS.

How it works: when dubbing a specific subtitle segment (e.g., 00:00:03 --> 00:00:08), the software first cuts that segment's audio out of the original video. It then takes the original-language text spoken in that audio together with its translated target text, and sends all of this to the dubbing channel, which speaks the target text in the voice heard in the original audio.
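
The flow above can be sketched in Python with only the standard library. This is a minimal illustration, not the software's actual code: it assumes the original audio has already been extracted to an uncompressed WAV file, and the field names in the hypothetical build_clone_request helper are placeholders, since each dubbing channel defines its own API.

```python
import wave


def srt_time_to_ms(ts: str) -> int:
    """Convert an SRT timestamp ('00:00:03,500' or '00:00:03') to milliseconds."""
    hms, _, ms = ts.strip().replace(".", ",").partition(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(ms or 0)


def cut_segment(src_wav: str, dst_wav: str, start: str, end: str) -> int:
    """Copy the audio between two SRT timestamps into dst_wav; return frames written."""
    start_ms, end_ms = srt_time_to_ms(start), srt_time_to_ms(end)
    with wave.open(src_wav, "rb") as src:
        params = src.getparams()
        rate = params.framerate
        # Clamp so a start time past the end of the file does not raise.
        first = min(start_ms * rate // 1000, params.nframes)
        count = max(0, (end_ms - start_ms) * rate // 1000)
        src.setpos(first)
        frames = src.readframes(count)
    with wave.open(dst_wav, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and rate as the source
        dst.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)


def build_clone_request(ref_wav: str, source_text: str, target_text: str) -> dict:
    """Bundle what a cloning channel needs. Field names are hypothetical."""
    return {
        "reference_audio": ref_wav,     # cut-out original audio: the voice to imitate
        "reference_text": source_text,  # what is said in that audio
        "text": target_text,            # translated text to speak in that voice
    }
```

For the example segment, cut_segment("original.wav", "seg.wav", "00:00:03", "00:00:08") writes five seconds of audio to seg.wav, which would then be passed to the channel together with both texts.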

Using the clone-voice Dubbing Channel

First install the clone-voice project (https://github.com/jianchang512/clone-voice) and read the instructions on its homepage carefully. You can deploy clone-voice from source. On Windows, you can instead download the integrated package from the Releases section on the right side of the page (https://github.com/jianchang512/clone-voice/releases), extract it, and double-click app.exe to start it.

Once the program reports that it has started successfully, open Menu--TTS Settings--Original Voice Cloning clone-voice in the video translation software and enter the default API address http://127.0.0.1:9988 in the HTTP address field. Run the test; if it passes, the channel is ready to use.
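
If the test fails, a useful first check is whether anything is listening on the configured address at all. The helper below is a hypothetical sketch built on Python's standard socket module; it only confirms that the host and port accept TCP connections, not that the server is clone-voice or that it can clone a voice.

```python
import socket
from urllib.parse import urlparse


def api_is_listening(url: str, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections at the URL's host and port.

    This proves only that a server is listening; it does not call any
    clone-voice endpoint.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "127.0.0.1"
    # Fall back to the scheme's standard port when the URL has none.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, api_is_listening("http://127.0.0.1:9988") should return False when nothing is bound to port 9988, such as when clone-voice has not been started yet.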

Using the CosyVoice Dubbing Channel

Likewise, you first need to install the CosyVoice project; see https://pyvideotrans.com/cosyvoice.html for the installation tutorial.

Third-party integrated packages can also be used, but they cannot clone voices; they only let you specify a fixed audio file.

After installing per the tutorial, download api.py from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project folder, in the same directory as webui.py.
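
The placement step can also be scripted. This is a minimal sketch with a hypothetical helper name, install_api_py; it assumes api.py has already been downloaded and that you know the path of your CosyVoice folder, and it refuses to copy when webui.py is missing, since that usually means the wrong folder was chosen.

```python
import shutil
from pathlib import Path


def install_api_py(downloaded_api: str, cosyvoice_dir: str) -> Path:
    """Copy a downloaded api.py next to webui.py in the CosyVoice folder."""
    root = Path(cosyvoice_dir)
    if not (root / "webui.py").is_file():
        raise FileNotFoundError(
            f"webui.py not found in {root}; is this the CosyVoice project folder?"
        )
    target = root / "api.py"
    shutil.copyfile(downloaded_api, target)  # overwrites an older api.py if present
    return target
```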

Next, start api.py, then enter its address in the API address field under Menu--TTS Settings--CosyVoice in the video translation software. The default address is http://127.0.0.1:9233.

Using the F5-TTS Dubbing Channel

First install the F5-TTS project; see https://pyvideotrans.com/f5tts.html for a detailed installation tutorial.

You can install from source or, on Windows, use the integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 in the API address field under Menu--TTS Settings--F5-TTS in the video translation software.
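
For reference, the default addresses given in this guide are collected below. The validation helper is a hypothetical sketch: it assumes each address is entered as a full http:// URL with an explicit port, the form used throughout this guide, and flags entries missing either part.

```python
from urllib.parse import urlparse

# Default API addresses from this guide; adjust if a service runs elsewhere.
DEFAULT_API_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def check_api_address(url: str) -> tuple[str, int]:
    """Return (host, port) for a full http(s) URL, or raise ValueError."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"expected a URL starting with http:// or https://, got {url!r}")
    if parsed.port is None:  # .port itself raises ValueError if out of range
        raise ValueError(f"no port in {url!r}; the defaults here are 9988, 9233 and 5010")
    return parsed.hostname, parsed.port


for channel, address in DEFAULT_API_ADDRESSES.items():
    print(channel, check_api_address(address))
```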

Finally, in the main interface, select clone as the dubbing role to dub with cloned voices.

Note that clone-voice supports more than ten languages, whereas F5-TTS and CosyVoice can clone voices only for Chinese and English.
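
A batch script could check this restriction before submitting a job. The sketch below is hypothetical: it assumes two-letter language codes (region suffixes such as zh-cn are reduced to zh) and does not restrict clone-voice, because this guide says only that it supports more than ten languages without listing them.

```python
# Channels whose voice cloning is limited to Chinese and English, per this guide.
CLONE_LANGUAGE_LIMITS = {
    "CosyVoice": {"zh", "en"},
    "F5-TTS": {"zh", "en"},
}


def can_clone(channel: str, target_lang: str) -> bool:
    """Return False only when the guide says the channel cannot clone target_lang."""
    allowed = CLONE_LANGUAGE_LIMITS.get(channel)
    if allowed is None:
        # e.g. clone-voice: no language list is given here, so defer to its own docs
        return True
    base = target_lang.strip().lower().replace("_", "-").split("-")[0]
    return base in allowed
```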
