
How to Use Original Voice Cloning for Dubbing

When dubbing, we usually pick one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (Explanation Master), and use it for the entire video. For videos with multiple speakers, however, a single voice is often not ideal. It works better to give each speaker their own voice, preferably one that matches that speaker's voice in the original video. For example, if Pigsy speaks in the original video, the English dubbing should still sound like Pigsy. That is what the original voice cloning function is for.

Currently, the software supports three dubbing channels for original voice cloning: clone-voice, CosyVoice, and F5-TTS.

Principle: When dubbing a segment (e.g., 00:00:03 --> 00:00:08), the original audio for that segment is first cut out and paired with the original text of that audio and the translated target text. This data is then sent to the dubbing channel, which generates speech for the target text using the original audio as the voice reference.
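The per-segment data described above can be sketched in Python. This is only an illustration of the idea, not the software's actual code: the CloneRequest class, its field names, and the helper functions are hypothetical, and the actual audio cutting and the request to the dubbing channel are left out.

```python
import re
from dataclasses import dataclass

# Matches timestamps such as 00:00:03 or 00:00:03,250 (SRT style).
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2})(?:[,.](\d{1,3}))?")


def to_ms(stamp: str) -> int:
    """Convert an HH:MM:SS[,mmm] timestamp to milliseconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    hours, minutes, seconds, fraction = match.groups()
    millis = int((fraction or "0").ljust(3, "0"))
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis


@dataclass
class CloneRequest:
    """Hypothetical container for what one segment sends to a dubbing channel."""
    start_ms: int      # where to start cutting the reference audio
    end_ms: int        # where to stop cutting the reference audio
    ref_text: str      # original text spoken in the reference audio
    target_text: str   # translated text to be spoken in the cloned voice


def build_request(time_line: str, original_text: str, translated_text: str) -> CloneRequest:
    """Build the data for one subtitle segment from its time line and texts."""
    start, end = (to_ms(part) for part in time_line.split("-->"))
    if end <= start:
        raise ValueError("segment end must be after its start")
    return CloneRequest(start, end, original_text, translated_text)
```

For the example segment, build_request("00:00:03 --> 00:00:08", "original line", "translated line") yields start_ms=3000 and end_ms=8000, which is the span of original audio that would serve as the voice reference.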

Using the clone-voice Dubbing Channel

You need to install the clone-voice project from https://github.com/jianchang512/clone-voice. Open the project homepage and read the instructions carefully. You can deploy clone-voice from the source code. On Windows, you can instead go to Releases on the right side of the page (https://github.com/jianchang512/clone-voice/releases), download the integrated package, unzip it, and double-click app.exe to start it.

Once it reports a successful start, enter the default API address http://127.0.0.1:9988 in the video translation software under Menu -- TTS Settings -- Original Voice Cloning clone-voice, in the HTTP address field. Test the connection, and if it succeeds you can start using it.


Using the CosyVoice Dubbing Channel

You also need to install the CosyVoice project. For installation instructions, see https://pyvideotrans.com/cosyvoice.html

You can also use a third-party integrated package, but these packages do not support voice cloning and only let you specify a fixed audio.

After installing according to the tutorial, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project, in the same directory as the webui.py file.


Then start api.py and enter its address in the video translation software under Menu -- TTS Settings -- CosyVoice's API address. The default address is http://127.0.0.1:9233.


Using the F5-TTS Dubbing Channel

You need to install the F5-TTS project. For detailed installation instructions, see https://pyvideotrans.com/f5tts.html

You can install it from the source code or, on Windows, use the integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 in the video translation software under Menu -- TTS Settings -- F5-TTS API address.


In the main interface, select clone to dub with voice cloning.

Note that clone-voice supports more than a dozen languages, whereas F5-TTS and CosyVoice support voice cloning only for Chinese and English.
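If the connection test in the software fails, it can help to confirm first that the local service is running at all. The following is a minimal sketch, assuming the default addresses given in this guide. It only checks whether something accepts TCP connections on each port, and does not verify that the service behind it is the expected API. The function names are illustrative.

```python
import socket
from urllib.parse import urlsplit

# Default addresses from this guide; adjust them if a service uses another port.
DEFAULT_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def host_and_port(address: str) -> tuple[str, int]:
    """Split an address of the form http://host:port into (host, port)."""
    parts = urlsplit(address)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"expected http://host:port, got {address!r}")
    return parts.hostname, parts.port


def is_listening(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the address can be opened."""
    try:
        with socket.create_connection(host_and_port(address), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, address in DEFAULT_ADDRESSES.items():
        state = "listening" if is_listening(address) else "not reachable"
        print(f"{name:12s} {address}  {state}")
```

A channel reported as "not reachable" usually means the service has not been started yet or is running on a different port than the one entered in TTS Settings.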



Multi-Character Dubbing

Multi-character dubbing is available when only one video is selected for translation at a time. When the pause button appears after subtitle translation is completed, click it. In the subtitle area on the right, you can then set a dubbing role for each subtitle separately to achieve multi-character dubbing.

You still need to select a default dubbing role in the dubbing role setting on the main interface. Any subtitle that has no role of its own uses the default role.
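The default-role fallback can be pictured as a simple lookup. This is a hedged sketch of the rule only, not the software's implementation: resolve_roles and its parameters are hypothetical, and subtitles are assumed to be numbered from 1.

```python
def resolve_roles(subtitle_count: int, default_role: str, overrides: dict[int, str]) -> list[str]:
    """Return the dubbing role for each subtitle, numbered from 1.

    A subtitle with an entry in `overrides` uses that role;
    every other subtitle falls back to `default_role`.
    """
    return [overrides.get(number, default_role) for number in range(1, subtitle_count + 1)]


# Three subtitles, default role "yunxi", subtitle 2 set separately to "xiaoyi".
roles = resolve_roles(3, "yunxi", {2: "xiaoyi"})
print(roles)  # ['yunxi', 'xiaoyi', 'yunxi']
```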
