How to Use Original Voice Cloning for Dubbing
In dubbing, we usually choose a fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (a narrator-style voice), and use that single voice for the entire dubbing process. With multiple speakers, however, a single voice may not be ideal. A better result comes from assigning each speaker a specific voice, ideally one that matches that speaker's voice in the original video. For example, if Pigsy speaks in the original video and you want his lines to still sound like Pigsy after they are translated into English, you need the original voice cloning feature.
Currently, the software supports three dubbing channels for original voice cloning: clone-voice, CosyVoice, and F5-TTS.
Principle: When dubbing a segment (e.g., 00:00:03 --> 00:00:08), the software first cuts that segment out of the original audio. The audio clip, its original text, and the translated target text are then sent to the dubbing channel, which generates speech for the target text using the voice in the original clip as a reference.
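To illustrate the cutting step, the sketch below converts SRT-style timestamps to seconds and copies the matching frames out of a WAV file using Python's standard wave module. It assumes the original audio has already been extracted to an uncompressed WAV file; the function names and file names are hypothetical and are not pyvideotrans internals.

```python
import wave


def timestamp_to_seconds(ts: str) -> float:
    """Convert 'HH:MM:SS' or SRT-style 'HH:MM:SS,mmm' to seconds."""
    ts = ts.strip().replace(",", ".")
    hours, minutes, seconds = ts.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_range(line: str) -> tuple[float, float]:
    """Parse '00:00:03 --> 00:00:08' into (start, end) in seconds."""
    start, end = line.split("-->")
    return timestamp_to_seconds(start), timestamp_to_seconds(end)


def cut_segment(src_wav: str, dst_wav: str, start: float, end: float) -> None:
    """Copy the audio between start and end (seconds) into a new WAV file."""
    with wave.open(src_wav, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp to the file length so setpos() never goes out of range.
        first = min(int(start * rate), total)
        last = min(int(end * rate), total)
        src.setpos(first)
        frames = src.readframes(max(last - first, 0))
    with wave.open(dst_wav, "wb") as dst:
        # Same channels, sample width and rate; the frame count is fixed on close.
        dst.setparams(params)
        dst.writeframes(frames)


if __name__ == "__main__":
    start, end = parse_range("00:00:03 --> 00:00:08")
    print(start, end)  # 3.0 8.0
    # Hypothetical file names:
    # cut_segment("original.wav", "reference_segment.wav", start, end)
```

The resulting clip is the kind of reference audio a cloning channel needs alongside the original and translated text.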
Using the clone-voice Dubbing Channel
You need to install the https://github.com/jianchang512/clone-voice project. Open the project homepage and read the instructions carefully. You can deploy clone-voice from source code; on Windows, you can instead go to Releases on the right side of the page (https://github.com/jianchang512/clone-voice/releases), download the integrated package, unzip it, and double-click app.exe to start it.
Once it reports a successful start, enter the default API address http://127.0.0.1:9988 into the video translation software under Menu -- TTS Settings -- Original Voice Cloning clone-voice's HTTP address. After the test confirms there are no problems, you can start using it.
Using the CosyVoice Dubbing Channel
You also need to install the CosyVoice project. For installation instructions, see https://pyvideotrans.com/cosyvoice.html
You can also use a third-party integrated package, but third-party packages do not support voice cloning and can only use a fixed reference audio.
After installing according to the tutorial, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project, in the same directory as the webui.py file.
Then start api.py and enter its API address into the video translation software under Menu -- TTS Settings -- CosyVoice's API address. The default address is http://127.0.0.1:9233.
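Before starting the service, a quick check can confirm that api.py landed next to webui.py. The helper below is a hypothetical sketch using pathlib; the "CosyVoice" path in the example is an assumption, so replace it with the location of your own CosyVoice project.

```python
from pathlib import Path


def api_py_in_place(project_dir: str) -> bool:
    """Return True if api.py and webui.py both exist in project_dir."""
    root = Path(project_dir)
    return (root / "webui.py").is_file() and (root / "api.py").is_file()


if __name__ == "__main__":
    # Hypothetical path; point this at your own CosyVoice directory.
    print(api_py_in_place("CosyVoice"))
```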
Using the F5-TTS Dubbing Channel
You need to install the F5-TTS project. For detailed installation instructions, see https://pyvideotrans.com/f5tts.html
You can install it from source code or, on Windows, use the integrated package. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 into the video translation software under Menu -- TTS Settings -- F5-TTS API address.
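The software's own test is the authoritative check. If that test fails, though, it can help to confirm that anything is listening at the configured address at all. The sketch below only opens a TCP connection to the host and port of each default address listed in this guide; it calls no endpoint of clone-voice, CosyVoice, or F5-TTS, so "reachable" means only that the port is open, not that the service works correctly.

```python
import socket
from urllib.parse import urlsplit

# Default addresses from this guide; adjust them if you changed the ports.
DEFAULTS = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def host_port(url: str) -> tuple[str, int]:
    """Extract host and port from an http(s) URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, etc.
        return False


if __name__ == "__main__":
    for name, url in DEFAULTS.items():
        status = "reachable" if is_listening(url) else "not reachable"
        print(f"{name:12} {url:24} {status}")
```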
In the main interface, choose clone to dub with voices cloned from the original audio.
Note that clone-voice supports more than a dozen languages, whereas F5-TTS and CosyVoice support voice cloning only for Chinese and English.
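The language restriction above can be expressed as a small lookup. In this hypothetical helper, "zh" and "en" are assumed language codes. Because this guide does not list the exact languages clone-voice supports, the sketch leaves every other language to clone-voice instead of enumerating them; check the clone-voice documentation before relying on a specific language.

```python
# Channels that, per this guide, clone voices only for Chinese and English.
ZH_EN_ONLY = {"CosyVoice", "F5-TTS"}
ALL_CHANNELS = ["clone-voice", "CosyVoice", "F5-TTS"]


def cloning_channels_for(target_lang: str) -> list[str]:
    """Return the cloning channels worth trying for a target language code.

    Assumption: 'zh' and 'en' are the codes for Chinese and English.
    """
    if target_lang.lower() in {"zh", "en"}:
        return list(ALL_CHANNELS)
    return [ch for ch in ALL_CHANNELS if ch not in ZH_EN_ONLY]


if __name__ == "__main__":
    print(cloning_channels_for("en"))  # ['clone-voice', 'CosyVoice', 'F5-TTS']
    print(cloning_channels_for("ja"))  # ['clone-voice']
```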
Multi-Character Dubbing
When you translate only one video at a time, a pause button appears after subtitle translation is complete; click it. In the subtitle area on the right, you can then set a dubbing role for each subtitle individually to achieve multi-character dubbing.
You still need to select a default dubbing role in the main interface. Any subtitle without its own role will use this default.
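The fallback rule for roles can be sketched as follows. The Subtitle class and resolve_roles function are hypothetical illustrations of the behavior described above, not the software's internal data structures; "clone" and "yunxi" are role names taken from this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subtitle:
    start: str
    end: str
    text: str
    role: Optional[str] = None  # None means "not set separately"


def resolve_roles(subs: list[Subtitle], default_role: str) -> list[str]:
    """Return the dubbing role for each subtitle, falling back to the default."""
    return [sub.role or default_role for sub in subs]


if __name__ == "__main__":
    subs = [
        Subtitle("00:00:03", "00:00:08", "Line spoken by Pigsy", role="clone"),
        Subtitle("00:00:09", "00:00:12", "Line spoken by the narrator"),
    ]
    print(resolve_roles(subs, default_role="yunxi"))  # ['clone', 'yunxi']
```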