How to Dub with Original Video Tone
In dubbing, we typically pick one fixed voice, such as "yunxi," "xiaoyi," or "解说小帅" (Explanation Little Handsome), and use it for the entire job. With multiple speakers, however, a single voice is often not ideal. It works better to give each speaker a distinct voice, preferably one that matches that speaker's voice in the original video. For example, if Pigsy is speaking in the original video and the translated English dub should still sound like Pigsy, you need the original voice cloning feature.
Currently, the software supports three dubbing channels to achieve original voice cloning: clone-voice, CosyVoice, and F5-TTS.
Principle: When dubbing a segment (e.g., 00:00:03 --> 00:00:08), the original audio of that segment is first cut out, and the corresponding original text content and translated target text are obtained. These data are then sent to the dubbing channel, which generates the dubbing of the target text with reference to the voice tone of the original audio.
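The first step of this principle, cutting the reference clip for one subtitle segment, can be sketched as follows. This is a minimal illustration and not the software's actual implementation: the function names are hypothetical, and it assumes ffmpeg is installed and used for the cut. The sketch only builds the command; the request that sends the clip and the two texts to a dubbing channel is omitted because its format depends on the channel.

```python
import re
from datetime import timedelta

# Matches a subtitle time range such as "00:00:03 --> 00:00:08"
# (an optional ",mmm" or ".mmm" millisecond part is also accepted).
TIME_RANGE = re.compile(
    r"(\d+):(\d{2}):(\d{2})(?:[.,](\d{1,3}))?"
    r"\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})(?:[.,](\d{1,3}))?"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the captured timestamp fields to seconds as a float."""
    ms = int((millis or "0").ljust(3, "0"))
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=int(seconds), milliseconds=ms
    ).total_seconds()


def parse_time_range(line):
    """Return (start, end) in seconds for one subtitle time range."""
    match = TIME_RANGE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a subtitle time range: {line!r}")
    fields = match.groups()
    start = _to_seconds(*fields[:4])
    end = _to_seconds(*fields[4:])
    if end <= start:
        raise ValueError("segment end must be after its start")
    return start, end


def build_cut_command(source, start, end, output):
    """Build an ffmpeg command that extracts the segment's audio only (hypothetical helper)."""
    duration = end - start
    return [
        "ffmpeg", "-y",
        "-i", source,
        "-ss", f"{start:.3f}",   # seek to the segment start
        "-t", f"{duration:.3f}",  # keep only the segment's duration
        "-vn",                    # drop the video stream
        output,
    ]


if __name__ == "__main__":
    start, end = parse_time_range("00:00:03 --> 00:00:08")
    print(build_cut_command("original.mp4", start, end, "segment_ref.wav"))
```

The resulting clip, together with the segment's original text and translated text, is what the dubbing channel uses as its voice reference.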
Using the clone-voice Dubbing Channel
You need to install the clone-voice project (https://github.com/jianchang512/clone-voice). Open the project homepage and read the instructions carefully. You can deploy clone-voice from source. On Windows, you can instead open Releases (https://github.com/jianchang512/clone-voice/releases), found in the middle of the right-hand side of the page, download the integrated package, unzip it, and double-click app.exe to start it.
Once it reports a successful startup, enter the default API address http://127.0.0.1:9988 in the video translation software under Menu--TTS Settings--Original Voice Cloning clone-voice http address. After a test confirms that everything works, you can start using it.
Using the CosyVoice Dubbing Channel
You also need to install the CosyVoice project. For installation instructions, see https://pyvideotrans.com/cosyvoice.html
You can also use a third-party integrated package, but it does not support voice cloning and only lets you specify a fixed audio file.
After installing according to the tutorial, download the api.py file from https://github.com/jianchang512/cosyvoice-api/blob/main/api.py and place it in the CosyVoice project, in the same directory as the webui.py file.
Then start api.py and enter its API address in the video translation software under Menu--TTS Settings--CosyVoice API address. The default address is http://127.0.0.1:9233
Using the F5-TTS Dubbing Channel
You need to install the F5-TTS project. See https://pyvideotrans.com/f5tts.html for detailed installation instructions.
You can install it from source, or use the integrated package on Windows. After installation, double-click run-api.bat to start the API service, then enter the default address http://127.0.0.1:5010 in the video translation software under Menu--TTS Settings--F5-TTS API address.
In the main interface, select clone as the dubbing role to dub with the cloned voice.
Note that clone-voice supports more than a dozen languages, whereas F5-TTS and CosyVoice only support cloning for Chinese and English.
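Before selecting clone, it can help to confirm that the chosen API service is actually listening at the address you entered. The following is a minimal sketch, assuming the three default addresses given in this document. The function names are hypothetical, and the check only confirms that the TCP port accepts connections, not that the service answers dubbing requests correctly.

```python
import socket
from urllib.parse import urlsplit

# Default API addresses as listed in this document.
DEFAULT_API_ADDRESSES = {
    "clone-voice": "http://127.0.0.1:9988",
    "CosyVoice": "http://127.0.0.1:9233",
    "F5-TTS": "http://127.0.0.1:5010",
}


def split_host_port(address):
    """Split an address like http://127.0.0.1:9988 into (host, port)."""
    parts = urlsplit(address)
    if parts.hostname is None:
        raise ValueError(f"address needs a scheme and host: {address!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def is_listening(address, timeout=2.0):
    """Return True if a TCP connection to the address succeeds within the timeout."""
    host, port = split_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, address in DEFAULT_API_ADDRESSES.items():
        state = "listening" if is_listening(address) else "not reachable"
        print(f"{name}: {address} is {state}")
```

If a service shows as not reachable, check that it has finished starting and that the port in the TTS settings matches the one the service printed on startup.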
Multi-Character Dubbing
When only a single video is selected for translation, wait for the subtitle translation to finish and the pause button to appear, then click pause. In the subtitle area on the right, you can then assign a dubbing role to each subtitle individually, which gives you multi-character dubbing.
You also need to select a default dubbing role in the main interface. Any subtitle that is not given its own role uses the default role.
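The fallback rule can be illustrated with a short sketch. It is not the software's internal code: `resolve_roles` is a hypothetical helper, and the 1-based subtitle numbering is an assumption made for the example.

```python
def resolve_roles(subtitle_count, default_role, overrides=None):
    """Return the dubbing role for each subtitle, in order.

    overrides maps a 1-based subtitle number to the role set for that
    subtitle; every other subtitle falls back to default_role.
    """
    overrides = overrides or {}
    for number in overrides:
        if not 1 <= number <= subtitle_count:
            raise ValueError(f"no subtitle numbered {number}")
    return [overrides.get(n, default_role) for n in range(1, subtitle_count + 1)]


if __name__ == "__main__":
    # Subtitles 2 and 4 get their own roles; 1 and 3 use the default.
    print(resolve_roles(4, "yunxi", {2: "clone", 4: "xiaoyi"}))
```

Here subtitles 1 and 3 are dubbed with the default role, while subtitles 2 and 4 use the roles set for them individually.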