The ideal translated video should have accurate subtitles, appropriate length, dubbing voice consistent with the original, and perfect synchronization of subtitles, sound, and visuals.
This guide will detail the four steps of video translation and provide the best configuration recommendations for each step.
Step 1: Speech Recognition
Goal: Convert the audio in the video into subtitle files in the corresponding language.
Corresponding Control Element: "Speech Recognition" line
Best Configuration:
- Select `faster-whisper(local)`
- Model selection: `large-v2`, `large-v3`, or `large-v3-turbo`
- Speech segmentation mode: `Overall recognition`
- Check `Voice Denoising` (time-consuming)
- Check `Preserve Original Background Sound` (time-consuming)
- If the video is in Chinese, also check `Chinese Re-segmentation`
Note: Without an NVIDIA GPU, or if CUDA acceleration is not enabled after configuring the CUDA environment, processing is extremely slow. If video memory (VRAM) is insufficient, the process may crash.
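Step 1's output is a subtitle file. As a minimal illustration of that output format, here is a stdlib-only Python sketch that writes recognized segments as SRT. The sample segments are made up for demonstration; inside the tool this stage is produced by faster-whisper itself.

```python
# Sketch: turning recognized speech segments into SRT subtitle text.
# The segment tuples below are hypothetical sample data; in pyvideotrans
# this stage comes from faster-whisper (e.g. the large-v3 model).

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """segments: iterable of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([(0.0, 2.4, "Hello world."), (2.4, 5.0, "This is a test.")]))
```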
Step 2: Subtitle Translation
Goal: Translate the subtitle file generated in the first step into the target language.
Corresponding Control Element: "Translation Channel" line
Best Configuration:
- Priority Choice: If you have a VPN and understand how to configure it, use the `gemini-1.5-flash` model in Menu - Translation Settings - Gemini Pro (Gemini AI Channel).
- Second Best: If you don't have a VPN or don't know how to configure a proxy, select `OpenAI ChatGPT` in "Translation Channel" and use a `gpt-4o` series model in Menu - Translation Settings - OpenAI ChatGPT (requires a third-party relay).
- Alternative: If you can't find a suitable third-party relay, you can use a domestic AI service such as Moonshot AI or DeepSeek.
- In Menu - Tools/Options - Advanced Options, check the two items shown in the following image:
How to use GeminiAI: https://pyvideotrans.com/gemini.html
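A common pattern behind LLM translation channels like those above is to send subtitle lines as one numbered batch and parse the numbered reply, so each translation stays matched to its original cue. The sketch below illustrates the idea only; the prompt wording and function names are my own assumptions, not pyvideotrans's internal prompt.

```python
# Sketch: batching subtitle lines for an LLM translation channel.
# Numbering the lines and parsing the numbered reply keeps each
# translated line paired with its original subtitle cue.
import re

def build_prompt(lines, target_lang="English"):
    # Hypothetical prompt wording, not the tool's actual prompt.
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(lines, 1))
    return (f"Translate the following numbered subtitle lines into {target_lang}. "
            f"Reply with the same numbering, one line per item:\n{numbered}")

def parse_reply(reply, expected):
    """Map 'N. text' reply lines back to cue order; missing items stay empty."""
    out = {}
    for line in reply.splitlines():
        m = re.match(r"\s*(\d+)[.)]\s*(.*)", line)
        if m:
            out[int(m.group(1))] = m.group(2).strip()
    return [out.get(i, "") for i in range(1, expected + 1)]
```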
Step 3: Dubbing
Goal: Generate dubbing based on the translated subtitle file.
Corresponding Control Element: "Dubbing Channel" line
Best Configuration:
- Chinese or English: `F5-TTS(local)`, select `clone` for the dubbing role.
- Japanese or Korean: `CosyVoice(local)`, select `clone` for the dubbing role.
- Other languages: `clone-voice(local)`, select `clone` for the dubbing role.
- The above three channels retain the emotional color of the original voice to the greatest extent, with `F5-TTS` giving the best results.

You need to additionally install the corresponding `F5-TTS`/`CosyVoice`/`clone-voice` integration package; see the document: https://pyvideotrans.com/f5tts.html
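The channel recommendations above amount to a simple language-to-channel mapping. The sketch below mirrors this guide's table; the function name is illustrative, not part of the tool.

```python
# Sketch mirroring the dubbing-channel recommendations in this guide:
# pick a local voice-cloning TTS channel from the subtitle language code.
# The mapping comes from the guide; the function name is hypothetical.

def pick_dubbing_channel(lang: str) -> str:
    lang = lang.lower().split("-")[0]   # "zh-cn" -> "zh"
    if lang in ("zh", "en"):
        return "F5-TTS(local)"          # best emotional fidelity
    if lang in ("ja", "ko"):
        return "CosyVoice(local)"
    return "clone-voice(local)"         # all other languages

# All three channels use the `clone` dubbing role.
print(pick_dubbing_channel("zh-cn"))
```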
Step 4: Subtitle, Dubbing, and Visual Synchronization
- Goal: Synchronize the subtitles, dubbing, and visuals.
- Corresponding Control Element: `Synchronization` line
- Best Configuration:
- When translating from Chinese to English, you can set the `Dubbing Speed` value (e.g., `10` or `15`) to speed up the dubbing, as English sentences are usually longer.
- Check the three options `Video Extension`, `Dubbing Acceleration`, and `Video Slowdown` to force alignment of subtitles, sound, and visuals.
- In Menu - Tools/Options - Advanced Options - Subtitle Sound Visual Alignment area, `Maximum Audio Acceleration Multiple` and `Video Slowdown Multiple` can be adjusted according to the actual situation (the default value is 3).
- Whether each option is checked and what value is set should be fine-tuned according to the actual speech rate in the video.
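The alignment options above boil down to simple arithmetic: speed the dubbed audio up toward its subtitle slot, capped by the maximum acceleration multiple (default 3), then slow the video for whatever still doesn't fit. A minimal sketch of that logic, with illustrative names rather than the tool's internals:

```python
# Sketch of the arithmetic behind "Dubbing Acceleration" / "Video Slowdown":
# if the dubbed audio is longer than its subtitle slot, first speed the
# audio up (capped by Maximum Audio Acceleration Multiple, default 3),
# then slow the video to absorb any remainder. Names are illustrative.

def align(slot_sec: float, dub_sec: float, max_audio_speed: float = 3.0):
    """Return (audio_speed, video_slowdown) factors to fit dub into slot."""
    if dub_sec <= slot_sec:
        return 1.0, 1.0                      # already fits, no change
    speed = min(dub_sec / slot_sec, max_audio_speed)
    remaining = dub_sec / speed              # audio length after speed-up
    return speed, remaining / slot_sec       # slow video to cover the rest

print(align(2.0, 8.0))  # audio capped at 3x, video slowed ~1.33x
```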
Output Video Quality Control
- The default output is lossy compression. If you need lossless output, in Menu - Tools - Advanced Options - Video Output Control area, set `Video Transcoding Loss Control` to 0.
- Note: If the original video is not in MP4 format or uses embedded hard subtitles, video encoding conversion will cause some loss, but the loss is usually negligible. Improving video quality will significantly reduce processing speed and increase the output video size.
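This loss-control value maps naturally onto an encoder quality knob such as libx264's CRF, where 0 means lossless and higher values trade quality for smaller, faster output. Whether pyvideotrans passes the setting through exactly like this is an assumption; the sketch only builds the command for illustration.

```python
# Sketch: a plausible interpretation of "Video Transcoding Loss Control"
# as libx264's CRF value (0 = lossless, higher = smaller/faster output).
# This builds the ffmpeg command only; the file names are placeholders.

def ffmpeg_cmd(src: str, dst: str, crf: int = 0) -> list:
    return ["ffmpeg", "-i", src,
            "-c:v", "libx264", "-crf", str(crf),
            "-c:a", "copy", dst]

print(" ".join(ffmpeg_cmd("input.mkv", "output.mp4")))
```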