
An ideal translated video has accurate subtitles of appropriate length, a dubbing voice that matches the original speaker, and subtitles, audio, and visuals that stay perfectly in sync.

This guide walks through the four steps of video translation and gives the recommended configuration for each step.

Step 1: Speech Recognition

  • Goal: Convert the audio in the video into subtitle files in the corresponding language.

  • Corresponding Control Element: the "Speech Recognition" row

  • Best Configuration:

    • Select faster-whisper(local)
    • Model selection: large-v2, large-v3, or large-v3-turbo
    • Speech segmentation mode: Overall recognition
    • Check Voice Denoising (time-consuming)
    • Check Preserve Original Background Sound (time-consuming)
    • If the video is in Chinese, also check Chinese Re-segmentation
  • Note: Without an NVIDIA GPU, or if CUDA acceleration is not enabled after the CUDA environment has been configured, processing will be extremely slow. If the GPU does not have enough video memory, the program may crash.
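The output of this step is an ordinary SRT subtitle file. The sketch below shows, under stated assumptions, how recognized segments become SRT text: it assumes each segment is a `(start_seconds, end_seconds, text)` tuple, such as you might collect from the `start`, `end`, and `text` attributes of the segments returned by faster-whisper's `WhisperModel.transcribe()`. The helper names `format_srt_time` and `segments_to_srt` are hypothetical and are not pyVideoTrans functions.

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Build SRT text: a numbered block per segment, separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{timing}\n{text.strip()}\n")
    return "\n".join(blocks)


# With faster-whisper installed, segments could be collected roughly like this
# (not executed here; downloading the model requires network access):
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v3", device="cuda", compute_type="float16")
#   raw_segments, info = model.transcribe("input.wav", vad_filter=True)
#   srt_text = segments_to_srt((s.start, s.end, s.text) for s in raw_segments)
```

For example, `format_srt_time(3661.5)` returns `"01:01:01,500"`.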

Step 2: Subtitle Translation

  • Goal: Translate the subtitle file generated in the first step into the target language.

  • Corresponding Control Element: the "Translation Channel" row

  • Best Configuration:

    • Priority Choice: If you have a VPN and know how to configure a proxy, use the gemini-1.5-flash model, set in Menu - Translation Settings - Gemini pro (Gemini AI Channel).
    • Second Best: If you don't have a VPN or don't know how to configure a proxy, select OpenAI ChatGPT as the "Translation Channel" and use a chatgpt-4o series model, set in Menu - Translation Settings - OpenAI ChatGPT (this requires a third-party API relay).
    • Alternative: If you can't find a suitable third-party relay, use a Chinese AI service such as Moonshot AI or DeepSeek instead.
    • In Menu - Tools/Options - Advanced Options, check the two items shown in the accompanying screenshot.

    How to use GeminiAI: https://pyvideotrans.com/gemini.html
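Whatever channel is used, translation should change only the subtitle text and leave each block's index and timing line untouched, so the result still lines up with the timings from Step 1. The sketch below illustrates that idea with batched requests. It is an assumption-laden illustration, not pyVideoTrans's implementation: `translate` is a hypothetical callable standing in for a call to Gemini, ChatGPT, DeepSeek, or another service, and `batch_size` is a hypothetical parameter.

```python
import re
from typing import Callable, List, Tuple

# (index line, timing line, subtitle text)
Entry = Tuple[str, str, str]


def parse_srt(srt: str) -> List[Entry]:
    """Split SRT text into (index, timing, text) entries."""
    entries = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        entries.append((lines[0].strip(), lines[1].strip(), "\n".join(lines[2:])))
    return entries


def translate_srt(
    srt: str,
    translate: Callable[[List[str]], List[str]],  # hypothetical translator
    batch_size: int = 10,
) -> str:
    """Translate subtitle text in batches, keeping indices and timings as-is."""
    entries = parse_srt(srt)
    out = []
    for i in range(0, len(entries), batch_size):
        batch = entries[i : i + batch_size]
        texts = translate([text for _, _, text in batch])
        # A mismatched count would shift subtitles against their timings.
        if len(texts) != len(batch):
            raise ValueError("translator must return one result per subtitle")
        for (index, timing, _), new_text in zip(batch, texts):
            out.append(f"{index}\n{timing}\n{new_text}\n")
    return "\n".join(out)
```

Batching several subtitles per request gives an AI model surrounding context, while the count check guards against a model merging or splitting lines.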

Step 3: Dubbing

  • Goal: Generate dubbing based on the translated subtitle file.

  • Corresponding Control Element: the "Dubbing Channel" row

  • Best Configuration:

    • Chinese or English: F5-TTS(local), select clone for the dubbing role.
    • Japanese or Korean: CosyVoice(local), select clone for the dubbing role.
    • Other languages: clone-voice(local), select clone for the dubbing role.
    • These three channels best preserve the emotional tone of the original voice, and F5-TTS gives the best results.

    Each of these channels requires separately installing the corresponding F5-TTS, CosyVoice, or clone-voice integration package; see: https://pyvideotrans.com/f5tts.html
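The channel recommendations above amount to a lookup on the target language. The sketch below encodes them, assuming ISO 639-1 style language codes such as `zh`, `en`, `ja`, and `ko` (possibly with a region suffix like `zh-CN`). The function name `recommended_dubbing` is hypothetical; in pyVideoTrans the choice is made in the interface, not in code.

```python
from typing import Tuple


def recommended_dubbing(target_lang: str) -> Tuple[str, str]:
    """Return (dubbing channel, dubbing role) per this guide's recommendations.

    Assumes ISO 639-1 style codes; a region suffix such as "-CN" is ignored.
    """
    code = target_lang.lower().split("-")[0]
    if code in ("zh", "en"):
        channel = "F5-TTS(local)"
    elif code in ("ja", "ko"):
        channel = "CosyVoice(local)"
    else:
        channel = "clone-voice(local)"
    # All three recommended channels use the "clone" role.
    return channel, "clone"
```

For example, `recommended_dubbing("zh-CN")` returns `("F5-TTS(local)", "clone")`.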

Step 4: Subtitle, Dubbing, and Visual Synchronization

  • Goal: Synchronize the subtitles, dubbing, and visuals.
  • Corresponding Control Element: the Synchronization row
  • Best Configuration:
    • When translating from Chinese to English, English sentences are usually longer, so you can set a Dubbing Speed value (e.g., 10 or 15) to speed up the dubbing.
    • Check the three options Video Extension, Dubbing Acceleration, and Video Slowdown to force alignment of subtitles, sound, and visuals.
    • In Menu - Tools/Options - Advanced Options, apply the settings shown in the accompanying screenshot to the Subtitle Sound Visual Alignment area.
    • Maximum Audio Acceleration Multiple and Video Slowdown Multiple (both default to 3) can be adjusted to suit the video.

    Which options to check and which values to set should be fine-tuned to the actual speech rate of the video.
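To see how the caps interact, the sketch below works through one subtitle whose dubbed audio runs longer than its time slot. It assumes a hypothetical order of operations: speed up the audio first (up to the maximum acceleration multiple), then slow down the video (up to the maximum slowdown multiple), and cover any remaining gap by extending the video. pyVideoTrans may divide the adjustment differently; `plan_alignment` is a hypothetical name used only for illustration.

```python
from typing import Tuple


def plan_alignment(
    slot_s: float,
    dub_s: float,
    max_audio_speedup: float = 3.0,
    max_video_slowdown: float = 3.0,
) -> Tuple[float, float, float]:
    """Return (audio speed-up factor, video slowdown factor, extra seconds).

    Hypothetical order: accelerate audio, then slow video, then extend video.
    """
    if slot_s <= 0:
        raise ValueError("slot duration must be positive")
    if dub_s <= slot_s:
        return 1.0, 1.0, 0.0  # dubbing already fits its slot
    audio = min(dub_s / slot_s, max_audio_speedup)
    sped_up_s = dub_s / audio  # dubbing length after acceleration
    if sped_up_s <= slot_s:
        return audio, 1.0, 0.0
    video = min(sped_up_s / slot_s, max_video_slowdown)
    slowed_slot_s = slot_s * video  # slot length after slowing the video
    extra_s = max(0.0, sped_up_s - slowed_slot_s)  # covered by video extension
    return audio, video, extra_s
```

For example, a 3 s dub in a 2 s slot needs only a 1.5x audio speed-up, while a 12 s dub in a 1 s slot hits both caps (3x audio, 3x video) and still needs 1 s of video extension.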

Output Video Quality Control

  • The default output uses lossy compression. For lossless output, go to Menu - Tools - Advanced Options - Video Output Control area and set Video Transcoding Loss Control to 0.
  • Note: If the original video is not in MP4 format, or if hard subtitles are embedded, the video must be re-encoded, which causes some quality loss, though usually a negligible amount. Higher output quality significantly slows processing and increases the output file size.
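The trade-off described above resembles x264's constant rate factor (CRF), where `-crf 0` gives lossless output for the common 8-bit case and larger values give smaller, lower-quality files. The sketch below assumes, without confirmation from this guide, that Video Transcoding Loss Control behaves like CRF, and builds an illustrative FFmpeg command. `build_ffmpeg_args` is a hypothetical helper; the FFmpeg options themselves (`-c:v libx264`, `-crf`, `-preset`, `-c:a aac`) are standard.

```python
from typing import List


def build_ffmpeg_args(src: str, dst: str, crf: int = 0, preset: str = "medium") -> List[str]:
    """Build an FFmpeg command re-encoding video with libx264 at a given CRF.

    Assumption: the "loss control" value maps onto x264's CRF (0 = lossless
    for 8-bit input, up to 51 = lowest quality).
    """
    if not 0 <= crf <= 51:
        raise ValueError("libx264 CRF must be between 0 and 51")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),   # lower = higher quality, larger file
        "-preset", preset,  # slower presets compress better at the same CRF
        "-c:a", "aac",      # re-encode audio to a codec MP4 accepts
        dst,
    ]
```

To run it, pass the list to `subprocess.run(build_ffmpeg_args("in.mkv", "out.mp4"), check=True)`. Keeping the arguments as a list avoids shell-quoting problems with file names containing spaces.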