Function and Meaning of Each Option in the Main Interface

As shown in the figure above, the function of each option is as follows:

  1. Select Video: Select the original video to translate. The video must contain clear human speech without excessive noise; otherwise the recognition result will be inaccurate. A video with no speech cannot be processed, whether or not it has subtitles, because the software generates subtitles by recognizing human speech. Hold down the Ctrl key to select multiple videos at once, but all of them must use the same spoken language.
  2. Translation Channel: FreeGoogle and Microsoft can be used directly, with no proxy or configuration. Other translation channels are either free but require a proxy (such as Google) or require configuration (such as Baidu Translate and Tencent Translate). If you are unsure, choose Microsoft or FreeGoogle.
  3. Original Language: Select the language spoken in the video. For example, if the speech in the video is English, you must select English here.
  4. Target Language: Select the language to translate into. For example, to dub the video in Chinese and embed Chinese subtitles, select Chinese Simplified here.
  5. Network Proxy Address: If you use services that cannot be accessed from China, such as Google or Gemini, you must fill in the proxy address. For example, if you use v2ray, fill in http://127.0.0.1:10809. If you are not familiar with proxies, leave this field empty and do not use services that cannot be accessed from China.
  6. Voiceover Channel: edgeTTS is free and can be used directly without configuration. Other voiceover channels require configuration or installation. If you are unsure, choose edgeTTS.
  7. Voiceover Role: Select the voice used for the voiceover. Different roles have different timbres. Select the target language first, then select the role.
  8. Faster Mode: The mode used to recognize human speech in the video. If you are unsure, keep the default faster mode.
  9. Tiny: The model used to recognize human speech in the video. Only the tiny model for faster mode is included by default. A medium or larger model is recommended for higher accuracy. For any other model, in either faster mode or openai mode, you need to download the model into the models folder under the software directory. Model download address: https://github.com/jianchang512/stt/releases/tag/0.0. If you just want a quick trial, choose tiny here; it works directly without any download.
  10. Overall Recognition: Keep the default. There is no need to change it.
  11. Embed Subtitles: How subtitles are embedded into the video. Soft subtitles require player support to be displayed and cannot be displayed in web pages. Hard subtitles are always displayed wherever the video is played, including in web pages.
  12. End of Video: After voiceover, the audio may be longer than the original video. If you select this option, the end of the video is extended until the voiceover finishes. It is recommended to select it.
  13. Voiceover Automatic Acceleration: The voiceover may be longer than the original speech. If you select this option, the speech speed is forcibly increased so that the durations match. The maximum acceleration can be modified in Menu -- Tools/Advanced Settings -- Advanced Settings.
  14. Video Automatic Slow Motion: Select this option to slow down the video so that it stays aligned with the voiceover and subtitles. The slow motion range can also be controlled in the advanced settings menu.
  15. Keep Background Sound: Select this option to keep the original background sound in the video, such as background music. Processing will be slower when it is selected, especially for large videos.
  16. CUDA Acceleration: On Windows and Linux machines with an NVIDIA graphics card, you can select this option to accelerate processing. The CUDA environment must be installed on the machine. See https://pyvideotrans.com/gpu.html for the installation tutorial.
  17. Clean up generated: If you process the same video repeatedly, select this option to delete the previously generated files and regenerate them.
  18. Shutdown after completion: Whether to shut down the computer after the task is completed.
  19. Start processing: After everything is configured, click Start to run the task.
  20. Import Subtitles: If you want to use existing local subtitles, click Import. Imported subtitles are used directly, and speech recognition is skipped.
  21. Overall voiceover speed: For example, 10 means the speed is increased by 10% based on the normal speed, and -10 means it is reduced by 10%.
  22. Volume +: Raise or lower the volume relative to the normal volume. Only effective with edgeTTS.
  23. Pitch +: Raise or lower the pitch relative to the normal pitch. Only effective with edgeTTS.
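
The Network Proxy Address in option 5 is easy to mistype, so it can help to check the value before pasting it in. The following is a minimal sketch in Python, not part of the software itself: the function name `is_valid_proxy_address` and the set of accepted schemes are assumptions made for illustration. It only checks that the text has the shape `scheme://host:port`; it does not test whether the proxy is actually reachable.

```python
from urllib.parse import urlsplit

# Assumption for illustration: which schemes the software accepts is not stated here.
ALLOWED_SCHEMES = {"http", "https", "socks5"}


def is_valid_proxy_address(address: str) -> bool:
    """Return True if the address looks like scheme://host:port."""
    try:
        parts = urlsplit(address.strip())
        # Reading .port raises ValueError when the port is not a number
        # or is outside the range 0-65535.
        port = parts.port
    except ValueError:
        return False
    return (
        parts.scheme in ALLOWED_SCHEMES
        and bool(parts.hostname)
        and port is not None
    )


if __name__ == "__main__":
    # A well-formed address, one without a scheme, and one without a port.
    for candidate in ("http://127.0.0.1:10809", "127.0.0.1:10809", "http://127.0.0.1"):
        print(candidate, "->", is_valid_proxy_address(candidate))
```

An address that passes this check can still fail if the proxy software is not running or listens on a different port, so confirm the port in your proxy client's own settings.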