Functions and Meanings of Each Option on the Main Interface
As shown in the image above, the function of each option is as follows:
- Select Video: Select the source video to translate. The video must contain human speech, and the audio must be clear and free of excessive noise; otherwise the recognition result will be inaccurate. A video without speech cannot be processed, whether or not it has subtitles, because the software works by recognizing human speech to generate subtitles. You can hold down the Ctrl key to select multiple videos at once, but all of them must use the same spoken language.
- Translation Channel: FreeGoogle and Microsoft can be used directly, with no proxy and no configuration. The other translation channels are either free but require a proxy, such as Google, or require configuration, such as Baidu Translate and Tencent Translate. If you are unsure, choose Microsoft or FreeGoogle.
- Original Language: Select the language spoken in the video. For example, if the speech in the video is English, you must select English here.
- Target Language: Select the language to translate into. For example, if you want the video dubbed in Chinese with embedded Chinese subtitles, select Chinese Simplified here.
- Network Proxy Address: If you use services that cannot be accessed from China, such as Google or Gemini, you must fill in a proxy address. For example, if you use v2ray-based software, fill in http://127.0.0.1:10809. If you are not familiar with proxies, do not enter an arbitrary value, and do not use services that cannot be accessed from China.
- Voiceover Channel: edgeTTS is free and can be used directly without configuration. The other voiceover channels require configuration or installation. If you are unsure, choose edgeTTS.
- Voiceover Role: Select the speaker role. Different roles have different tones. Select the target language first, then select the role.
- faster Mode: The mode used to recognize human speech in the video. If you are unsure, choose the default faster mode.
- tiny: The model used to recognize human speech in the video. By default, only the tiny model for the faster mode is included. A medium or larger model is recommended for higher accuracy. To use any other model in the faster mode or the openai mode, you need to download it from https://github.com/jianchang512/stt/releases/tag/0.0 into the models directory under the software directory. If you only want a quick trial, choose tiny here; it needs no download and can be used directly.
- Overall Recognition: The default is fine; there is no need to change it.
- Embed Subtitles: The way subtitles are embedded in the video. Soft subtitles are shown only if the player supports them and cannot be displayed on web pages. Hard subtitles are shown wherever the video is played, including on web pages.
- Video End: The voiceover may run longer than the original video. Select it to extend the video, using its last 10ms, until the voiceover ends. It is recommended to select it.
- Voiceover Automatic Speed Up: The voiceover may run longer than the original speech. Select it to speed up the voiceover so that the two durations match. The maximum speedup can be changed in Menu--Tools/Advanced Settings--Advanced Settings.
- Video Automatic Slow Down: Select it to slow down the video to align the video with the sound and subtitles. The slowdown range can also be controlled in the advanced settings menu.
- Keep Background Sound: Select it to keep the original background sound in the video, such as background music. If you select it, the processing speed will be slower, especially when the video is large.
- CUDA Acceleration: If you have an NVIDIA GPU on a Windows or Linux machine, you can select it to accelerate processing. The CUDA environment must be installed on the machine. See https://pyvideotrans.com/gpu.html for the installation tutorial.
- Clean Up Generated: If you process the same video repeatedly, you can select it to delete the previously generated files and regenerate them.
- Shutdown After Completion: Whether to shut down the computer after the task is completed.
- Start Processing: After everything is configured, click Start to begin processing.
- Import Subtitles: If you want to use an existing local subtitle file, click Import. The imported subtitles are used directly, and no speech recognition is performed.
- Voiceover Overall Speed: For example, 10 means the speech is 10% faster than normal, and -10 means it is 10% slower.
- Volume +: Raise or lower the volume relative to the normal volume. Only effective with edgeTTS.
- Pitch +: Raise or lower the pitch relative to the normal pitch. Only effective with edgeTTS.
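
The signed values entered for Voiceover Overall Speed, Volume + and Pitch + can be read as offsets from the normal setting. The following is only a sketch of that arithmetic, not the software's actual code: the function names are hypothetical, and the signed-string form (`+10%`, `-10%`) is an assumption about how such an offset might be handed to a TTS engine.

```python
def speed_factor(offset_percent: int) -> float:
    """Turn a percentage offset into a speed multiplier.

    10 -> 10% faster than normal, -10 -> 10% slower, 0 -> unchanged.
    """
    return 1 + offset_percent / 100


def signed_percent(offset_percent: int) -> str:
    """Format an offset with an explicit sign (assumed string form)."""
    return f"{offset_percent:+d}%"


if __name__ == "__main__":
    for value in (10, 0, -10):
        print(value, speed_factor(value), signed_percent(value))
```

With these definitions, an entry of 10 corresponds to a multiplier of about 1.1 and an entry of -10 to about 0.9, which matches the description of Voiceover Overall Speed above.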
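
For the Network Proxy Address field, the example value http://127.0.0.1:10809 has a scheme, a host and a port. A minimal sketch of a sanity check for that shape, using only Python's standard library, is shown below. The function name is hypothetical, and the set of accepted schemes is an assumption; the software may accept other forms.

```python
from urllib.parse import urlsplit

# Assumption: only http and https proxy URLs are considered here.
ALLOWED_SCHEMES = {"http", "https"}


def looks_like_proxy_address(address: str) -> bool:
    """Return True if the text has the shape scheme://host:port."""
    try:
        parts = urlsplit(address.strip())
        port = parts.port  # raises ValueError if the port is not numeric
    except ValueError:
        return False
    return (
        parts.scheme in ALLOWED_SCHEMES
        and bool(parts.hostname)
        and port is not None
    )


if __name__ == "__main__":
    print(looks_like_proxy_address("http://127.0.0.1:10809"))
    print(looks_like_proxy_address("127.0.0.1:10809"))
```

A check like this only catches malformed input such as a missing `http://` prefix or a missing port; it does not confirm that a proxy is actually listening at that address.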