The core principle of video translation software is to recognize the speech in a video as text, translate that text into the target language, dub the translated text, and finally embed the dubbed audio and the subtitles into the video.
The first step, speech recognition, is therefore critical: its accuracy directly affects the quality of the translation and dubbing that follow.
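The four stages can be sketched as a simple pipeline. This is a minimal illustration, not pyvideotrans's internal code: the `Segment` type and the stage functions passed in (`recognize`, `translate`, `dub`, `embed`) are hypothetical placeholders. The sketch makes one point concrete: every later stage consumes the output of recognition, so a recognition error carries through to the translation, the dubbing, and the final video.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # recognized (or translated) text


def translate_video(
    recognize: Callable[[str], List[Segment]],             # hypothetical: speech -> timed text
    translate: Callable[[str], str],                        # hypothetical: source text -> target text
    dub: Callable[[List[Segment]], bytes],                  # hypothetical: text -> synthesized audio
    embed: Callable[[str, bytes, List[Segment]], str],      # hypothetical: writes the output video
    video_path: str,
) -> str:
    # 1. Speech recognition: everything downstream depends on this result.
    source = recognize(video_path)
    # 2. Translation: keep each segment's timing, replace only its text.
    translated = [Segment(s.start, s.end, translate(s.text)) for s in source]
    # 3. Dubbing: synthesize speech for the translated segments.
    audio = dub(translated)
    # 4. Embedding: combine the dubbed audio and subtitles with the video.
    return embed(video_path, audio, translated)
```

In practice each stage would call a real speech recognition, translation, or text-to-speech service; passing them in as functions keeps the sketch independent of any particular one.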
faster Mode
This mode is recommended. It uses a model converted from OpenAI's open-source Whisper and, as the name implies, recognizes speech faster without reducing accuracy.
After selecting faster mode, choose a model from the list on the right. The built-in default, tiny, is the smallest model and also the least accurate.
In order of increasing size, the models are tiny, base, small, medium, and large. The larger the model, the higher the recognition accuracy, although larger models are also slower and need more memory.
For Chinese videos, it is recommended to select at least the medium model. Models can be downloaded from https://pyvideotrans.com/model
Models with the .en suffix and models whose names start with distil can only be used for English videos.
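The model-selection rules above (size order, the English-only .en and distil models, and medium as the minimum for Chinese) can be written as a small checker. This is a hypothetical helper, not part of pyvideotrans. It assumes the language codes "en" and "zh", and model names such as large-v3 or distil-large-v3 in the style used by faster-whisper releases; names outside that pattern are not handled.

```python
from typing import List

# Smallest / least accurate first, largest / most accurate last.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def is_english_only(model_name: str) -> bool:
    # .en suffix or distil prefix means the model only handles English.
    return model_name.endswith(".en") or model_name.startswith("distil")


def base_size(model_name: str) -> str:
    # Assumed naming scheme: "small.en" -> "small", "distil-large-v3" -> "large".
    name = model_name[:-3] if model_name.endswith(".en") else model_name
    if name.startswith("distil-"):
        name = name[len("distil-"):]
    return name.split("-")[0]


def check_model(model_name: str, language: str) -> List[str]:
    """Return a list of warnings; an empty list means the choice looks fine."""
    problems = []
    if is_english_only(model_name) and language != "en":
        problems.append(f"{model_name} only supports English")
    if language.startswith("zh"):
        if MODEL_SIZES.index(base_size(model_name)) < MODEL_SIZES.index("medium"):
            problems.append("use at least the medium model for Chinese")
    return problems
```

For example, `check_model("small.en", "zh")` returns two warnings, while `check_model("medium", "zh")` returns none.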
To the right of the model selector is a drop-down box set to Whole Recognition by default; its other option is Equal Segmentation. Unless you have special requirements, keep Whole Recognition. If you need to split the audio into segments of equal duration, for example so that each subtitle is 10 s long, select Equal Segmentation, then set the segment duration in seconds in the VAD parameters section under Menu--Tools/Advanced Settings--Advanced Settings.
To speed up processing on Windows or Linux with an Nvidia graphics card, install and configure CUDA and cuDNN, then enable CUDA acceleration. This significantly improves execution speed.
CUDA and cuDNN installation tutorial: https://pyvideotrans.com/gpu.html
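If you script recognition yourself, a common first check is whether an Nvidia driver is present at all. The helper below is a rough heuristic and an assumption, not how pyvideotrans decides: it treats the presence of the `nvidia-smi` tool on PATH as a sign of an Nvidia GPU. It cannot confirm that CUDA and cuDNN are correctly installed, so follow the tutorial above regardless.

```python
import shutil
import sys


def choose_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" or "cpu" using a coarse, driver-based heuristic."""
    # The guide describes CUDA acceleration on Windows and Linux only.
    if sys.platform not in ("win32", "linux"):
        return "cpu"
    # nvidia-smi ships with the Nvidia driver; its absence suggests no usable GPU.
    if prefer_gpu and shutil.which("nvidia-smi") is not None:
        return "cuda"
    return "cpu"
```

The returned string matches the `device` values ("cuda" or "cpu") accepted by faster-whisper's `WhisperModel`, so it could be passed straight through when loading a model.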
Automatic Language Detection
In versions after v2.59, the "Original Language" drop-down box includes an "Automatic Detection" option. If you don't know the spoken language, or it is not among the 24 supported languages, select "Automatic Detection" and the program will try to identify the spoken language automatically.
Avoid this option whenever possible, especially when the first 30 seconds of the video contain no clear speech: automatic detection looks only at the first 30 seconds of audio and uses that result as the language for the entire video. Also note that some languages that sound alike but are written differently cannot be reliably distinguished, so either one may be detected. For example, a Chinese video may be randomly detected as Simplified or Traditional Chinese.
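Because detection depends only on the first 30 seconds, it can help to check that window before choosing "Automatic Detection". The sketch below is a crude, hypothetical pre-check: it assumes mono float samples in the range -1.0 to 1.0 and uses an arbitrary RMS threshold of 0.01. Loudness is not speech, so music or noise will pass the check; treat it only as a warning sign for a silent opening.

```python
import math
from typing import Sequence

DETECTION_WINDOW_SECONDS = 30  # detection looks only at the first 30 s of audio


def detection_window(samples: Sequence[float], sample_rate: int) -> Sequence[float]:
    # The slice of audio that automatic detection would examine.
    return samples[: DETECTION_WINDOW_SECONDS * sample_rate]


def rms(samples: Sequence[float]) -> float:
    # Root-mean-square level; 0.0 for an empty window.
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def likely_silent_start(samples: Sequence[float], sample_rate: int, threshold: float = 0.01) -> bool:
    """True if the detection window is so quiet it probably contains no speech."""
    return rms(detection_window(samples, sample_rate)) < threshold
```

If `likely_silent_start` returns True, selecting the original language manually is the safer choice.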