
The core pipeline of video translation software is: recognize text from the speech in the video, translate that text into the target language, generate dubbed audio from the translated text, and finally embed the dubbing and subtitles back into the video.

As this shows, the first step is recognizing text from the speech in the video, and the accuracy of that recognition directly affects the quality of the subsequent translation and dubbing.
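The four stages above can be sketched as a chain of functions. This is a minimal illustration of the pipeline's shape only; the function names, signatures, and placeholder return values are assumptions, not the software's actual API.

```python
# Illustrative sketch of the four-stage video translation pipeline.
# All functions are placeholder stubs, not the software's real implementation.

def recognize_speech(video_path: str) -> str:
    """Stage 1: transcribe the speech in the video to text."""
    return "hello world"  # placeholder transcript


def translate_text(text: str, target_lang: str) -> str:
    """Stage 2: translate the transcript into the target language."""
    return f"[{target_lang}] {text}"  # placeholder translation


def synthesize_dub(text: str) -> bytes:
    """Stage 3: generate dubbed audio from the translated text."""
    return text.encode("utf-8")  # placeholder audio bytes


def embed_into_video(video_path: str, audio: bytes, subtitles: str) -> str:
    """Stage 4: mux the dubbed audio and subtitles back into the video."""
    return video_path.replace(".mp4", ".translated.mp4")


def translate_video(video_path: str, target_lang: str) -> str:
    """Run all four stages in order and return the output file path."""
    transcript = recognize_speech(video_path)
    translated = translate_text(transcript, target_lang)
    audio = synthesize_dub(translated)
    return embed_into_video(video_path, audio, translated)
```

Because each stage feeds the next, an error in stage 1 (recognition) propagates through translation and dubbing, which is why recognition accuracy matters most.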

OpenAI Mode

This mode uses the whisper model officially open-sourced by OpenAI. It is slower than faster mode but offers comparable accuracy.


Model selection (the drop-down on the right) works the same way as in faster mode: from tiny up to large-v3, both resource consumption and accuracy increase.
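The size/accuracy trade-off can be made concrete with the approximate parameter counts published in the openai/whisper README. The selection helper below is a hypothetical illustration of picking the largest model that fits a resource budget; it is not part of the software.

```python
# Approximate parameter counts (in millions) for the Whisper model tiers,
# per the openai/whisper README. Larger models are more accurate but
# consume more memory and compute.
WHISPER_MODEL_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large-v3": 1550,
}


def pick_model(max_params_m: int) -> str:
    """Hypothetical helper: largest model whose parameter count fits the budget."""
    fitting = [name for name, p in WHISPER_MODEL_PARAMS_M.items() if p <= max_params_m]
    return fitting[-1] if fitting else "tiny"  # fall back to the smallest model
```

For example, with a budget of roughly 300M parameters this picks `small`, while a large budget selects `large-v3`.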

Note: Although most model names are the same in faster mode and openai mode, the model files are not interchangeable. Please download the models for openai mode from https://pyvideotrans.com/model.html.

large-v3-turbo Model

OpenAI has recently released large-v3-turbo, a model optimized from large-v3. Its recognition accuracy is similar to large-v3's, while its size and resource consumption are greatly reduced, so it can serve as a drop-in substitute for large-v3.
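To put a rough number on the reduction: per the openai/whisper release notes, large-v3 has about 1550M parameters while large-v3-turbo has about 809M. These figures are approximate and shown only to illustrate the savings.

```python
# Approximate published parameter counts (millions), per openai/whisper.
LARGE_V3_PARAMS_M = 1550
LARGE_V3_TURBO_PARAMS_M = 809

# Fraction of parameters saved by switching to the turbo model.
reduction = 1 - LARGE_V3_TURBO_PARAMS_M / LARGE_V3_PARAMS_M
print(f"large-v3-turbo is about {reduction:.0%} smaller than large-v3")
```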

How to use

  1. Upgrade the software to v2.67 from https://pyvideotrans.com

  2. Select "openai-whisper local" in the speech-recognition drop-down

  3. Select "large-v3-turbo" in the model drop-down

  4. Download the large-v3-turbo.pt file into the models folder inside the software directory
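After step 4, you can verify the file landed in the right place. The check below is a small sketch; the `models` subfolder location follows the step above, but adjust the path to your actual install directory.

```python
from pathlib import Path


def turbo_model_ready(software_dir: str) -> bool:
    """Return True if models/large-v3-turbo.pt exists under the install directory.

    Sketch only: assumes the models folder sits directly inside the
    software directory, as described in step 4.
    """
    return (Path(software_dir) / "models" / "large-v3-turbo.pt").is_file()
```

If this returns False, the most common cause is the .pt file being saved to the downloads folder instead of the software's models folder.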