Two Key Factors Determining Quality:

  1. The accuracy of the recognized text.

  2. The quality of the translated text.

Errors in the recognized text carry over into the translation, so improving the final result requires addressing both aspects.

I: Improving Text Recognition Accuracy:

1. Use the large-v3 model.

Recognition accuracy improves progressively from the base model through the small and medium models to the large-v3 model, but so does the consumption of computer resources. If your computer has a high-performance NVIDIA graphics card with at least 8GB of VRAM and you have configured the CUDA and cuDNN environment, try the large-v3 model. It significantly improves the accuracy of text and subtitle recognition.

[View CUDA and cuDNN environment installation method](https://juejin.cn/post/7318704408727519270)

2. Separate the background sound from the video.

Heavy background music or noise in the video interferes with speech recognition. Try selecting "Keep Background Sound": the software then separates the background sound before recognition and recognizes only the human speech, which gives much better results.

You can also do the separation yourself, either with a third-party separation tool or with the "Separate Voice Background" function on the left side of the software, and then process the human voice step by step.

First, use the "Audio and Video to Text" function to recognize the separated human voice and obtain the text subtitles.

Next, under "Text Subtitle Translation", translate the subtitles into the target language.

Finally, under "Standard Function Mode", import the subtitles, add the background music, and embed the dubbing and subtitles into the video. These steps are somewhat cumbersome, but they can significantly improve the translation result.

3. Manually Modify and Adjust

After subtitle recognition finishes, and again after translation finishes, the complete text is displayed in the subtitle area on the right side of the software. Click the "Pause" button to pause, then modify and adjust the text by hand. However accurate machine recognition and translation become, manual proofreading still gives better results.
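The model choice described in step 1 can be written as a small decision rule. This is only a sketch: `pick_model` and its `fallback` parameter are hypothetical names, not part of the software. The 8GB threshold comes from the text above, while falling back to the medium model is an assumption made for illustration.

```python
# Model sizes named in the text, ordered from least to most accurate.
MODELS_BY_ACCURACY = ["base", "small", "medium", "large-v3"]


def pick_model(has_cuda: bool, vram_gb: float, fallback: str = "medium") -> str:
    """Return a model name following the rule of thumb in step 1.

    has_cuda: True if an NVIDIA card with CUDA and cuDNN is configured.
    vram_gb:  available video memory in GB.
    fallback: model to use otherwise (assumption, not from the software).
    """
    if fallback not in MODELS_BY_ACCURACY:
        raise ValueError(f"unknown model: {fallback}")
    # The text recommends large-v3 only with CUDA and at least 8GB of VRAM.
    if has_cuda and vram_gb >= 8:
        return "large-v3"
    return fallback


print(pick_model(has_cuda=True, vram_gb=12))   # large-v3
print(pick_model(has_cuda=False, vram_gb=12))  # medium
```

On weaker hardware you could pass `fallback="small"` or `fallback="base"` to trade accuracy for lower resource use.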

II: Improving the Quality of Text Subtitle Translation

ChatGPT, DeepL, and Azure give the best translation quality. All three require paid accounts, none of them accepts payment from users in mainland China, and ChatGPT and Azure additionally require a proxy, so the barrier to entry is high.

If you have a paid account and know how to configure a proxy, you can use these three translation channels to improve translation quality (many relay proxy services for ChatGPT are available in China).

The next best options are Google/Gemini/Microsoft, which are all free. Google and Gemini require a proxy configuration, while Microsoft does not.

Note, however, that Gemini applies stricter content safety restrictions. If the dialogue in your video contains age-restricted content, Gemini may refuse to translate it.

After those come Baidu Translate and Tencent Translate, for which you need to apply for a free key and appid on their respective websites. Tencent offers a higher free quota, while Baidu's free quota is very low.

In summary, if the conditions are met, ChatGPT/DeepL should be preferred, followed by Google, then Microsoft, and finally Tencent Translate and Baidu Translate.

You can also access DeepL for free through DeepLx, but it is unstable and easily blocked.
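The ranking summarized above can be expressed as a simple filter. The following is a minimal sketch, not the software's own logic: `Channel` and `usable_channels` are hypothetical names, and the requirement flags only encode what the text states about each channel (a paid account, a proxy, or a free key and appid).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    needs_paid_account: bool
    needs_proxy: bool
    needs_free_key: bool  # free key and appid from the provider's website


# Preference order taken from the summary in the text.
CHANNELS = [
    Channel("ChatGPT", needs_paid_account=True, needs_proxy=True, needs_free_key=False),
    Channel("DeepL", needs_paid_account=True, needs_proxy=False, needs_free_key=False),
    Channel("Google", needs_paid_account=False, needs_proxy=True, needs_free_key=False),
    Channel("Microsoft", needs_paid_account=False, needs_proxy=False, needs_free_key=False),
    Channel("Tencent", needs_paid_account=False, needs_proxy=False, needs_free_key=True),
    Channel("Baidu", needs_paid_account=False, needs_proxy=False, needs_free_key=True),
]


def usable_channels(has_paid_account: bool, has_proxy: bool,
                    has_free_key: bool = False) -> list[str]:
    """Return the channels you can use, best first."""
    result = []
    for ch in CHANNELS:
        if ch.needs_paid_account and not has_paid_account:
            continue
        if ch.needs_proxy and not has_proxy:
            continue
        if ch.needs_free_key and not has_free_key:
            continue
        result.append(ch.name)
    return result


print(usable_channels(has_paid_account=False, has_proxy=True))
# ['Google', 'Microsoft']
```

The first entry of the returned list is the channel to try first. With no paid account, no proxy, and no key, only Microsoft remains.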

As with recognition, a pause button appears once the translation is completed. Click it, then manually check and modify the translation results in the subtitle area on the right.