Two Key Factors Determining Quality:
First, the accuracy of the recognized text.
Second, the quality of the translated text.
Recognition accuracy directly limits translation quality, because errors in the recognized text carry over into the translation. To improve the final result, you need to work on both aspects.
I: Improving Text Recognition Accuracy:
Use the large-v3 model.
Recognition accuracy improves progressively from the base, small, and medium models up to large-v3, at the cost of more computational resources. If your computer has a high-performance NVIDIA graphics card with at least 8GB of video memory, and you have set up the CUDA and cuDNN environments, you can try the large-v3 model. It significantly improves the accuracy of text and subtitle recognition.
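One way to check the 8GB requirement before switching models is to read the total video memory reported by `nvidia-smi`. The sketch below is a minimal example under stated assumptions: it assumes `nvidia-smi` is on the PATH, the helper names (`parse_vram_mib`, `query_vram_mib`, `suggest_model`) are hypothetical, and both the `medium` fallback and the slack in the threshold are illustrative choices, not rules from this guide. It does not verify the CUDA or cuDNN setup.

```python
import shutil
import subprocess

# Nominal 8GB cards can report slightly less than 8192 MiB, so this
# threshold leaves some slack. The exact value is an assumption.
LARGE_V3_MIN_MIB = int(7.5 * 1024)


def parse_vram_mib(nvidia_smi_output: str) -> list:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    values = []
    for line in nvidia_smi_output.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def query_vram_mib() -> list:
    """Return total memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_vram_mib(result.stdout)


def suggest_model(vram_mib: list, fallback: str = "medium") -> str:
    """Suggest large-v3 only when some GPU reports roughly 8GB or more."""
    if vram_mib and max(vram_mib) >= LARGE_V3_MIN_MIB:
        return "large-v3"
    return fallback  # assumption: any smaller model could be used here
```

Calling `suggest_model(query_vram_mib())` returns a model name you can then select in the software; on a machine without an NVIDIA driver it simply returns the fallback.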
Separate background sound in videos.
If a video contains a lot of background music or noise, it will inevitably interfere with text recognition. You can try selecting "Keep background audio." This option separates the background sound before recognition so that only the human voice is transcribed, which significantly improves results.
Alternatively, you can use other third-party separation tools or the "Separate Vocals & Background" feature on the left side of the software to isolate human voices and background audio from the video.
Then, use the "Audio/Video to Text" function to transcribe the isolated human voice into subtitles.
After that, under "Text Subtitle Translation," translate these subtitles into the target language.
Finally, switch to "Standard Function Mode," import the translated subtitles, add background music, and embed the voiceover and subtitles into the video. Although this route takes more steps, it can significantly improve translation quality.
Manual modification and adjustment.
After subtitle recognition completes, and again after translation completes, the full text is displayed in the subtitle area on the right side of the software. You can click the "Pause" button to stop processing and manually correct the text. However accurate machine recognition and translation become, they cannot fully replace human proofreading.
II: Improving Text Subtitle Translation Quality:
The best translation quality comes from ChatGPT/DeepL/Azure. All three require paid accounts and generally do not support direct payments from users in certain regions. In addition, ChatGPT and Azure often require proxy configuration, which can be a barrier.
If you have a paid account and can configure a proxy, using these three translation channels will improve your translation quality (many intermediary proxy services are available for ChatGPT).
Next in effectiveness are Google/Gemini/Microsoft. These three are free. Google and Gemini require proxy configuration, while Microsoft does not.
However, be aware that Gemini applies stricter safety restrictions. If your video's dialogue is flagged as restricted content (e.g., adult or sensitive material), Gemini may refuse to translate it.
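A refusal from one channel does not have to end the job: one option is to move on to the next channel when it happens. The following is a minimal sketch of that idea, not a description of how the software behaves. The `TranslationRefused` exception and the translator callables are hypothetical stand-ins that call no real service, and a real integration would have to detect a refusal from the provider's actual response.

```python
from typing import Callable, Sequence, Tuple


class TranslationRefused(Exception):
    """Hypothetical signal that a channel declined to translate the text."""


Translator = Callable[[str], str]


def translate_with_fallback(
    text: str, channels: Sequence[Tuple[str, Translator]]
) -> Tuple[str, str]:
    """Try each (name, translator) pair in order; return (name, translation)."""
    refused = []
    for name, translate in channels:
        try:
            return name, translate(text)
        except TranslationRefused:
            refused.append(name)  # remember the refusal, try the next channel
    raise TranslationRefused("all channels refused: " + ", ".join(refused))


# Stand-in translators for illustration only.
def fake_gemini(text: str) -> str:
    raise TranslationRefused("content blocked")


def fake_microsoft(text: str) -> str:
    return "[translated] " + text
```

With these stand-ins, `translate_with_fallback("hello", [("gemini", fake_gemini), ("microsoft", fake_microsoft)])` skips the refusing channel and returns the result from the second one.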
Further options include Baidu Translate and Tencent Translate. You'll need to apply for free keys and app IDs on their respective websites. Tencent offers a higher free quota, whereas Baidu's free quota is very low.
In summary, if conditions are met, ChatGPT/DeepL are the top choices, followed by Google, then Microsoft, and finally Tencent Translate or Baidu Translate.
You can also use DeepLx to potentially access DeepL for free, but it's unstable and prone to IP blocking.
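The ranking and requirements in this section can be summarized as a small lookup. This is an illustrative sketch only: the `Channel` record, its boolean flags, and `pick_channel` are hypothetical names, and the flags simply restate the text above in simplified form (for example, ChatGPT is marked as needing a proxy even though the text says it often does, and a single `paid_account` flag stands in for per-service accounts). Azure, Gemini, and DeepLx are left out because the summary does not rank them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Channel:
    name: str
    needs_paid_account: bool = False
    needs_proxy: bool = False
    needs_api_key: bool = False


# Ordered from most to least preferred, following the summary above.
CHANNELS = (
    Channel("ChatGPT", needs_paid_account=True, needs_proxy=True),
    Channel("DeepL", needs_paid_account=True),
    Channel("Google", needs_proxy=True),
    Channel("Microsoft"),
    Channel("Tencent Translate", needs_api_key=True),
    Channel("Baidu Translate", needs_api_key=True),
)


def pick_channel(paid_account: bool, proxy: bool, api_key: bool) -> Optional[str]:
    """Return the highest-ranked channel whose requirements are all met."""
    for ch in CHANNELS:
        if ch.needs_paid_account and not paid_account:
            continue
        if ch.needs_proxy and not proxy:
            continue
        if ch.needs_api_key and not api_key:
            continue
        return ch.name
    # Not reached with the table above, since Microsoft has no requirements.
    return None
```

Because Microsoft has no requirements and ranks above Tencent and Baidu, this lookup never selects the last two; to prefer them in some situation you would reorder the table.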
As with recognition, a "Pause" button appears after translation. Click it to manually review and edit the translation results in the subtitle area on the right.