
Video translation software typically offers several speech recognition channels for transcribing the human speech in audio and video into subtitle files. While these programs perform adequately for English and Chinese, their results tend to be less satisfactory for less common languages such as Japanese, Korean, and Indonesian.

This is because the training data for large models is primarily English, so even their Chinese performance is not ideal. Domestic Chinese models, in turn, are trained mainly on Chinese and English, with Chinese making up the larger share.

The lack of training data leads to poor model performance. Fortunately, the Hugging Face website https://huggingface.co hosts a vast collection of fine-tuned models, including those specifically designed for less common languages, which perform quite well.

This article will introduce how to use Hugging Face models in video translation software to recognize less common languages, using Japanese as an example.

1. Prepare Network Access

Due to network restrictions, https://huggingface.co cannot be accessed directly from mainland China. You need to configure your network environment (typically a proxy) so that you can reach the site.

After accessing the website, you will see the Hugging Face homepage.


2. Go to the Models Directory


Click on the "Automatic Speech Recognition" category in the left navigation bar, and all speech recognition models will be displayed on the right.


3. Find Models Compatible with faster-whisper

The Hugging Face website currently hosts 20,384 speech recognition models, but not all of them are suitable for video translation software. Different models return data in different formats, and the software is only compatible with faster-whisper models.

  • Enter "faster-whisper" in the search box to search.


The search results are mostly models that can be used in video translation software.
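The same search can also be done programmatically with the huggingface_hub library. This is only a sketch of the website search above; `looks_like_faster_whisper` is a hypothetical helper mirroring the naming heuristic from this article, not part of any official API.

```python
def looks_like_faster_whisper(model_id: str) -> bool:
    """Heuristic from this article: the model ID itself mentions faster-whisper."""
    return "faster-whisper" in model_id.lower()

if __name__ == "__main__":
    # Requires network access to huggingface.co (through a proxy if needed)
    # and the huggingface_hub package (pip install huggingface_hub).
    from huggingface_hub import list_models

    for m in list_models(task="automatic-speech-recognition",
                         search="faster-whisper", limit=10):
        print(m.id, looks_like_faster_whisper(m.id))
```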

Of course, some models are compatible with faster-whisper even though their names do not contain "faster-whisper". How can you find these?

  • Search for the language name, such as "japanese", and then click to enter the model details page to see if the model description states that it is compatible with faster-whisper.


If neither the model name nor the model description explicitly mentions faster-whisper, the model is not usable. Even models whose names contain "whisper" or "whisper-large" cannot be used, because plain "whisper" models support the openai-whisper mode, which the current video translation software does not yet support (it may be added in the future).
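Besides reading the description, the repo's file list is another clue: faster-whisper models are CTranslate2 conversions and typically ship a `model.bin` alongside `config.json`, whereas original openai/transformers Whisper checkpoints ship `pytorch_model.bin` or `model.safetensors`. The helper below is a rough heuristic under that assumption, not an official check.

```python
def repo_format(filenames: list) -> str:
    """Guess a Whisper repo's format from its file list (heuristic only)."""
    names = set(filenames)
    if "model.bin" in names and "config.json" in names:
        # CTranslate2 layout, as used by faster-whisper
        return "faster-whisper (CTranslate2)"
    if "pytorch_model.bin" in names or "model.safetensors" in names:
        # Original transformers/openai-whisper checkpoint
        return "openai-whisper / transformers (not usable here)"
    return "unknown"
```

The file list for a given repo can be fetched with `huggingface_hub.list_repo_files(repo_id)` and fed to this function.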


4. Copy the Model ID to the Video Translation Software

After finding a suitable model, copy the model ID and paste it into the "Menu" -> "Tools" -> "Advanced Options" -> "faster and openai model list" in the video translation software.

  • Copy the model ID.

  • Paste it into the video translation software.

Save the settings.
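Conceptually, this step just appends one more model ID to the software's model list. A minimal sketch, assuming the settings field is a comma-separated list of model IDs (the model ID shown is an example):

```python
def add_model(model_list: str, model_id: str) -> str:
    """Append a model ID to a comma-separated list, skipping duplicates."""
    models = [m.strip() for m in model_list.split(",") if m.strip()]
    if model_id not in models:
        models.append(model_id)
    return ",".join(models)

# Example: adding a Hugging Face model ID to an existing list
print(add_model("tiny,base", "Systran/faster-whisper-large-v2"))
```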

5. Select the faster-whisper Mode

In the speech recognition channel, select the model you just added. If it is not displayed, please restart the software.


After selecting the model and pronunciation language, you can start the recognition.
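Under the hood, recognition amounts to a faster-whisper transcription call like the sketch below, assuming the faster-whisper package is installed (pip install faster-whisper). The model ID and audio filename are placeholders; `srt_timestamp` is a hypothetical helper illustrating how segment times become subtitle timestamps.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

if __name__ == "__main__":
    from faster_whisper import WhisperModel

    # Downloads the model from Hugging Face on first use (proxy required in China).
    model = WhisperModel("Systran/faster-whisper-large-v2",
                         device="cpu", compute_type="int8")
    # language="ja" matches the Japanese example in this article.
    segments, info = model.transcribe("audio.wav", language="ja")
    for i, seg in enumerate(segments, 1):
        print(i)
        print(f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}")
        print(seg.text.strip(), end="\n\n")
```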

Note: A proxy must be set; otherwise the connection will fail with an error. Try setting a global or system proxy on your computer first. If the error persists, enter the proxy IP and port in the "Network Proxy" text box on the main interface.

For an explanation of network proxies, please see https://pyvideotrans.com/proxy
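If the system proxy is not picked up, one common workaround is to set the standard proxy environment variables before any download starts. The address below is a placeholder; substitute your own proxy's IP and port.

```python
import os

# Point HTTP(S) traffic, including Hugging Face downloads, at a local proxy.
# 127.0.0.1:7890 is an example address; use your proxy's actual port.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"
```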


Depending on network conditions, the download may take a long time. As long as no red error message appears, please wait patiently.
