
Video translation software usually ships with several speech recognition channels that transcribe the human speech in audio and video files into subtitle files. For Chinese and English the results are acceptable, but for less common languages such as Japanese, Korean, and Indonesian, the results are often unsatisfactory.

This is because foreign large language models are trained mainly on English material, so even their Chinese performance is mediocre. Domestic models are likewise trained mostly on Chinese and English, with Chinese accounting for the larger share.

This shortage of training data for other languages leads to poor recognition quality. Fortunately, the Hugging Face site (https://huggingface.co) hosts a massive collection of fine-tuned models, including many fine-tuned specifically for less common languages, and they work quite well.

This article shows how to use Hugging Face models in video translation software to recognize less common languages, taking Japanese as an example.

1. Set Up Network Access (VPN/Proxy)

Due to network restrictions, https://huggingface.co cannot be accessed directly from China. You will need to configure your own network environment (for example, a VPN or proxy) to make sure you can reach the site.

Once you can reach it, you will see the Hugging Face homepage.

2. Enter the Models Directory

Click the "Automatic Speech Recognition" category in the left navigation bar, and all speech recognition models will be displayed on the right.

3. Find Models Compatible with faster-whisper

Hugging Face currently hosts 20,384 speech recognition models, but not all of them are suitable for video translation software. Different models return data in different formats, and the software only supports faster-whisper models.

  • Enter "faster-whisper" in the search box and run the search.

The models in the search results can basically all be used in the video translation software.
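If you prefer to script this step, the same search can be reproduced with the huggingface_hub Python library; this is only a minimal sketch, assuming the library is installed via pip and your network/proxy is already working:

```python
# Minimal sketch: list ASR models on the Hub whose names match "faster-whisper".
# Assumes `pip install huggingface_hub` and a working network connection/proxy.
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(
    search="faster-whisper",                # same text as the site search box
    filter="automatic-speech-recognition",  # same category as the left sidebar
    sort="downloads",
    limit=10,
)

for m in models:
    # m.id is the model ID you will later paste into the software.
    print(m.id)
```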

Of course, some models are compatible with faster-whisper even though "faster-whisper" does not appear in their names. How do you find those?

  • Search for the language name, such as "japanese", then open a model's details page and check whether the introduction states that it is compatible with faster-whisper.

If neither the model name nor the introduction explicitly mentions faster-whisper, the model cannot be used. Even if "whisper" or "whisper-large" appears, it still cannot be used: a plain "whisper" name indicates compatibility with openai-whisper mode, which the current video translation software does not support yet. Whether it will be supported in the future remains to be seen.
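One rough programmatic spot check, based on how faster-whisper models are packaged rather than on anything in the software itself: faster-whisper models are CTranslate2 conversions, so their repositories contain a `model.bin` file instead of PyTorch weights. Listing a repository's files can help when a model card is ambiguous:

```python
# Heuristic sketch: a faster-whisper (CTranslate2) repo ships "model.bin",
# while a plain openai-whisper repo ships "pytorch_model.bin" or
# "model.safetensors" instead. Not authoritative, but a useful quick check.
from huggingface_hub import HfApi

def looks_like_faster_whisper(repo_id: str) -> bool:
    files = HfApi().list_repo_files(repo_id)
    return "model.bin" in files

# Example ID for illustration; substitute the model you are checking.
print(looks_like_faster_whisper("Systran/faster-whisper-large-v3"))
```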

4. Copy the Model ID to the Video Translation Software

After finding a suitable model, copy its model ID and paste it into "Menu" -> "Tools" -> "Advanced Options" -> "List of faster and openai models" in the video translation software.

  • Copy the model ID.

  • Paste it into the video translation software.

Save the settings.
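Optionally, you can pre-download the model outside the software using the same ID, so the first recognition run does not stall on a large download. A minimal sketch with huggingface_hub; the repo ID below is a placeholder for the one you copied:

```python
# Sketch: pre-download the model into the local Hugging Face cache.
# The software can then load it without waiting on the network.
from huggingface_hub import snapshot_download

# Replace with the model ID you copied in step 4.
local_dir = snapshot_download("Systran/faster-whisper-large-v3")
print("Model cached at:", local_dir)
```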

5. Select faster-whisper Mode

In the speech recognition channel, select the model you just added. If it is not displayed, please restart the software.

After selecting the model and the pronunciation language, you can start recognition.
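For reference, this is roughly what the recognition step does under the hood, assuming the software drives the faster-whisper Python package; the model ID, audio file name, and compute type below are illustrative only:

```python
# Sketch of recognition with the faster-whisper package
# (`pip install faster-whisper`). First use downloads the model from Hugging Face.
from faster_whisper import WhisperModel

# Illustrative model ID and settings; int8 keeps memory usage modest on CPU.
model = WhisperModel("Systran/faster-whisper-large-v3",
                     device="cpu", compute_type="int8")

# Set the pronunciation language explicitly, e.g. "ja" for Japanese.
segments, info = model.transcribe("audio.wav", language="ja")

for seg in segments:
    # Each segment carries start/end timestamps usable for subtitles.
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```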

Note: You must set up a proxy, otherwise the software cannot connect to Hugging Face and will report an error. Try a global (system-wide) proxy or a system proxy first; if the error persists, fill in the proxy IP and port in the "Network Proxy" text box on the main interface.

For the explanation of the network proxy, please see https://pyvideotrans.com/proxy
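If you run any of the download sketches above yourself, note that the Hugging Face download stack honors the standard proxy environment variables. A minimal sketch, where 127.0.0.1:7890 is a placeholder for your own proxy address and port:

```python
# Sketch: route Hugging Face downloads through a local proxy by setting the
# standard proxy environment variables before any download starts.
import os

os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"   # placeholder address/port
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

# Any download triggered after this point (snapshot_download, WhisperModel, ...)
# will go through the proxy.
```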

Depending on network conditions, the download may take a long time. As long as no red error message appears, please wait patiently.
