
Adding New Models from Hugging Face

This document applies to the stt speech-to-text project: https://github.com/jianchang512/stt

From version 0.0.94 onwards, you can add faster-whisper/ctranslate2-compatible models from huggingface.co. This lets you use specialized models, such as those trained for a specific language, where general-purpose models fall short.

How to Add a Model

  1. Upgrade to version 0.0.94 or later.

  2. Make sure you have a working proxy server and know how to use it. You need to know what a proxy address and a proxy port are. Accessing huggingface.co and downloading models both require a proxy, so do not attempt to add models if you cannot meet this requirement.

  3. Search for the desired model on https://huggingface.co/models. Make sure the model is compatible with faster-whisper/ctranslate2; otherwise, it will not work.

    For example, I found this model: https://huggingface.co/zh-plus/faster-whisper-large-v2-japanese-5k-steps

    Its model page states: "Converted from clu-ling/whisper-large-v2-japanese-5k-steps using CTranslate2."

    Because the page states that ctranslate2 was used for the conversion, the model is compatible and can be used.

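  4. On the model page, click the copy button next to the model name to copy the model ID (in this example, zh-plus/faster-whisper-large-v2-japanese-5k-steps). Then open the set.ini file in the software directory, find the model_list= line, append a comma followed by the copied ID to the end of that line, and save the file.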

  5. Open the software, fill in the network proxy address, select the newly added model from the model list, and click "Start."

    If you are using V2Ray-based software, the default proxy address is http://127.0.0.1:10809. If you are using Clash-based software, the default proxy address is http://127.0.0.1:7890.

    Note: The language of the video must match the language supported by the model you added. If you select a Japanese model but process a Chinese-language video, you will not get the expected results.

  6. After you click "Start," if the model is not found locally when the subtitle recognition phase begins, the software automatically connects to huggingface.co and downloads it. Depending on the speed of your proxy, this can take from a few minutes to tens of minutes. Please be patient.

    As long as no red error messages appear, the download is still in progress. If red error messages do appear, the cause is usually the proxy, for example a slow or unstable connection. The error message generally contains "Connection to huggingface.co timed out" or a long string of digits such as 46573454354, which indicates that the data was not fully downloaded.

    Note: If you deploy from source code, a proxy or network error is reported only as an error like No such file xxxx, without mentioning the proxy.
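
The set.ini edit in step 4 can also be scripted. The following is a minimal sketch, assuming only what the steps above state: set.ini contains a model_list= line whose value is a comma-separated list of model names. The helper name add_model_id and the sample keys and values in the usage note are hypothetical, and the rest of the file is passed through unchanged.

```python
def add_model_id(ini_text: str, model_id: str) -> str:
    """Return ini_text with model_id appended to the model_list= line.

    Assumes model_list holds a comma-separated list on a single line.
    All other lines are kept as they are.
    """
    lines = ini_text.splitlines()
    for i, line in enumerate(lines):
        key, sep, value = line.partition("=")
        if sep and key.strip() == "model_list":
            # Split the existing value, dropping empty entries and stray spaces.
            models = [m.strip() for m in value.split(",") if m.strip()]
            if model_id not in models:
                models.append(model_id)
            lines[i] = "model_list=" + ",".join(models)
            return "\n".join(lines) + "\n"
    raise ValueError("no model_list= line found")
```

For example, given the hypothetical content "lang=en\nmodel_list=tiny,base\n", calling add_model_id with zh-plus/faster-whisper-large-v2-japanese-5k-steps yields a model_list= line ending in ",zh-plus/faster-whisper-large-v2-japanese-5k-steps". Calling it a second time leaves the text unchanged, so the ID is never added twice. Back up set.ini before writing the result to disk.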
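
Because most download failures in step 6 trace back to the proxy, it can help to check the proxy address before starting. The sketch below validates an address of the form used in step 5 and builds the HTTP_PROXY/HTTPS_PROXY environment variables that many Python HTTP clients (requests, for example) read. The function name proxy_env is hypothetical, and whether the stt application itself reads these variables is an assumption; in the packaged software, use the proxy field in the interface as described above.

```python
from urllib.parse import urlsplit


def proxy_env(address: str) -> dict:
    """Validate a proxy address and return proxy environment variables.

    Expects the form http://host:port, e.g. http://127.0.0.1:7890.
    """
    parts = urlsplit(address)
    if parts.scheme not in ("http", "https"):
        raise ValueError("proxy address must start with http:// or https://")
    # parts.port itself raises ValueError if the port is not a valid number.
    if not parts.hostname or parts.port is None:
        raise ValueError("proxy address must include a host and a port")
    return {"HTTP_PROXY": address, "HTTPS_PROXY": address}
```

When running from source, one possible use is os.environ.update(proxy_env("http://127.0.0.1:10809")) before the model download starts. An address without a scheme, such as 127.0.0.1:10809, is rejected with a ValueError.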