
Xiaohongshu has open-sourced FireRedASR, an automatic speech recognition (ASR) project that performs particularly well on Chinese speech. At first only the smaller AED model was released; the team has since released a larger LLM-based model that further improves recognition accuracy.

FireRedASR has been bundled into an integrated package, which makes it easy to use from the video translation software pyVideoTrans.

Integrated Package Download and Model Description

Model Size:

  • AED Model (model.pth.tar): 4.35GB
  • LLM Model: Includes two models
    • Xiaohongshu Recognition Model (model.pth.tar): 3.37GB
    • Qwen2-7B Model (4 files): Total 17GB

Added together, the model files come to roughly 25GB (4.35GB + 3.37GB + 17GB). Even compressed into a 7z archive they would still exceed 10GB. That is too large to upload to GitHub or to cloud storage, so the integrated package contains only the main program and none of the model files.
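Given these sizes, it is worth checking free disk space before you start downloading. The following is a minimal sketch built on Python's standard shutil.disk_usage. It assumes "GB" in this article means 10^9 bytes; the 2GB safety margin and the helper names are illustrative choices, not part of the package.

```python
import shutil

# File sizes as listed in this article, in GB (assumed to mean 10**9 bytes)
MODEL_SIZES_GB = {
    "AED model.pth.tar": 4.35,
    "LLM model.pth.tar": 3.37,
    "Qwen2-7B (4 files)": 17.0,
}

def total_model_size_gb() -> float:
    """Sum of all model downloads, rounded to two decimals."""
    return round(sum(MODEL_SIZES_GB.values()), 2)

def has_room_for_models(path: str = ".", margin_gb: float = 2.0) -> bool:
    """True if the drive holding `path` can fit every model plus a safety margin."""
    needed_bytes = (total_model_size_gb() + margin_gb) * 10**9
    return shutil.disk_usage(path).free >= needed_bytes

print(f"Total model size: {total_model_size_gb()} GB")
print("Enough free space:", has_room_for_models("."))
```

Pass the directory where you plan to extract the integrated package as `path`, so the check looks at the right drive.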

After downloading the integrated package, please follow these steps to download the model files separately and place them in the specified location.

Note: The model files are hosted on huggingface.co, which is not directly accessible from mainland China. You will need a VPN or proxy to download them.

Integrated Package Main Body Download

The main program is comparatively small (1.7GB). You can download it by opening the following address directly in your browser:

https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
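A browser download is all you need. If you would rather script it, here is a minimal sketch that uses only Python's standard library (urllib.request.urlretrieve with a progress hook). The helper names are hypothetical. Extracting the .7z archive afterwards still requires a tool such as 7-Zip.

```python
import urllib.request

# Release URL from this article
PACKAGE_URL = ("https://github.com/jianchang512/fireredasr-ui/releases/"
               "download/v0.3/fireredASR-2025-0224.7z")

def format_progress(block_num: int, block_size: int, total_size: int) -> str:
    """Turn urlretrieve's reporthook arguments into a readable progress line."""
    done = block_num * block_size
    if total_size <= 0:  # server did not send a Content-Length header
        return f"{done / 1e6:.1f} MB downloaded"
    done = min(done, total_size)  # the last block may overshoot the file size
    return f"{done / 1e6:.1f} / {total_size / 1e6:.1f} MB ({100 * done / total_size:.1f}%)"

def report(block_num: int, block_size: int, total_size: int) -> None:
    # "\r" rewrites the same console line on every update
    print("\r" + format_progress(block_num, block_size, total_size), end="", flush=True)

def download(url: str, dest: str) -> None:
    urllib.request.urlretrieve(url, dest, reporthook=report)
    print()  # end the progress line

# Usage:
# download(PACKAGE_URL, "fireredASR-2025-0224.7z")
```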

Extract the downloaded archive with a tool such as 7-Zip. The extracted directory contains the main program, including the 启动.bat launcher and the pretrained_models folder where the model files go.

Download AED Model

The AED model is the simpler of the two, because it consists of a single file.

  1. Download the model.pth.tar file.

    Download address:

    https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true

  2. Place the downloaded model.pth.tar file in the pretrained_models/FireRedASR-AED-L folder within the integrated package directory.
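The two steps above can also be combined in a script. The sketch below assumes you pass it the directory where you extracted the integrated package. It downloads the AED weights from the address above straight into pretrained_models/FireRedASR-AED-L. The function names and the ".part" temporary file are illustrative choices, not something the package requires.

```python
import urllib.request
from pathlib import Path

# Download address from this article
AED_URL = ("https://huggingface.co/FireRedTeam/FireRedASR-AED-L/"
           "resolve/main/model.pth.tar?download=true")

def aed_model_path(package_dir: str) -> Path:
    """Location inside the integrated package where the AED weights belong."""
    return Path(package_dir) / "pretrained_models" / "FireRedASR-AED-L" / "model.pth.tar"

def fetch_aed_model(package_dir: str) -> Path:
    dest = aed_model_path(package_dir)
    if dest.exists():
        return dest  # already in place, nothing to download
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a ".part" file first so an interrupted download is never
    # mistaken for a complete model.pth.tar
    tmp = dest.with_name(dest.name + ".part")
    urllib.request.urlretrieve(AED_URL, tmp)
    tmp.replace(dest)
    return dest

# Usage (hypothetical extraction directory):
# fetch_aed_model(r"D:\fireredASR")
```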

When you are done, the file should be at pretrained_models/FireRedASR-AED-L/model.pth.tar inside the integrated package directory.

Download LLM Model

The LLM model takes a little more work. You need to download 5 files in total: the Xiaohongshu model file plus the 4 files that make up the Qwen2 model.

1. Download Xiaohongshu Model (model.pth.tar):

The file storage location should look like this:

2. Download Qwen2 Model (4 files):

After downloading, the Qwen2-7B-Instruct folder should contain all 4 files.
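With files this large, it is easy to end up with a missing or empty file after a failed download. The check below is a sketch built on two assumptions. First, the 4 files are the Qwen2-7B-Instruct weight shards as named on Hugging Face, model-00001-of-00004.safetensors through model-00004-of-00004.safetensors. Second, because this article does not state the folder's full path inside the package, you pass that path in yourself. The helper name is hypothetical.

```python
from pathlib import Path

# Assumed names of the four Qwen2-7B-Instruct weight shards (Hugging Face naming scheme)
EXPECTED_SHARDS = [f"model-{i:05d}-of-00004.safetensors" for i in range(1, 5)]

def missing_qwen_shards(qwen_dir: str) -> list[str]:
    """Return the shard names that are absent or empty in qwen_dir."""
    folder = Path(qwen_dir)
    missing = []
    for name in EXPECTED_SHARDS:
        shard = folder / name
        # A zero-byte file usually means an aborted download
        if not shard.is_file() or shard.stat().st_size == 0:
            missing.append(name)
    return missing

# Usage (hypothetical path; point it at your Qwen2-7B-Instruct folder):
# print(missing_qwen_shards("path/to/Qwen2-7B-Instruct"))
```

An empty list means all four shards are present.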

Start the Integrated Package

Once all model files have been downloaded and placed correctly, double-click 启动.bat ("start") in the integrated package directory to launch the program.

Once started, the program automatically opens http://127.0.0.1:5078 in your browser. If the web interface loads, the program is running and ready to use.
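If the browser does not open on its own, or if you want a script to confirm the service is up before sending it audio, a standard-library check like the one below works. It is a sketch that assumes any HTTP response at the root address means the server is running. The function name is hypothetical.

```python
import urllib.error
import urllib.request

def server_is_up(url: str = "http://127.0.0.1:5078", timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP requests at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server responded, just not with a 2xx status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timed out, etc.

print("FireRedASR reachable:", server_is_up())
```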

Using in Video Translation Software

If you want to use the FireRedASR model in the video translation software pyVideoTrans, please follow these steps:

  1. Make sure you have downloaded and placed the model files according to the instructions above, and have successfully started the integrated package.

  2. Open the pyVideoTrans software.

  3. In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition and Compatible AI.

  4. In the settings dialog, fill in the connection details, including the API address given below.

  5. After filling in, click Save.

  6. When choosing the speech recognition channel, select OpenAI Speech Recognition.

API Address:

Default address: http://127.0.0.1:5078/v1
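Because the address is OpenAI-compatible, any HTTP client should be able to call it, not just pyVideoTrans or the OpenAI SDK. The following is a standard-library sketch. It assumes the server accepts the same multipart/form-data request the OpenAI SDK sends to {base_url}/audio/transcriptions (fields file, model and response_format) and replies with JSON containing a text field. The function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

def build_transcription_request(base_url: str, audio: bytes, filename: str,
                                model: str = "whisper-1") -> urllib.request.Request:
    """Build an OpenAI-style multipart POST to {base_url}/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields
    for name, value in (("model", model), ("response_format", "json")):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio file itself
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n').encode()
        + audio + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

def transcribe(base_url: str, path: str) -> str:
    with open(path, "rb") as f:
        req = build_transcription_request(base_url, f.read(), os.path.basename(path))
    # Long recordings can take a while, so allow a generous timeout
    with urllib.request.urlopen(req, timeout=86400) as resp:
        return json.loads(resp.read())["text"]

# Usage, with the default address above:
# print(transcribe("http://127.0.0.1:5078/v1", "5.wav"))
```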

Using with OpenAI SDK

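```python
from openai import OpenAI

# Point the SDK at the local FireRedASR server instead of api.openai.com.
# The SDK requires an api_key; '123456' is the value used in this example.
client = OpenAI(api_key='123456',
                base_url='http://127.0.0.1:5078/v1')

# Open the audio in binary mode; "with" closes the file afterwards
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",       # model name as used in this example
        file=audio_file,
        response_format="json",
        timeout=86400,           # allow up to 24 hours for long recordings
    )

print(transcript.text)
```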