Xiaohongshu has open-sourced an automatic speech recognition (ASR) project called FireRedASR, which performs excellently on Chinese speech. They initially open-sourced only a smaller AED model; recently they released a larger LLM-based model that further improves recognition accuracy.

This ASR model has been bundled into an all-in-one integration package and can be used conveniently from the video translation software pyVideoTrans.

Integration Package Download and Model Description

Model Size:

  • AED Model (model.pth.tar): 4.35GB
  • LLM Model: consists of two parts
    • Xiaohongshu recognition model (model.pth.tar): 3.37GB
    • Qwen2-7B model (4 files): 17GB total

All model files together come to over 20GB. Even compressed in 7z format, they still exceed 10GB, which is beyond the upload limits of GitHub and most cloud drives. The integration package therefore includes only the main program and does not contain any model files.

Please download the integration package and follow the steps below to download the model files separately and place them in the specified location.

Note: The model files are hosted on the huggingface.co website. This website is not directly accessible in some regions, and you may need a VPN to download.

Integration Package Main Body Download

The main body of the integration package is relatively small (about 1.7GB). You can download it by opening the following address directly in your browser:

https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z

After downloading and extracting the package, you should see a file structure similar to the following:

Download AED Model

Downloading the AED model is relatively simple, requiring only one model file.

  1. Download the model.pth.tar file.

    Download address:

    https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true

  2. Place the downloaded model.pth.tar file in the pretrained_models/FireRedASR-AED-L folder within the integration package directory.
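To confirm the file landed in the right place, you can run a small Python check. This is only a sketch: the `pretrained_models/FireRedASR-AED-L` path follows the step above, while the base folder name `fireredASR` is an assumption — substitute wherever you extracted the package.

```python
from pathlib import Path

# Base folder name is hypothetical - use the directory you extracted the package into.
package_dir = Path("fireredASR")
aed_model = package_dir / "pretrained_models" / "FireRedASR-AED-L" / "model.pth.tar"

if aed_model.is_file():
    size_gb = aed_model.stat().st_size / 1024**3
    print(f"AED model found ({size_gb:.2f}GB)")
else:
    print("AED model missing - place model.pth.tar in", aed_model.parent)
```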

After downloading, the file storage location should look like this:

Download LLM Model

Downloading the LLM model is slightly more involved: you need 5 files in total (1 Xiaohongshu model file + 4 Qwen2 files).

1. Download Xiaohongshu Model (model.pth.tar):

File storage location example:

2. Download Qwen2 Model (4 files):

After downloading, the Qwen2-7B-Instruct folder should contain 4 files, as shown in the following figure:
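A quick sanity check can count the files for you. Note the folder layout below is an assumption that mirrors the AED path (`pretrained_models/FireRedASR-LLM-L` as the LLM folder, with `Qwen2-7B-Instruct` inside it); adjust it to match your actual package structure.

```python
from pathlib import Path

# Folder names below are assumptions mirroring the AED layout - adjust as needed.
package_dir = Path("fireredASR")
llm_dir = package_dir / "pretrained_models" / "FireRedASR-LLM-L"
qwen_dir = llm_dir / "Qwen2-7B-Instruct"

expected = 1 + 4  # 1 Xiaohongshu model.pth.tar + 4 Qwen2 files
found = []
if (llm_dir / "model.pth.tar").is_file():
    found.append("model.pth.tar")
if qwen_dir.is_dir():
    found += sorted(p.name for p in qwen_dir.iterdir() if p.is_file())

print(f"{len(found)}/{expected} LLM files in place:", found)
```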

Launch the Integration Package

After all model files are downloaded and placed correctly, double-click the 启动.bat (Start.bat) file in the integration package directory to launch the program.

After the program starts, it will automatically open the address http://127.0.0.1:5078 in your browser. If you see the following interface, it means the program has started successfully and you can start using it.
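If the browser page does not open automatically, you can check from Python whether the local service is listening. This sketch probes the default address from the step above using only the standard library:

```python
import urllib.request
import urllib.error

# Default address the integration package listens on.
URL = "http://127.0.0.1:5078"

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        status = resp.status
        print("Service is up, HTTP status", status)
except (urllib.error.URLError, OSError):
    status = None
    print("Service not reachable - make sure 启动.bat is running")
```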

Using in Video Translation Software

If you want to use the FireRedASR model in the video translation software pyVideoTrans, follow these steps:

  1. Make sure you have downloaded and placed the model files as described above, and have successfully launched the integration package.

  2. Open the pyVideoTrans software.

  3. In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition and Compatible AI.

  4. In the settings interface, fill in the relevant information as shown in the figure below.

  5. Click Save after filling in the information.

  6. In the speech recognition channel selection, select OpenAI Speech Recognition.

API Address:

Default address: http://127.0.0.1:5078/v1

Using with OpenAI SDK

from openai import OpenAI

# The integration package exposes an OpenAI-compatible API;
# the api_key can be any non-empty string.
client = OpenAI(api_key='123456',
    base_url='http://127.0.0.1:5078/v1')

# Use a context manager so the audio file is closed after the request.
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",   # model name expected by OpenAI-compatible endpoints
        file=audio_file,
        response_format="json",
        timeout=86400        # generous timeout for long audio files
    )

print(transcript.text)