XiaoHongShu has open-sourced an automatic speech recognition (ASR) project called FireRedASR, which performs well on Chinese speech. Initially they released only a smaller AED model; recently they added a larger LLM-based model, which further improves recognition accuracy.
This ASR model has been bundled into an integrated package and can be used easily from the video translation software pyVideoTrans.
Integrated Package Download and Model Description
Model sizes:
- AED Model (model.pth.tar): 4.35GB
- LLM Model: consists of two parts
  - XiaoHongShu Recognition Model (model.pth.tar): 3.37GB
  - Qwen2-7B Model (4 files): 17GB total
The models together total about 21GB; even compressed in 7z format, the archive still exceeds 10GB. Because of these size limits, the models cannot be uploaded to GitHub or a cloud drive, so the integrated package contains only the program itself and no model files.
Please download the integrated package first, then download the model files separately following the steps below and place them in the specified locations.
Note: The model files are hosted on huggingface.co, which is not directly accessible from mainland China; you will need a VPN (or other proxy) to download them.
Integrated Package Main Body Download
The integrated package itself is relatively small, about 1.7GB. You can download it directly by opening the following address in your browser:
https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
After the download completes, extract the archive; you should see a file structure similar to the following:
Download AED Model
Downloading the AED model is simple: only one model file is needed.
Download the model.pth.tar file. Download address:
https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true
Put the downloaded model.pth.tar file into the pretrained_models/FireRedASR-AED-L folder under the integrated package directory.
After the download is complete, the file storage location is as follows:
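If browser downloads keep failing, scripting the download is an option. Below is a minimal sketch using the huggingface_hub library (assumptions: it is installed via pip install huggingface_hub, and the proxy address is a hypothetical example to adjust to your own setup):

import os
from huggingface_hub import hf_hub_download

# If huggingface.co is blocked, route traffic through a local proxy.
# The port below is a hypothetical example -- change it to match your VPN/proxy.
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

# Download model.pth.tar straight into the folder the package expects
hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-AED-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-AED-L",
)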
Download LLM Model
Downloading the LLM model is a bit more involved: you need 5 files in total (1 XiaoHongShu model + 4 Qwen2 files).
1. Download XiaoHongShu Model (model.pth.tar):
Download address: https://huggingface.co/FireRedTeam/FireRedASR-LLM-L/resolve/main/model.pth.tar?download=true
Put the downloaded model.pth.tar file into the pretrained_models/FireRedASR-LLM-L folder of the integrated package. Be sure to note that this folder name contains LLM; do not put the file in the wrong place.
The file storage location is as follows:
2. Download Qwen2 Model (4 files):
Download the files from the following 4 links and put them into the pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct folder of the integrated package:
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00001-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00002-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00003-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00004-of-00004.safetensors?download=true
After the download is complete, the Qwen2-7B-Instruct folder should contain 4 files, as shown in the following figure:
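These downloads can also be scripted. A sketch, assuming the same huggingface_hub setup as above (snapshot_download fetches only the four weight shards and skips the rest of the repository):

from huggingface_hub import hf_hub_download, snapshot_download

# 1. XiaoHongShu LLM recognition model
hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-LLM-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-LLM-L",
)

# 2. The four Qwen2 weight shards only
snapshot_download(
    repo_id="Qwen/Qwen2-7B-Instruct",
    allow_patterns=["model-*.safetensors"],
    local_dir="pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct",
)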
Start Integrated Package
When all model files are downloaded and placed correctly, double-click 启动.bat (Start.bat) in the integrated package directory to launch the program.
After the program starts, it will automatically open http://127.0.0.1:5078 in your browser. If you see the following interface, the program has started successfully and is ready to use.
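To verify from a script that the service is listening, a minimal sketch (it only assumes the web UI at the address above responds to plain GET requests):

import urllib.request

# A 200 response means the FireRedASR service is up and reachable
with urllib.request.urlopen("http://127.0.0.1:5078", timeout=5) as resp:
    print("Service is up, HTTP status:", resp.status)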
Use in Video Translation Software
If you want to use the FireRedASR model in the video translation software pyVideoTrans, please follow these steps:
1. Make sure you have downloaded and placed the model files as described above, and have successfully started the integrated package.
2. Open the pyVideoTrans software.
3. In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition and Compatible AI.
4. In the settings interface, fill in the relevant information as shown in the figure below.
5. After filling it in, click Save.
6. In the speech recognition channel selection, select OpenAI Speech Recognition.
API Address: the default is http://127.0.0.1:5078/v1
Use with OpenAI SDK
from openai import OpenAI

# Point the OpenAI client at the local FireRedASR service
client = OpenAI(api_key='123456', base_url='http://127.0.0.1:5078/v1')

# Open the audio file and request a transcription
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
        timeout=86400  # generous timeout so long audio can finish
    )

print(transcript.text)
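Two notes on this snippet: the generous timeout=86400 simply gives long audio files time to finish processing, and because the local service exposes an OpenAI-compatible endpoint, the model="whisper-1" value is presumably only there to satisfy the SDK; the server runs FireRedASR regardless of the model name passed.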