Xiaohongshu has open-sourced an automatic speech recognition (ASR) project named FireRedASR, which performs excellently on Chinese speech. Previously, only a smaller AED model was available; recently they released a larger LLM-based model, further improving recognition accuracy.
This ASR model has been bundled into an integrated package and can be used easily from the video translation software pyVideoTrans.
Integrated Package Download and Model Description
Model sizes:
- AED model (model.pth.tar): 4.35GB
- LLM model, consisting of two parts:
  - Xiaohongshu recognition model (model.pth.tar): 3.37GB
  - Qwen2-7B model (4 files): 17GB total
The models total approximately 21GB; even compressed into 7z format, they still exceed 10GB. Because of size limits, they cannot be uploaded to GitHub or cloud drives, so the integrated package contains only the main program and no model files.
Please download the integrated package and follow the steps below to download the model files separately and place them in the specified location.
Note: The model files are hosted on huggingface.co, which is not directly accessible from mainland China; you will need a VPN to download them.
Integrated Package Body Download
The integrated package body is relatively small (1.7GB). You can download it by opening the following address directly in your browser:
https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
After downloading and extracting the archive, you should see a file structure similar to the following:
Download AED Model
Downloading the AED model is straightforward: only one model file is required.
Download the model.pth.tar file from:
https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-AED-L folder in the integrated package directory.
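If you prefer scripting the download rather than using a browser, here is a minimal sketch using the huggingface_hub library (not included in the package; install it with pip install huggingface_hub, and it still requires access to huggingface.co):

# Minimal sketch: download the AED model with huggingface_hub
# (assumes `pip install huggingface_hub` and access to huggingface.co).
from huggingface_hub import hf_hub_download

# Saves model.pth.tar into the folder the integrated package expects.
hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-AED-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-AED-L",
)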
After downloading, the file location should be as follows:
Download LLM Model
Downloading the LLM model is slightly more involved, requiring 5 files in total (1 Xiaohongshu model file + 4 Qwen2 weight files).
1. Download the Xiaohongshu Model (model.pth.tar):
Download address: https://huggingface.co/FireRedTeam/FireRedASR-LLM-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-LLM-L folder in the integrated package. Make sure the folder name contains LLM; do not put the file in the wrong location.
The file location should be as follows:
2. Download the Qwen2 Model (4 files):
Download the files from the following 4 links and place them into the pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct folder in the integrated package (a scripted alternative is sketched after the list):
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00001-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00002-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00003-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00004-of-00004.safetensors?download=true
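As an alternative to the browser, the following minimal sketch fetches all five LLM files (steps 1 and 2) with the huggingface_hub library, under the same assumptions as the AED sketch above:

# Minimal sketch: download the Xiaohongshu LLM model plus the four
# Qwen2-7B-Instruct weight shards into the folders the package expects
# (assumes `pip install huggingface_hub` and access to huggingface.co).
from huggingface_hub import hf_hub_download

# Step 1: the Xiaohongshu recognition model.
hf_hub_download(
    repo_id="FireRedTeam/FireRedASR-LLM-L",
    filename="model.pth.tar",
    local_dir="pretrained_models/FireRedASR-LLM-L",
)

# Step 2: the four Qwen2 weight files.
for i in range(1, 5):
    hf_hub_download(
        repo_id="Qwen/Qwen2-7B-Instruct",
        filename=f"model-{i:05d}-of-00004.safetensors",
        local_dir="pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct",
    )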
After downloading, the Qwen2-7B-Instruct folder should contain 4 files, as shown in the following figure:
Start the Integrated Package
Once all model files have been downloaded and placed correctly, double-click the 启动.bat file in the integrated package directory to start the program.
After the program starts, it will automatically open http://127.0.0.1:5078 in your browser. If you see the following interface, the program has started successfully and you can start using it.
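If the page does not open automatically, you can verify from a script that the service is up; a minimal sketch (assuming the requests library is installed and the default port 5078 shown above):

# Minimal sketch: check that the local FireRedASR service is reachable
# (assumes `pip install requests`; 5078 is the package's default port).
import requests

resp = requests.get("http://127.0.0.1:5078", timeout=5)
print("service is up" if resp.ok else f"unexpected status: {resp.status_code}")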
Using in Video Translation Software
If you want to use the FireRedASR model in the video translation software pyVideoTrans, follow these steps:
1. Make sure you have downloaded and placed the model files as described above, and have successfully started the integrated package.
2. Open the pyVideoTrans software.
3. In the software menu, select Menu -> Speech Recognition Settings -> OpenAI Speech Recognition and Compatible AI.
4. In the settings interface, fill in the relevant information as shown in the figure below.
5. After filling in the information, click Save.
6. In the speech recognition channel selection, select OpenAI Speech Recognition.
API Address: the default is http://127.0.0.1:5078/v1
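Since the service exposes an OpenAI-compatible interface, any client that speaks the OpenAI transcription API should work. For instance, a minimal sketch of a raw HTTP call with requests (an assumption: the local server implements POST /v1/audio/transcriptions in the usual OpenAI multipart form; 5.wav is a sample file name):

# Minimal sketch: call the OpenAI-compatible transcription endpoint directly
# (assumes the local server accepts the standard multipart form used by the
# OpenAI API; the API key is a placeholder, as in the SDK example below).
import requests

with open("5.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:5078/v1/audio/transcriptions",
        headers={"Authorization": "Bearer 123456"},
        files={"file": ("5.wav", f, "audio/wav")},
        data={"model": "whisper-1", "response_format": "json"},
    )
print(resp.json()["text"])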
Using with OpenAI SDK
from openai import OpenAI

# Point the client at the local FireRedASR service; the API key is a placeholder.
client = OpenAI(api_key='123456', base_url='http://127.0.0.1:5078/v1')

# Open the audio file in binary mode and request a transcription.
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
        timeout=86400,  # generous timeout so long audio is not cut off
    )

print(transcript.text)
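Note that model="whisper-1" and the API key are presumably just placeholders required by the OpenAI-compatible interface; the local service runs FireRedASR regardless of the values. The 86400-second (24-hour) timeout keeps very long recordings from timing out the request.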