Xiaohongshu has open-sourced a speech recognition project called FireRedASR, which excels in Chinese speech recognition. Previously, they only released a smaller AED model. Recently, they launched a larger LLM model, further improving recognition accuracy.
The model has been bundled into a ready-to-use integration package, and it can also be used conveniently from the video translation software pyVideoTrans.
Integration Package Download and Model Details
Model Sizes:
- AED Model (model.pth.tar): 4.35GB
- LLM Model: includes two parts
  - Xiaohongshu recognition model (model.pth.tar): 3.37GB
  - Qwen2-7B model (4 files): 17GB total
The models total approximately 21GB; even compressed as a 7z archive, they exceed 10GB. Due to size limits, they cannot be uploaded to GitHub or common cloud storage, so the integration package contains only the main program and no model files.
After downloading the integration package, please follow the steps below to separately download the model files and place them in the specified locations.
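To avoid placement mistakes later, you can create the expected folder layout up front. A minimal Python sketch, run from the extracted package directory (folder names taken from the steps below):

```python
from pathlib import Path

# Folders the model files must end up in (see the steps below).
# Run this from the root of the extracted integration package.
MODEL_DIRS = [
    "pretrained_models/FireRedASR-AED-L",
    "pretrained_models/FireRedASR-LLM-L",
    "pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct",
]

for d in MODEL_DIRS:
    Path(d).mkdir(parents=True, exist_ok=True)
    print("ok:", d)
```

`mkdir(parents=True, exist_ok=True)` makes the script safe to re-run; it never overwrites anything already downloaded.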
Note: The model files are hosted on huggingface.co, which is not directly accessible from within China. You will need a VPN to download them.
Main Integration Package Download
The main integration package is relatively small at about 1.7GB. You can download it directly by opening the following link in your browser:
https://github.com/jianchang512/fireredasr-ui/releases/download/v0.3/fireredASR-2025-0224.7z
After downloading, extract the archive. You should see a file structure similar to the image below:

Download the AED Model
Downloading the AED model is straightforward; you only need to download one model file.
Download the model.pth.tar file from the following link:
https://huggingface.co/FireRedTeam/FireRedASR-AED-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-AED-L folder within the integration package directory.
After downloading, the file location should look like this example:

Download the LLM Model
Downloading the LLM model is slightly more complex, requiring a total of 5 files (1 Xiaohongshu model + 4 Qwen2 model files).
1. Download the Xiaohongshu Model (model.pth.tar):
Download link: https://huggingface.co/FireRedTeam/FireRedASR-LLM-L/resolve/main/model.pth.tar?download=true
Place the downloaded model.pth.tar file into the pretrained_models/FireRedASR-LLM-L folder within the integration package. Make sure the target folder name contains LLM so the file does not end up in the wrong location.
The file location should look like this example:

2. Download the Qwen2 Model (4 files):
Download the files from the following 4 links and place them into the pretrained_models/FireRedASR-LLM-L/Qwen2-7B-Instruct folder within the integration package:
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00001-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00002-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00003-of-00004.safetensors?download=true
- https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main/model-00004-of-00004.safetensors?download=true
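The four links follow a regular naming pattern, so you can generate them programmatically instead of copying each one by hand (a sketch; the actual download still requires huggingface.co to be reachable):

```python
BASE = "https://huggingface.co/Qwen/Qwen2-7B-Instruct/resolve/main"

# model-00001-of-00004.safetensors ... model-00004-of-00004.safetensors
urls = [
    f"{BASE}/model-{i:05d}-of-00004.safetensors?download=true"
    for i in range(1, 5)
]

for url in urls:
    print(url)
```

You can feed the printed list to any download manager that supports resumable transfers, which helps with files this large.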
After downloading, the Qwen2-7B-Instruct folder should contain 4 files, as shown in the image below:

Launching the Integration Package
Once all model files are downloaded and correctly placed, double-click the 启动.bat file in the integration package directory to launch the program.
After the program starts, it will automatically open the address http://127.0.0.1:5078 in your browser. If you see the interface below, it means the program has started successfully and is ready to use.
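If the browser window does not open automatically, you can check from Python whether the service is answering on the default port (a small sketch; 5078 is the default port mentioned above):

```python
import urllib.request

def service_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the FireRedASR web UI responds at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused or timed out: the service is not up yet
        return False

if __name__ == "__main__":
    print("ready" if service_ready("http://127.0.0.1:5078") else "not running yet")
```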

Using in Video Translation Software
If you want to use the FireRedASR model in the pyVideoTrans video translation software, please follow these steps:
1. Ensure you have downloaded and placed the model files as described above and have successfully launched the integration package.
2. Open the pyVideoTrans software.
3. In the software menu, navigate to Menu -> Speech Recognition Settings -> OpenAI Speech Recognition & Compatible AI.
4. In the settings interface, fill in the relevant information as shown in the image below.

5. After filling in the details, click Save.
6. In the speech recognition channel selection, choose OpenAI Speech Recognition.

API Address:
Default Address: http://127.0.0.1:5078/v1
Usage with OpenAI SDK
```python
from openai import OpenAI

# Point the OpenAI SDK at the local FireRedASR service
client = OpenAI(
    api_key="123456",
    base_url="http://127.0.0.1:5078/v1",
)

# Transcribe a local audio file through the OpenAI-compatible endpoint
with open("5.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="json",
        timeout=86400,  # generous timeout for long audio
    )

print(transcript.text)
```