SenseVoice is Alibaba's open-source speech recognition foundation model, capable of recognizing speech in Chinese, Japanese, Korean, and English. Compared with previous models, it recognizes speech faster and more accurately.
However, the official release does not include built-in timestamp output, which makes it inconvenient for subtitle creation. To address this, this API project uses a separate VAD model to pre-segment the audio and then applies SenseVoice to recognize each segment (a rough sketch of the idea follows below). It has been integrated into video translation software for ease of use.
- SenseVoice official repository: https://github.com/FunAudioLLM/SenseVoice
- This API project: https://github.com/jianchang512/sense-api
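The sketch below shows the general idea of VAD pre-segmentation plus SenseVoice recognition using the FunASR `AutoModel` interface that SenseVoice is built on. It is only an illustration: the model names and parameters follow the official SenseVoice examples, and the actual `api.py` in this project may organize things differently.

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load SenseVoice together with the FSMN VAD model so that long audio is
# pre-segmented before recognition (the idea behind this API project).
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cpu",  # or "cuda:0" if a GPU is available
)

# Recognize a WAV file; language can be "zh", "ja", "ko", "en" or "auto".
res = model.generate(
    input="example.wav",
    language="zh",
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)
print(rich_transcription_postprocess(res[0]["text"]))
```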
Purpose of This Project 
- Replaces the official `api.py` file to enable SRT subtitle output with timestamps (a small format sketch follows at the end of this section).
- Connects with video translation and dubbing software.
- Includes a Windows integrated package: double-click `run-api.bat` to start the API, or `run-webui.bat` to launch the browser interface.
This `api.py` skips emotion recognition and only supports speech recognition for Chinese, Japanese, Korean, and English.
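As an illustration of the timestamped SRT output this project targets, here is a small, hypothetical helper (not part of the project's `api.py`; the segment tuple layout is an assumption) that formats recognized segments as SRT text:

```python
def ms_to_srt_time(ms: int) -> str:
    """Convert milliseconds to the SRT time format HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, milli = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"


def segments_to_srt(segments: list[tuple[int, int, str]]) -> str:
    """Build SRT subtitle text from (start_ms, end_ms, text) segments."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)


print(segments_to_srt([(0, 2500, "你好，世界"), (2600, 5000, "这是第二句")]))
```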
First, Deploy the SenseVoice Project 
- Deploy from the official source code, compatible with Windows/Linux/macOS. Refer to the SenseVoice project homepage for details: https://github.com/FunAudioLLM/SenseVoice. After deployment, download the `api.py` file from this project and replace the official `api.py` (this replacement is required to get timestamped subtitles in video translation software).
- Deploy using the Windows integrated package, for Windows 10/11 only. Download the compressed package from the releases page: https://github.com/jianchang512/sense-api/releases. After extraction, double-click `run-api.bat` to use the API or `run-webui.bat` to open the web interface.
Using the API 
The default API address is http://127.0.0.1:5000/asr.
You can modify it in the `api.py` file:

```python
HOST='127.0.0.1'
PORT=5000
```

- If deployed from the official source code, remember to replace the `api.py` file and run `python api.py`.
- If using the Windows integrated package, simply double-click `run-api.bat`.
- Wait until the terminal displays http://127.0.0.1:5000, indicating successful startup and readiness for use.

Note: The first time you use it, the model will be downloaded from ModelScope, which may take some time.
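If you are scripting against the API, a quick way to confirm the service is ready before sending audio is a plain TCP check against the configured host and port (the values below assume the defaults):

```python
import socket

# Readiness check: succeeds once the API is listening on the default port.
try:
    with socket.create_connection(("127.0.0.1", 5000), timeout=3):
        print("API is listening on http://127.0.0.1:5000")
except OSError:
    print("API not reachable yet; wait for startup or check HOST/PORT in api.py")
```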
Using in Video Translation and Dubbing Tools 
In the menu, open "Speech Recognition Settings" → "SenseVoice Speech Recognition" and enter the API address in that window.

Calling the API in Source Code 
- API address: http://127.0.0.1:5000/asr (assuming the default host and port)
- Method: POST
- Request parameters:
  - `lang`: string, one of `zh | ja | ko | en`
  - `file`: binary data of the audio file to recognize, in WAV format
- Response:
  - On success:
  - On failure:
  - Other internal errors:
Example: recognize the audio file 10s.wav, where the spoken language is Chinese:

```python
import requests

res = requests.post("http://127.0.0.1:5000/asr",
                    files={"file": open("c:/users/c1/videos/10s.wav", "rb")},
                    data={"lang": "zh"}, timeout=7200)
print(res.json())
```

Using WebUI in the Browser
- If deployed from the official source code, run `python webui.py`. When the terminal shows http://127.0.0.1:7860, open this address in your browser.
- If using the Windows integrated package, double-click `run-webui.bat`. The browser will open automatically upon successful startup.

