SenseVoice is an open-source speech recognition foundation model from Alibaba that recognizes speech in Chinese, Japanese, Korean, and English. Compared to previous models, it offers faster recognition and higher accuracy.
However, the official release doesn't include timestamp output, which is inconvenient for generating subtitles. Currently, I'm using other VAD models for pre-segmentation and then SenseVoice for recognition. This led to the creation of this API project, which is integrated into video translation software for ease of use.
SenseVoice Official Repository: https://github.com/FunAudioLLM/SenseVoice
This API Project: https://github.com/jianchang512/sense-api
Project Functionality
- Replaces the official api.py file to enable timestamped SRT subtitle output.
- Connects to video translation and dubbing software for seamless integration.
- Includes a Windows integrated package. You can launch the API by double-clicking run-api.bat or start the browser interface by double-clicking run-webui.bat.
The api.py in this project omits emotion recognition processing and only supports the recognition of Chinese, Japanese, Korean, and English speech.
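To make the idea of timestamped SRT output concrete, here is a minimal sketch of how pre-segmented recognition results could be turned into an SRT string. It assumes the VAD step yields (start_ms, end_ms) pairs and that each segment has already been recognized. The function names and the segment layout are hypothetical and are not taken from this project's api.py.

```python
def format_timestamp(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build an SRT string from (start_ms, end_ms, text) tuples.

    The tuple layout is an assumption made for this sketch.
    """
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        time_line = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{index}\n{time_line}\n{text.strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(segments_to_srt([(0, 2500, "hello"), (2500, 5000, "world")]))
```

Because the timestamps come from the segmentation step rather than from the recognizer, the subtitle timing in a setup like this is only as precise as the VAD boundaries.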
Deploying the SenseVoice Project
- Deploy from the official source code, which supports Windows, Linux, and macOS. Refer to the SenseVoice project page for specific tutorials: https://github.com/FunAudioLLM/SenseVoice. After deployment, download the api.py file from this project and overwrite the api.py file included in the official package (required for use with video translation software; otherwise, timestamped subtitles will not be generated).
- Deploy using the Windows integrated package, which supports Windows 10/11 only. Download the compressed package from the releases page: https://github.com/jianchang512/sense-api/releases. After extracting, double-click run-api.bat to use the API, or double-click run-webui.bat to open the web interface.
Using the API
The default API address is http://127.0.0.1:5000/asr. You can modify it by opening the api.py file:
HOST='127.0.0.1'
PORT=5000
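On the client side, the address to call follows from these two values. A small sketch, assuming the /asr path stays fixed as in the default address; the helper name asr_endpoint is hypothetical:

```python
# Mirror the values set in api.py; change them here if you changed them there.
HOST = "127.0.0.1"
PORT = 5000


def asr_endpoint(host: str = HOST, port: int = PORT) -> str:
    """Return the full recognition URL for a given host and port."""
    return f"http://{host}:{port}/asr"


if __name__ == "__main__":
    print(asr_endpoint())
```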
- If you deployed using the official source code, remember to overwrite the api.py file and then execute python api.py.
- If you are using the Windows integrated package, simply double-click run-api.bat.
- Wait for http://127.0.0.1:5000 to appear in the terminal, indicating successful startup. You can now use the API.
Note: On first use, the model is downloaded from ModelScope over the network, which may take a long time.
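If you start the API from a script rather than watching the terminal, one way to detect startup is to poll the port until it accepts connections. This is a sketch using only the standard library. The function name and the timeout values are assumptions, and an open port only shows that the server process is listening, not that the model has finished downloading.

```python
import socket
import time


def wait_for_api(host: str = "127.0.0.1", port: int = 5000,
                 timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print("API reachable:", wait_for_api(timeout=10.0))
```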
Using it in Video Translation and Dubbing Tools
Enter the API address in the window opened from Menu -> Speech Recognition Settings -> SenseVoice Speech Recognition.
Calling the API in Source Code
- API address: http://127.0.0.1:5000/asr, assuming the default host and port
- Calling method: POST
- Request parameters:
  - lang: String type, can be one of zh, ja, ko, or en.
  - file: Audio binary data in WAV format to be recognized.
- Response:
  - Successful recognition returns: {"code": 0, "msg": "ok", "data": "Complete SRT subtitle format string"}
  - Recognition failure returns: {"code": 1, "msg": "Reason for error"}
  - Other internal errors return: {"detail": "Error information"}
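The three response shapes above can be handled in one place. The following sketch assumes the decoded JSON body is available as a Python dict. The names SUPPORTED_LANGS, AsrError, check_lang, and extract_srt are hypothetical helpers for illustration and are not part of the API.

```python
SUPPORTED_LANGS = {"zh", "ja", "ko", "en"}


class AsrError(RuntimeError):
    """Raised when the API reports a recognition or internal error."""


def check_lang(lang: str) -> str:
    """Reject language codes the API does not list before sending a request."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"lang must be one of {sorted(SUPPORTED_LANGS)}, got {lang!r}")
    return lang


def extract_srt(payload: dict) -> str:
    """Return the SRT string from a successful response, or raise AsrError."""
    if "detail" in payload:
        # Shape used for other internal errors.
        raise AsrError(f"internal error: {payload['detail']}")
    if payload.get("code") != 0:
        # Shape used for recognition failures.
        raise AsrError(payload.get("msg", "unknown error"))
    return payload["data"]


if __name__ == "__main__":
    print(extract_srt({"code": 0, "msg": "ok", "data": "1\n00:00:00,000 --> 00:00:01,000\nhi\n"}))
```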
Example: Recognize the 10s.wav audio file, where the spoken language is Chinese.
import requests
# Post the WAV file to the /asr endpoint; lang="zh" marks the speech as Chinese.
with open("c:/users/c1/videos/10s.wav", "rb") as f:
    res = requests.post("http://127.0.0.1:5000/asr", files={"file": f}, data={"lang": "zh"}, timeout=7200)
print(res.json())
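If downstream code needs individual subtitle cues rather than the whole string, the returned data field can be split up. This sketch assumes the string follows the usual SRT layout (index line, time line, then text, with a blank line between cues). The source only states that it is an SRT format string, so treat the exact layout as an assumption. parse_srt is a hypothetical helper.

```python
import re

# Matches an SRT time line such as "00:00:01,000 --> 00:00:02,500".
CUE_TIME = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text: str) -> list:
    """Split an SRT string into dicts with index, start, end, and text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # Skip empty or incomplete blocks.
        match = CUE_TIME.match(lines[1].strip())
        if not match:
            continue  # Skip blocks without a valid time line.
        cues.append({
            "index": int(lines[0].strip()),
            "start": match.group(1),
            "end": match.group(2),
            "text": "\n".join(lines[2:]),
        })
    return cues


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nhello\n"
    print(parse_srt(sample))
```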
Using the Web UI in a Browser
- If you deployed the official package from source code, execute python webui.py. When the terminal displays http://127.0.0.1:7860, enter this address in your browser.
- If you are using the Windows integrated package, double-click run-webui.bat. The browser will open automatically after successful startup.