SenseVoice is an open-source foundational speech recognition model from Alibaba that supports speech recognition in Chinese, Japanese, Korean, and English. Compared with earlier models, it offers faster recognition and higher accuracy.
However, the officially released version does not output timestamps, which makes it inconvenient for subtitle creation. Currently I use a separate VAD model for pre-segmentation and then SenseVoice for recognition (see the sketch below); I have wrapped this into an API project and integrated it into video translation software for easier use.
- SenseVoice official repository: https://github.com/FunAudioLLM/SenseVoice
- This API project: https://github.com/jianchang512/sense-api
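To illustrate the idea of pairing a VAD model with SenseVoice, here is a minimal sketch. It is not this project's `api.py`; it assumes the `funasr` and `soundfile` packages, the `fsmn-vad` and `iic/SenseVoiceSmall` models from ModelScope, and a 16 kHz mono wav file named `10s.wav`.

```python
# Minimal sketch (NOT this project's api.py): run a VAD model first to get
# voiced segments with millisecond timestamps, then recognize each segment
# with SenseVoice, so every piece of text can be paired with a timestamp.
# Assumes: `pip install funasr soundfile` and the ModelScope models below.
import soundfile as sf
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

vad = AutoModel(model="fsmn-vad")              # segmentation with timestamps
asr = AutoModel(model="iic/SenseVoiceSmall")   # recognition without timestamps

wav_path = "10s.wav"                           # placeholder: 16 kHz mono wav
audio, sr = sf.read(wav_path)

# fsmn-vad returns segments as [[start_ms, end_ms], ...]
segments = vad.generate(input=wav_path)[0]["value"]

for i, (start_ms, end_ms) in enumerate(segments, 1):
    clip = audio[int(start_ms / 1000 * sr): int(end_ms / 1000 * sr)]
    res = asr.generate(input=clip, fs=sr, language="zh", use_itn=True)
    # Strip SenseVoice's language/emotion tags from the raw output text.
    text = rich_transcription_postprocess(res[0]["text"])
    print(i, start_ms, end_ms, text)
```

Building the actual SRT string from these `(start_ms, end_ms, text)` triples is then straightforward, which is what this project's `api.py` does for you.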
Project Functionality
- Replaces the official `api.py` file to produce SRT subtitle output with timestamps.
- Connects to video translation and dubbing software.
- Comes with a Windows integration package: double-click `run-api.bat` to start the API, or double-click `run-webui.bat` to start the browser interface.
This `api.py` ignores emotion recognition and only supports Chinese, Japanese, Korean, and English speech recognition.
First, you need to deploy the SenseVoice project
- Deploy from the official source code (supported on Windows/Linux/macOS). For detailed instructions, see the SenseVoice project homepage: https://github.com/FunAudioLLM/SenseVoice. After deployment, download the `api.py` file from this project and overwrite the `api.py` included in the official package (if you want to use it with the video translation software, you must overwrite it; otherwise you will not get subtitles with timestamps).
- Deploy with the Windows integration package (Windows 10/11 only). Download the archive from the Releases page https://github.com/jianchang512/sense-api/releases, extract it, then double-click `run-api.bat` to use the API or double-click `run-webui.bat` to open the web interface.
Using the API
The default API address is http://127.0.0.1:5000/asr
You can edit the `api.py` file to change it:

```
HOST = '127.0.0.1'
PORT = 5000
```
- For an official source-code deployment, remember to overwrite the `api.py` file first, then run `python api.py`.
- For the Windows integration package, double-click `run-api.bat`.
- Wait until `http://127.0.0.1:5000` appears in the terminal; this indicates the API has started and is ready to use.
Note that the first time you use it, you will need to download the model from ModelScope, which will take a long time.
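If you want to avoid that wait on first startup, the models can be fetched ahead of time. A minimal sketch, assuming the `modelscope` package; the model ids below are my assumptions and may differ from the ones your deployment actually loads:

```python
# Optional: pre-download the models from ModelScope so the first API start
# does not block on the download. The model ids are assumptions and may not
# match your deployment exactly.
from modelscope import snapshot_download

snapshot_download("iic/SenseVoiceSmall")                           # recognition model
snapshot_download("iic/speech_fsmn_vad_zh-cn-16k-common-pytorch")  # VAD model
```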
Using it in the Video Translation and Dubbing Tool
In the software, open Menu -> Speech Recognition Settings -> SenseVoice Speech Recognition and enter the API address in the API address field.
Calling the API in Source Code
- API address: assuming the default, `http://127.0.0.1:5000/asr`
- Method: POST
- Request parameters:
  - lang: string, one of zh | ja | ko | en
  - file: the audio binary data to recognize, wav format
- Response:
  - On success: `{"code": 0, "msg": "ok", "data": "complete SRT-format subtitle string"}`
  - On recognition failure: `{"code": 1, "msg": "reason for the error"}`
  - On other internal errors: `{"detail": "error information"}`
Example: recognize the audio file 10s.wav, whose spoken language is Chinese.

```python
import requests

res = requests.post("http://127.0.0.1:5000/asr",
                    files={"file": open("c:/users/c1/videos/10s.wav", "rb")},
                    data={"lang": "zh"}, timeout=7200)
print(res.json())
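```

In a real script you will usually want to check the return code and save the SRT text to a file. A minimal sketch of that, based on the response shapes listed above (the file paths here are just placeholders):

```python
# Post a wav file to the API, check the result code, and save the SRT text.
# Endpoint and response fields follow the description above; paths are
# placeholders.
import requests

def recognize_to_srt(wav_path: str, srt_path: str, lang: str = "zh") -> None:
    with open(wav_path, "rb") as f:
        res = requests.post(
            "http://127.0.0.1:5000/asr",
            files={"file": f},
            data={"lang": lang},
            timeout=7200,
        )
    res.raise_for_status()           # internal errors return {"detail": ...}
    data = res.json()
    if data.get("code") == 0:
        with open(srt_path, "w", encoding="utf-8") as out:
            out.write(data["data"])  # complete SRT subtitle string
    else:
        raise RuntimeError(data.get("msg", "recognition failed"))

recognize_to_srt("c:/users/c1/videos/10s.wav", "10s.srt", lang="zh")
```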
Using WebUI in a Browser
- For an official source-code deployment, run `python webui.py`, wait for the terminal to display `http://127.0.0.1:7860`, then open that address in your browser.
- For the Windows integration package, double-click `run-webui.bat`; the browser will open automatically once startup finishes.