
SenseVoice is an open-source foundational speech recognition model from Alibaba that supports speech recognition in Chinese, Japanese, Korean, and English. Compared with earlier models, it offers faster recognition and higher accuracy.

However, the officially released version does not output timestamps, which makes it inconvenient for subtitle creation. Currently I use a separate VAD model to pre-segment the audio and then SenseVoice to recognize each segment (a sketch of this pipeline follows below). I've created this API project and integrated it into video translation software for easier use.
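
For reference, here is a minimal sketch of that VAD + recognition pipeline using FunASR's AutoModel (the loader the SenseVoice repository documents). The model ids fsmn-vad and iic/SenseVoiceSmall, and all parameter values, are assumptions for illustration; this is not the code from this project's api.py:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess
import soundfile as sf

# Stage 1: a VAD model splits the audio into speech segments with timestamps.
# fsmn-vad and iic/SenseVoiceSmall are assumed model ids, used here for illustration.
vad = AutoModel(model="fsmn-vad")
segments = vad.generate(input="10s.wav")[0]["value"]   # [[start_ms, end_ms], ...]

# Stage 2: SenseVoice recognizes each segment; the VAD boundaries become the
# subtitle timestamps.
asr = AutoModel(model="iic/SenseVoiceSmall")
audio, sr = sf.read("10s.wav")
for start_ms, end_ms in segments:
    clip = audio[int(start_ms / 1000 * sr):int(end_ms / 1000 * sr)]
    text = asr.generate(input=clip, fs=sr, language="zh", use_itn=True)[0]["text"]
    print(start_ms, end_ms, rich_transcription_postprocess(text))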

SenseVoice official repository: https://github.com/FunAudioLLM/SenseVoice

This API project: https://github.com/jianchang512/sense-api

Project Functionality

  1. Replace the official api.py file to get SRT subtitle output with timestamps (an example of the output format follows this list).
  2. Can be connected to video translation and dubbing software.
  3. Comes with a Windows integration package: double-click run-api.bat to start the API, or double-click run-webui.bat to start the browser interface.
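
For reference, the SRT string returned by the API looks like this (the timings and text here are purely illustrative):

1
00:00:00,000 --> 00:00:03,250
First recognized sentence.

2
00:00:03,250 --> 00:00:07,100
Second recognized sentence.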

This api.py ignores emotion recognition and only supports Chinese, Japanese, Korean, and English speech recognition.

First, you need to deploy the SenseVoice project

  1. Deploy from the official source code (supported on Windows/Linux/macOS). For a detailed tutorial, see the SenseVoice project homepage: https://github.com/FunAudioLLM/SenseVoice. After deployment, download the api.py file from this project and overwrite the api.py included in the official package (if you want to use it with the video translation software, you must overwrite it, otherwise you will not get subtitles with timestamps).

  2. Deploy with the Windows integration package (Windows 10/11 only). Download the compressed package from the Releases page: https://github.com/jianchang512/sense-api/releases, extract it, and double-click run-api.bat to use the API, or double-click run-webui.bat to open the web interface.

Using the API

The default API address is http://127.0.0.1:5000/asr

You can edit api.py to change them:

HOST='127.0.0.1'   # listening address; use 0.0.0.0 to accept connections from other machines
PORT=5000          # listening port
  1. If it is an official source code deployment, remember to overwrite the api.py file, and then execute python api.py.
  2. If it is a Windows integration package, double-click run-api.bat.
  3. Wait for http://127.0.0.1:5000 to appear in the terminal, indicating that the startup is successful and you can use it.

Note that the first time you use it, the model will be downloaded from ModelScope, which may take a long time; if you prefer, you can pre-download it as sketched below.
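
A minimal sketch of that pre-download using the modelscope package. The model id iic/SenseVoiceSmall is assumed here; check it against your deployment before relying on it:

from modelscope import snapshot_download

# Fetch and cache the SenseVoice model ahead of the first API call
# (iic/SenseVoiceSmall is an assumed model id)
model_dir = snapshot_download("iic/SenseVoiceSmall")
print("model cached at:", model_dir)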

Using it in the Video Translation and Dubbing Tool

In the menu, open Speech Recognition Settings -- SenseVoice Speech Recognition, and fill the API address into the API address field.

Calling the API in Source Code

  • API address: with the default settings, http://127.0.0.1:5000/asr
  • Calling method: POST
  • Request parameters:
    • lang: string; one of zh | ja | ko | en
    • file: the binary audio data to be recognized, in wav format
  • Return response:
    • If recognition succeeds: {"code": 0, "msg": "ok", "data": "complete SRT-format subtitle string"}
    • If recognition fails: {"code": 1, "msg": "reason for the failure"}
    • Other internal errors: {"detail": "error information"}

Example: recognize the audio file 10s.wav, in which the spoken language is Chinese.

import requests
# Send the wav file and the spoken language to the local SenseVoice API
res = requests.post("http://127.0.0.1:5000/asr",
                    files={"file": open("c:/users/c1/videos/10s.wav", "rb")},
                    data={"lang": "zh"}, timeout=7200)
print(res.json())
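
Building on the example above, a small sketch of handling the response described earlier and saving the returned SRT string to a subtitle file (the output filename 10s.srt is just an illustration):

import requests

res = requests.post("http://127.0.0.1:5000/asr",
                    files={"file": open("c:/users/c1/videos/10s.wav", "rb")},
                    data={"lang": "zh"}, timeout=7200)
data = res.json()
if data.get("code") == 0:
    # "data" holds the complete SRT string; write it out as a subtitle file
    with open("10s.srt", "w", encoding="utf-8") as f:
        f.write(data["data"])
else:
    # code == 1 carries "msg"; other internal errors come back as {"detail": ...}
    print("recognition failed:", data.get("msg") or data.get("detail"))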

Using WebUI in a Browser

  1. If you deployed from the official source code, execute python webui.py, wait for the terminal to show http://127.0.0.1:7860, then open that address in your browser.
  2. If you are using the Windows integration package, double-click run-webui.bat; the browser will open automatically after startup succeeds.