Skip to content

SenseVoice is Alibaba's open-source speech recognition foundation model, capable of recognizing speech in Chinese, Japanese, Korean, and English. Compared to previous models, it offers faster recognition speed and higher accuracy.

However, the official release does not include built-in timestamp output, which makes it less convenient for subtitle creation. To address this, this API project was developed by using other VAD models for pre-segmentation and then applying SenseVoice for recognition. It has been integrated into video translation software for ease of use.

SenseVoice Official Repository

This API Project https://github.com/jianchang512/sense-api

Purpose of This Project

  1. Replaces the official api.py file to enable SRT subtitle output with timestamps.
  2. Connects with video translation and dubbing software.
  3. Includes a Windows integrated package: double-click run-api.bat to start the API or run-webui.bat to launch the browser interface.

This api.py ignores emotion recognition processing and only supports speech recognition for Chinese, Japanese, Korean, and English.

First, Deploy the SenseVoice Project

  1. Deploy using the official source code, compatible with Windows/Linux/MacOS. Refer to the SenseVoice project homepage for details: https://github.com/FunAudioLLM/SenseVoice. After deployment, download the api.py file from this project and replace the official api.py file (this replacement is mandatory for timestamped subtitles in video translation software).

  2. Deploy using the Windows integrated package, only for Windows 10/11. Download the compressed package from the right side of this page: https://github.com/jianchang512/sense-api/releases. After extraction, double-click run-api.bat to use the API or run-webui.bat to open the web interface.

Using the API

The default API address is http://127.0.0.1:5000/asr.

You can modify it in the api.py file:

HOST='127.0.0.1'
PORT=5000
  1. If deployed via official source code, remember to replace the api.py file and run python api.py.
  2. If using the Windows integrated package, simply double-click run-api.bat.
  3. Wait until the terminal displays http://127.0.0.1:5000, indicating successful startup and readiness for use.

Note: The first time you use it, the model will be downloaded from ModelScope, which may take some time.

Using in Video Translation and Dubbing Tools

Enter the API address in the menu under "Speech Recognition Settings" → "SenseVoice Speech Recognition" window.

Calling the API in Source Code

  • API Address: Assuming the default is http://127.0.0.1:5000
  • Method: POST
  • Request Parameters:
    • lang: String type, one of zh | ja | ko | en
    • file: Binary data of the audio file to recognize, in WAV format
  • Response:
    • On success:
    • On failure:
    • Other internal errors:

Example: To recognize the audio file 10.wav, where the spoken language is Chinese.

python
import requests
res = requests.post(f"http://127.0.0.1:5000/asr", files={"file": open("c:/users/c1/videos/10s.wav", 'rb')}, data={"lang":"zh"}, timeout=7200)
print(res.json())

Using WebUI in the Browser

  1. If deployed via official source code, run python webui.py. When the terminal shows http://127.0.0.1:7860, open this address in your browser.
  2. If using the Windows integrated package, double-click run-webui.bat. The browser will open automatically upon successful startup.