
Using zh_recogn for Chinese Speech Recognition

This recognition method supports Chinese speech only. It uses a model from the Alibaba ModelScope community, which performs better on Chinese speech than international models, whose Chinese support is limited.

How to Use

First, deploy the zh_recogn project.

Then start it and enter its address (default: http://127.0.0.1:9933) in the software's top-left menu: Settings - zh_recogn Chinese Speech Recognition - Address.

Next, select zh_recogn from the "faster mode" dropdown menu in the software interface. When this option is chosen, you no longer need to select a model or segmentation method.
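Before entering the address in the settings, it can help to confirm that the zh_recogn service is actually listening. The following is a minimal sketch, not part of zh_recogn or pyvideotrans: the helper name is_reachable is hypothetical, and it only checks that a TCP connection to the host and port can be opened.

```python
import socket
from urllib.parse import urlparse


def is_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the address's host and port succeeds."""
    parsed = urlparse(address)
    host = parsed.hostname
    if host is None:
        return False
    try:
        # Fall back to the scheme's default port when the address has none.
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: connection refused or timed out; ValueError: malformed port.
        return False


if __name__ == "__main__":
    # Default zh_recogn address from this document.
    print(is_reachable("http://127.0.0.1:9933"))
```

If this prints False, check that zh_recogn has been started and that the address and port match those it is bound to.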



Deploying the zh_recogn Project

Source Code Deployment

  1. Install Python 3.10, git, and ffmpeg. On Windows, download ffmpeg.exe and place it in the ffmpeg folder of this project. On macOS, run brew install ffmpeg.

  2. Create an empty directory with an English name. Open cmd in this directory on Windows (use Terminal on macOS and Linux) and execute the command git clone https://github.com/jianchang512/zh_recogn ./

  3. Create a virtual environment with python -m venv venv, then activate it: .\venv\scripts\activate on Windows, or source ./venv/bin/activate on macOS and Linux.

  4. Install the dependencies: pip install -r requirements.txt --no-deps

  5. If you need CUDA acceleration on Windows or Linux, run pip uninstall torch torchaudio, then pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

  6. Start the project by executing python start.py
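The prerequisites in the steps above can be checked with a short script before starting. This is a minimal sketch under stated assumptions: the helper names missing_tools and cuda_available are hypothetical, the tool check only looks on PATH (so an ffmpeg.exe placed in the project's ffmpeg folder on Windows will be reported as missing), and the CUDA check simply asks PyTorch whether a device is usable.

```python
import shutil
import sys


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Missing tools:", missing_tools() or "none")
    print("CUDA available:", cuda_available())
```

Run it inside the activated virtual environment so that the torch build installed in step 5 is the one being checked.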

Pre-packaged Version (Win10/Win11 Only)

Download address: https://github.com/jianchang512/zh_recogn/releases

  1. After downloading, extract the files to a directory with an English name and double-click start.exe.

  2. To reduce the package size, the pre-packaged version does not support CUDA. If you need CUDA acceleration, please deploy from source code.

Using in the pyvideotrans Project

First, upgrade pyvideotrans to v1.62 or later. Then open zh_recogn Chinese Speech Recognition in the top-left settings menu and fill in the address and port. The default is "http://127.0.0.1:9933". Do not add /api at the end.
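The difference between the address entered in pyvideotrans and the API address can be sketched as a small helper. The function names normalize_address and api_endpoint are hypothetical and not part of either project; the sketch only assumes that the API path is the base address plus /api.

```python
def normalize_address(address: str) -> str:
    """Return the base address without a trailing slash or a trailing /api."""
    base = address.strip().rstrip("/")
    if base.endswith("/api"):
        base = base[: -len("/api")]
    return base


def api_endpoint(address: str) -> str:
    """Return the full API URL for a base address."""
    return normalize_address(address) + "/api"


if __name__ == "__main__":
    # Value to enter in the pyvideotrans settings.
    print(normalize_address("http://127.0.0.1:9933/api/"))
    # Value to use when calling the API directly.
    print(api_endpoint("http://127.0.0.1:9933"))
```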

API

API address: http://ip:port/api, default: http://127.0.0.1:9933/api

Python code example for requesting the API:

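import requests

audio_file = "D:/audio/1.wav"

# Open the audio in a with-block so the file handle is closed after the upload.
with open(audio_file, "rb") as f:
    res = requests.post("http://127.0.0.1:9933/api", files={"audio": f}, timeout=1800)

# A requests response has no .data attribute; decode the JSON body instead.
print(res.json())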

The recognized subtitles are a list of entries in this form:

[
    {
        "line": 1,
        "time": "00:00:01,100 --> 00:00:03,300",
        "text": "Subtitle content 1"
    },
    {
        "line": 2,
        "time": "00:00:04,100 --> 00:00:06,300",
        "text": "Subtitle content 2"
    }
]
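Each entry carries the three parts of an SRT subtitle block (index, time range, text), so the list can be written out as an .srt file with little effort. This is a minimal sketch: the function name to_srt is hypothetical, and it assumes you already have the list of entries with the line, time, and text keys shown above.

```python
def to_srt(entries):
    """Join subtitle entries into SRT text: index, time range, text, blank line."""
    blocks = []
    for entry in entries:
        blocks.append(f"{entry['line']}\n{entry['time']}\n{entry['text']}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    sample = [
        {"line": 1, "time": "00:00:01,100 --> 00:00:03,300", "text": "Subtitle content 1"},
        {"line": 2, "time": "00:00:04,100 --> 00:00:06,300", "text": "Subtitle content 2"},
    ]
    # Write the subtitles next to the script; the file name is arbitrary.
    with open("output.srt", "w", encoding="utf-8") as f:
        f.write(to_srt(sample))
```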

When filling in the address in pyvideotrans, do not add /api at the end.

Web Interface

[Screenshot: zh_recogn web interface]

Precautions

  1. On first use, the model is downloaded automatically, which can take a long time.
  2. Only Chinese speech recognition is supported.
  3. The binding address and port can be changed in the set.ini file.