
Using zh_recogn for Chinese Speech Recognition

This recognition method works only with Chinese speech. It uses a model from Alibaba's ModelScope community that targets Chinese audio and is intended to compensate for the weaker Chinese recognition of international models.

How to Use

First, deploy the zh_recogn project.

Then start the zh_recogn service and enter its address (default: http://127.0.0.1:9933) in the software's top-left menu: Settings -> zh_recogn Chinese Speech Recognition -> Address.

Then, in the software interface, select zh_recogn from the "Faster Mode" dropdown. When this option is chosen, there's no need to select a model or segmentation method.



Deploying the zh_recogn Project

Source Code Deployment

  1. Install Python 3.10, Git, and FFmpeg. On Windows, download ffmpeg.exe and place it in the project's ffmpeg folder. On macOS, run brew install ffmpeg.

  2. Create an empty directory whose name uses only English (ASCII) characters. Open CMD in this directory on Windows (or a terminal on macOS and Linux) and run git clone https://github.com/jianchang512/zh_recogn ./ (the trailing ./ clones into the current directory).

  3. Create a virtual environment with python -m venv venv, then activate it: on Windows, run .\venv\scripts\activate; on macOS and Linux, run source ./venv/bin/activate.

  4. Continue by executing pip install -r requirements.txt --no-deps.

  5. For CUDA acceleration on Windows and Linux, proceed by executing pip uninstall torch torchaudio, then pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118.

  6. Start the project with python start.py.
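
Before running the steps above, it can help to confirm that the prerequisites from step 1 are actually discoverable. The following is a minimal sketch, not part of zh_recogn: the function name missing_prerequisites and the exact-version check for Python 3.10 are assumptions made for illustration. It looks for ffmpeg both on PATH and in the project's ffmpeg folder, as described for Windows.

```python
import os
import shutil
import sys


def missing_prerequisites(project_dir="."):
    """Return the names of prerequisites that could not be found.

    Hypothetical helper for illustration; zh_recogn does not ship it.
    """
    missing = []

    # The guide asks for Python 3.10; treating other versions as
    # "missing" is an assumption of this sketch.
    if sys.version_info[:2] != (3, 10):
        missing.append("python3.10")

    if shutil.which("git") is None:
        missing.append("git")

    # Search the project's ffmpeg folder first, then the regular PATH.
    search_path = os.pathsep.join(
        [os.path.join(project_dir, "ffmpeg"), os.environ.get("PATH", "")]
    )
    if shutil.which("ffmpeg", path=search_path) is None:
        missing.append("ffmpeg")

    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("Missing:", ", ".join(problems) if problems else "nothing")
```

An empty result only means the tools were found, not that their versions are compatible.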

Pre-packaged Version (Windows 10/11 Only)

Download link: https://github.com/jianchang512/zh_recogn/releases

  1. After downloading, extract the archive to a directory whose name uses only English (ASCII) characters, then double-click start.exe.

  2. To reduce package size, the pre-packaged version does not support CUDA. If CUDA acceleration is required, please use the source code deployment method.

Using with the pyvideotrans Project

First, upgrade pyvideotrans to v1.62+. Then, open the top-left "Settings" menu -> "zh_recogn Chinese Speech Recognition" menu, and fill in the address and port. The default is "http://127.0.0.1:9933". Do not add /api at the end.
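
The difference between the address entered in pyvideotrans (no /api) and the API endpoint (with /api) is easy to get wrong. As a small sketch, the two helpers below derive one from the other; base_address and api_endpoint are hypothetical names introduced here and are not part of either project.

```python
def base_address(address):
    """Return the address without surrounding whitespace, trailing
    slashes, or a trailing /api segment (the form pyvideotrans expects)."""
    address = address.strip().rstrip("/")
    if address.endswith("/api"):
        address = address[: -len("/api")]
    return address.rstrip("/")


def api_endpoint(address):
    """Return the API endpoint for a base or already-suffixed address."""
    return base_address(address) + "/api"


if __name__ == "__main__":
    print(base_address("http://127.0.0.1:9933/api"))
    print(api_endpoint("http://127.0.0.1:9933"))
```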

API

API address: http://ip:port/api (default: http://127.0.0.1:9933/api)

Python API request example:

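import requests

audio_file = "D:/audio/1.wav"

# Upload the audio as multipart form data under the "audio" field.
# The timeout is 1800 seconds (30 minutes).
with open(audio_file, "rb") as f:
    res = requests.post("http://127.0.0.1:9933/api", files={"audio": f}, timeout=1800)

# requests.Response has no .data attribute; decode the JSON body instead.
print(res.json())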

[
    {
        "line": 1,
        "time": "00:00:01,100 --> 00:00:03,300",
        "text": "字幕内容1"
    },
    {
        "line": 2,
        "time": "00:00:04,100 --> 00:00:06,300",
        "text": "字幕内容2"
    }
]
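
Each entry above already carries the three parts of an SRT cue (index, time range, text), so the list can be joined into subtitle text directly. The sketch below assumes the entries arrive as a list of dictionaries with exactly the line, time, and text keys shown; entries_to_srt is a hypothetical helper, not something the API provides.

```python
def entries_to_srt(entries):
    """Join subtitle entries into SRT text.

    Each cue is written as: index, time range, text, separated from the
    next cue by a blank line. Assumes every entry has the keys
    "line", "time", and "text".
    """
    blocks = [
        f"{entry['line']}\n{entry['time']}\n{entry['text']}" for entry in entries
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""


# Sample data copied from the response format shown in this section.
sample = [
    {"line": 1, "time": "00:00:01,100 --> 00:00:03,300", "text": "字幕内容1"},
    {"line": 2, "time": "00:00:04,100 --> 00:00:06,300", "text": "字幕内容2"},
]

if __name__ == "__main__":
    print(entries_to_srt(sample))
```

When writing the result to a file, open it with encoding="utf-8" so the Chinese text is preserved.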

When entering the address in pyvideotrans, do not append /api.

Web Interface

[Image: zh_recogn web interface]

Important Notes

  1. The model will be downloaded automatically on first use, which may take some time.
  2. Only Chinese speech recognition is supported.
  3. The binding address and port can be modified in the set.ini file.