
Using zh_recogn Chinese Speech Recognition

This recognition method supports Chinese speech only. It uses a model from the Alibaba ModelScope community that handles Chinese well and compensates for the weaker Chinese support of models developed outside China.

How to Use

First, deploy the zh_recogn project.

Then start it and enter its address (default http://127.0.0.1:9933) in the software under the upper-left menu: Settings - zh_recogn Chinese Speech Recognition - Address.

Then select zh_recogn in the "faster mode" drop-down box in the software interface. With this option selected, you do not need to choose a model or segmentation method.
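
Before entering the address in the settings, it can help to confirm that the zh_recogn service is actually listening. The following is a minimal sketch and not part of either project: is_listening is a hypothetical helper that only checks whether a TCP connection to the host and port can be opened, assuming the default address http://127.0.0.1:9933.

```python
import socket
from urllib.parse import urlparse


def is_listening(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the address's host and port succeeds.

    Hypothetical helper for illustration; it does not call the zh_recogn API.
    """
    parsed = urlparse(address)
    host = parsed.hostname or "127.0.0.1"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default address from the settings described above.
    print(is_listening("http://127.0.0.1:9933"))
```

A True result only shows that something accepts connections on that port; it does not prove that the listener is zh_recogn.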



Deploying the zh_recogn Project

Source Code Deployment

  1. Install Python 3.10, git, and ffmpeg. On Windows, download ffmpeg.exe and place it in the ffmpeg folder of this project. On macOS, install it with brew install ffmpeg.

  2. Create an empty directory whose name and path use only English characters. Open cmd in that directory on Windows (or a terminal on macOS and Linux) and run git clone https://github.com/jianchang512/zh_recogn ./

  3. Run python -m venv venv, then activate the environment: .\venv\scripts\activate on Windows, or source ./venv/bin/activate on macOS and Linux.

  4. Run pip install -r requirements.txt --no-deps

  5. If you need CUDA acceleration on Windows or Linux, run pip uninstall torch torchaudio, then pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118

  6. Start the project: python start.py
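
The prerequisites from step 1 and the optional CUDA setup from step 5 can be checked with a short script. This is a sketch under stated assumptions, not something shipped with zh_recogn: check_prerequisites and cuda_available are hypothetical helper names, and the tool lookup only searches PATH.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the tools named in step 1 can be found.

    Hypothetical helper for illustration; not part of zh_recogn.
    """
    return {
        # Step 1 asks for Python 3.10; only major.minor is compared.
        "python3.10": sys.version_info[:2] == (3, 10),
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


def cuda_available() -> bool:
    """Return True if torch is installed and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
    print(f"cuda: {'available' if cuda_available() else 'not available'}")
```

On Windows, an ffmpeg.exe placed in the project's ffmpeg folder is not on PATH, so this script may report ffmpeg as missing even when the project can use it. Run the CUDA check inside the activated virtual environment so that it sees the torch build installed in step 5.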

Pre-packaged Version (Windows 10/11 Only)

Download: https://github.com/jianchang512/zh_recogn/releases

  1. After downloading, unzip it to a directory whose name and path use only English characters, then double-click start.exe

  2. To reduce the package size, the pre-packaged version does not support CUDA. If you need CUDA acceleration, deploy from source code instead.

Using in the pyvideotrans Project

First upgrade pyvideotrans to v1.62 or later. Then open the settings menu in the upper-left corner, choose zh_recogn Chinese Speech Recognition, and enter the address and port (default "http://127.0.0.1:9933"). Do not append /api.
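
Because pyvideotrans expects the address without /api while direct API requests use the same address with /api appended, a small helper can convert between the two forms. This is an illustrative sketch: base_address and api_endpoint are hypothetical names and belong to neither project.

```python
def base_address(address: str) -> str:
    """Return the address without surrounding whitespace, trailing slashes, or a trailing /api."""
    cleaned = address.strip().rstrip("/")
    if cleaned.endswith("/api"):
        cleaned = cleaned[: -len("/api")]
    return cleaned.rstrip("/")


def api_endpoint(address: str) -> str:
    """Return the URL for direct API requests, whichever form was given."""
    return base_address(address) + "/api"


if __name__ == "__main__":
    # Form to enter in the pyvideotrans settings.
    print(base_address("http://127.0.0.1:9933/api"))
    # Form to use when calling the API directly.
    print(api_endpoint("http://127.0.0.1:9933"))
```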

API

API address: http://ip:port/api (default http://127.0.0.1:9933/api)

Python code example for requesting the API

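import requests

audio_file = "D:/audio/1.wav"
# Upload the audio as multipart form data under the "audio" field.
with open(audio_file, "rb") as f:
    res = requests.post("http://127.0.0.1:9933/api", files={"audio": f}, timeout=1800)

# requests.Response has no .data attribute, so parse the JSON body instead.
# Assumption: if the body is an object rather than a bare list,
# look for the subtitle list under its "data" key.
print(res.json())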

The returned subtitle data has the following structure:

[
  {
    "line": 1,
    "time": "00:00:01,100 --> 00:00:03,300",
    "text": "Subtitle content 1"
  },
  {
    "line": 2,
    "time": "00:00:04,100 --> 00:00:06,300",
    "text": "Subtitle content 2"
  }
]
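
Each returned item carries an index, a time range, and the text, which are the three parts of an SRT cue. As a minimal sketch, assuming the result has already been parsed into a list of dictionaries with the keys line, time, and text, the hypothetical helper to_srt below joins the items into SRT text.

```python
def to_srt(items: list) -> str:
    """Join recognition results into SRT text.

    Each item is assumed to have the "line", "time" and "text" keys
    shown in the sample response.
    """
    blocks = [f'{item["line"]}\n{item["time"]}\n{item["text"]}' for item in items]
    # SRT separates cues with one blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = [
        {"line": 1, "time": "00:00:01,100 --> 00:00:03,300", "text": "Subtitle content 1"},
        {"line": 2, "time": "00:00:04,100 --> 00:00:06,300", "text": "Subtitle content 2"},
    ]
    print(to_srt(sample), end="")
```

The resulting string can be written to a .srt file with UTF-8 encoding so that Chinese text is preserved.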

When entering the address in pyvideotrans, do not append /api.

Web Interface

[Image: screenshot of the web interface]

Precautions

  1. The model is downloaded automatically on first use, which can take a long time.
  2. Only Chinese speech recognition is supported.
  3. You can change the binding address and port in the set.ini file.