Using zh_recogn for Chinese Speech Recognition
This recognition method is exclusively for Chinese speech. It leverages a model from Alibaba's ModelScope community, providing superior performance for Chinese and effectively addressing the shortcomings of international models in handling Chinese audio.
How to Use
First, deploy the zh_recogn project.
Then, start it. Enter the address (default: http://127.0.0.1:9933) into the software's top-left menu: Settings -> zh_recogn Chinese Speech Recognition -> Address.
Then, in the software interface, select zh_recogn from the "Faster Mode" dropdown. When this option is chosen, there's no need to select a model or segmentation method.
Deploying the zh_recogn Project
Source Code Deployment
1. Install Python 3.10, Git, and FFmpeg. On Windows, download ffmpeg.exe and place it in the project's ffmpeg folder; on macOS, install it with brew install ffmpeg.
2. Create an empty, English-named directory. On Windows, open CMD in this directory (on macOS and Linux, use the terminal) and execute git clone https://github.com/jianchang512/zh_recogn ./
3. Execute python -m venv venv to create a virtual environment, then activate it: on Windows, run .\venv\scripts\activate; on macOS and Linux, run source ./venv/bin/activate.
4. Execute pip install -r requirements.txt --no-deps to install the dependencies.
5. For CUDA acceleration on Windows and Linux, execute pip uninstall torch torchaudio, then pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118.
6. Start the project with python start.py.
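If you installed the CUDA build of torch in step 5, you can verify that the GPU is actually visible before starting the project. A minimal check using PyTorch's standard API:

import torch
# Prints True when the installed torch build can access a CUDA GPU
print(torch.cuda.is_available())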
Pre-packaged Version (Windows 10/11 Only)
Download link: https://github.com/jianchang512/zh_recogn/releases
After downloading, extract it to an English-named directory and double-click start.exe.
To reduce package size, the pre-packaged version does not support CUDA. If CUDA acceleration is required, please use the source code deployment method.
Using with the pyvideotrans Project
First, upgrade pyvideotrans to v1.62+. Then open the top-left "Settings" menu -> "zh_recogn Chinese Speech Recognition" menu and fill in the address and port. The default is "http://127.0.0.1:9933". Do not add /api at the end.
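Before saving the settings, you can confirm that the zh_recogn service is reachable. A minimal sketch, assuming the service is running at the default address:

import requests
# Check that the local zh_recogn service responds at the default address
try:
    r = requests.get("http://127.0.0.1:9933", timeout=5)
    print("zh_recogn is reachable, HTTP status:", r.status_code)
except requests.exceptions.ConnectionError:
    print("zh_recogn does not appear to be running on 127.0.0.1:9933")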
API
API address: http://ip:port/api (default: http://127.0.0.1:9933/api)
Python API request example:
import requests

# Path to the local audio file to transcribe
audio_file = "D:/audio/1.wav"
files = {"audio": open(audio_file, "rb")}
# Recognition can take a while, so allow a long timeout
res = requests.post("http://127.0.0.1:9933/api", files=files, timeout=1800)
print(res.json())
The API returns a list of subtitle segments, for example:

[
  {
    "line": 1,
    "time": "00:00:01,100 --> 00:00:03,300",
    "text": "字幕内容1"
  },
  {
    "line": 2,
    "time": "00:00:04,100 --> 00:00:06,300",
    "text": "字幕内容2"
  }
]
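Since each segment already carries an index, a time range, and the recognized text, the response can be written straight to an SRT file. A minimal sketch, assuming the response format shown above (the file paths are only examples):

import requests

audio_file = "D:/audio/1.wav"
files = {"audio": open(audio_file, "rb")}
res = requests.post("http://127.0.0.1:9933/api", files=files, timeout=1800)

# Assemble the segments into SRT blocks: index, time range, text, blank line
srt_blocks = []
for seg in res.json():
    srt_blocks.append(f"{seg['line']}\n{seg['time']}\n{seg['text']}\n")

with open("D:/audio/1.srt", "w", encoding="utf-8") as f:
    f.write("\n".join(srt_blocks))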
When filling this in for pyvideotrans, do not add /api at the end.
Web Interface
Important Notes
- The model will be downloaded automatically on first use, which may take some time.
- Only Chinese speech recognition is supported.
- The binding address and port can be modified in the set.ini file.