Using zh_recogn for Chinese Speech Recognition
This recognition method supports Chinese speech only. It uses an Alibaba ModelScope community model, which performs better on Chinese and compensates for the weaker Chinese-language handling of international models.
How to Use
First, deploy the zh_recogn project.
Then, start it. Enter the address (default is http://127.0.0.1:9933) in the software's top-left menu: Settings - zh_recogn Chinese Speech Recognition - Address.
Next, select zh_recogn from the "faster mode" dropdown menu in the software interface. When this option is chosen, you no longer need to select a model or segmentation method.
Deploying the zh_recogn Project
Source Code Deployment
First, install Python 3.10, git, and ffmpeg. On Windows, download ffmpeg.exe and place it in the ffmpeg folder of this project. On macOS, install it with
brew install ffmpeg
Create an empty directory with an English name. Open cmd in this directory on Windows (use Terminal on macOS and Linux) and clone the repository:
git clone https://github.com/jianchang512/zh_recogn ./
Create a virtual environment:
python -m venv venv
Activate it. On Windows, execute
.\venv\scripts\activate
On macOS and Linux, execute
source ./venv/bin/activate
Install the dependencies:
pip install -r requirements.txt --no-deps
If you need CUDA acceleration on Windows or Linux, reinstall PyTorch with CUDA 11.8 support. First execute
pip uninstall torch torchaudio
and then execute
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
Start the project by executing
python start.py
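Once the project is running, it can be useful to confirm that something is listening on the configured address before entering it in the software. The following is a minimal sketch using only the Python standard library. The function name service_is_up is hypothetical, and the host and port are simply the defaults mentioned in this document. It checks only that a TCP connection succeeds, not that the recognition model has finished loading.

```python
import socket


def service_is_up(host="127.0.0.1", port=9933, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumes the default address from this document; adjust if set.ini was changed
    print("zh_recogn reachable:", service_is_up())
```

If this prints False right after launching, the service may still be starting, or it may be bound to a different address or port.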
Pre-packaged Version (Win10/Win11 Only)
Download address: https://github.com/jianchang512/zh_recogn/releases
After downloading, extract the files to a directory with an English name and double-click start.exe.
To reduce the package size, the pre-packaged version does not support CUDA. If you need CUDA acceleration, please deploy from source code.
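For a source deployment, one way to check whether the installed PyTorch build can see a CUDA device is the sketch below. It is an illustration, not part of zh_recogn. The helper name cuda_available is hypothetical, and it should be run inside the project's virtual environment. It relies only on torch.cuda.is_available() and returns False when PyTorch is not installed.

```python
def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # installed via requirements.txt or the CUDA wheel index
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


print("CUDA available:", cuda_available())
```

A result of False after installing the CUDA wheels usually points to a missing or incompatible GPU driver, or to a CPU-only build of torch still being installed.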
Using in the pyvideotrans Project
First, upgrade pyvideotrans to v1.62 or later. Then, open Settings - zh_recogn Chinese Speech Recognition in the top-left menu and fill in the address and port. The default is http://127.0.0.1:9933. Do not add /api at the end.
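The distinction between the base address (what the settings field expects) and the API endpoint (the base address plus /api) is easy to get wrong. The sketch below shows one way to normalize a user-supplied address. Both helper names are hypothetical and are not part of pyvideotrans or zh_recogn.

```python
def normalize_address(address):
    """Return the base address without trailing slashes or a trailing /api."""
    base = address.strip().rstrip("/")
    if base.endswith("/api"):
        base = base[: -len("/api")]
    return base.rstrip("/")


def api_url(address):
    """Build the API endpoint from a base address."""
    return normalize_address(address) + "/api"


# The value to enter in the settings field
print(normalize_address("http://127.0.0.1:9933/api"))
# The endpoint to use when calling the API directly
print(api_url("http://127.0.0.1:9933"))
```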
API
API address: http://ip:port/api, default: http://127.0.0.1:9933/api
Python code example for requesting the API:
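import requests

# Path to the audio file to transcribe
audio_file = "D:/audio/1.wav"
# Send the file in the "audio" form field; the long timeout allows for slow recognition
with open(audio_file, "rb") as f:
    res = requests.post("http://127.0.0.1:9933/api", files={"audio": f}, timeout=1800)
# Print the parsed JSON response
print(res.json())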
The returned subtitle data looks like this:
[
    {
        "line": 1,
        "time": "00:00:01,100 --> 00:00:03,300",
        "text": "Subtitle content 1"
    },
    {
        "line": 2,
        "time": "00:00:04,100 --> 00:00:06,300",
        "text": "Subtitle content 2"
    }
]
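Because each entry carries a line number, a time range, and the text, the entries map directly onto SRT subtitle blocks. The following sketch assumes the entries have already been extracted from the response as a list of Python dicts with the keys line, time, and text shown above. The function name to_srt is hypothetical.

```python
def to_srt(entries):
    """Join subtitle entries (dicts with line, time, text) into SRT-formatted text."""
    blocks = [
        f"{entry['line']}\n{entry['time']}\n{entry['text']}"
        for entry in entries
    ]
    # SRT blocks are separated by a blank line
    return "\n\n".join(blocks) + "\n" if blocks else ""


# Sample data copied from the example entries in this document
sample = [
    {"line": 1, "time": "00:00:01,100 --> 00:00:03,300", "text": "Subtitle content 1"},
    {"line": 2, "time": "00:00:04,100 --> 00:00:06,300", "text": "Subtitle content 2"},
]
print(to_srt(sample))
```

The resulting string can be written to a .srt file with UTF-8 encoding so that Chinese text is preserved.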
When filling in the address in pyvideotrans, do not add /api at the end.
Web Interface
Precautions
- The first time you use it, the model will be downloaded automatically, which will take a long time.
- Only supports Chinese speech recognition.
- The binding address and port can be modified in the set.ini file.