Using zh_recogn Chinese Speech Recognition
This recognition method supports Chinese speech only. It uses a model from Alibaba's ModelScope community, which handles Chinese well and compensates for the weaker Chinese support of models developed outside China.
How to Use
First, deploy the zh_recogn project.
Then start it and enter its address (default http://127.0.0.1:9933) in the software's upper-left menu: Settings - zh_recogn Chinese Speech Recognition - Address.
Finally, select zh_recogn in the "faster mode" drop-down box in the software interface. With this item selected, you do not need to choose a model or segmentation method.
Deploying the zh_recogn Project
Source Code Deployment
First, install Python 3.10, git, and ffmpeg. On Windows, download ffmpeg.exe and put it in the ffmpeg folder of this project. On macOS, install ffmpeg with:
brew install ffmpeg
Create an empty directory whose path contains only English characters. Open cmd in this directory on Windows (use a terminal on macOS and Linux), and run:
git clone https://github.com/jianchang512/zh_recogn ./
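Before going further, it can help to confirm that the prerequisites are visible from the current shell. The following is a minimal sketch and not part of zh_recogn: the helper name `check_prerequisites` is made up for illustration, and the `ffmpeg/ffmpeg.exe` location is inferred from the Windows instruction above.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(project_dir="."):
    """Report whether Python 3.10, git, and ffmpeg look available.

    Hypothetical helper for illustration; zh_recogn does not ship it.
    """
    project = Path(project_dir)
    # On Windows the guide places ffmpeg.exe inside the project's ffmpeg
    # folder, so accept either that file or an ffmpeg found on PATH.
    local_ffmpeg = project / "ffmpeg" / "ffmpeg.exe"
    return {
        "python_3_10": sys.version_info[:2] == (3, 10),
        "git": shutil.which("git") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None or local_ffmpeg.is_file(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it from the directory that received the clone. A "missing" entry only means the tool was not found from this shell, not that it is absent from the machine.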
Next, create a virtual environment:
python -m venv venv
Activate it. On Windows:
.\venv\scripts\activate
On macOS and Linux:
source ./venv/bin/activate
Then install the dependencies:
pip install -r requirements.txt --no-deps
If you need CUDA acceleration on Windows or Linux, replace torch and torchaudio with the CUDA 11.8 builds:
pip uninstall torch torchaudio
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
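After reinstalling, one way to confirm that the CUDA build is active is to ask PyTorch directly. This is a minimal sketch to run inside the activated virtual environment. The function name `cuda_status` is made up for illustration, and the check only reports what PyTorch sees, not how zh_recogn chooses its device.

```python
import importlib.util


def cuda_status():
    """Return a short description of whether PyTorch can see a CUDA device."""
    # Avoid an ImportError when torch is not installed in this environment.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is available"


if __name__ == "__main__":
    print(cuda_status())
```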
Start the project
python start.py
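Once the project is running, a quick TCP probe can confirm that something is listening on the default address before you enter it in the software. This is a minimal sketch, assuming the default host and port 127.0.0.1:9933 from this guide. The helper name `is_listening` is hypothetical, and a successful connection shows only that the port is open, not that the model has finished loading.

```python
import socket


def is_listening(host="127.0.0.1", port=9933, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("zh_recogn reachable:", is_listening())
```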
Pre-packaged Version (Windows 10/11 Only)
Download it from https://github.com/jianchang512/zh_recogn/releases
After downloading, unzip it to a directory whose path contains only English characters, then double-click start.exe
To reduce the package size, the pre-packaged version does not support CUDA. If you need CUDA acceleration, deploy from source code instead.
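Both deployment routes ask for an "English" directory. Reading that as "the full path contains only ASCII characters" is an assumption, and under that assumption a candidate location can be checked as follows. The helper name `is_ascii_path` is made up for illustration.

```python
from pathlib import Path


def is_ascii_path(path):
    """Return True if every character of the absolute path is ASCII."""
    # absolute() prepends the current directory to relative paths, so
    # non-ASCII parent folders are caught as well.
    return str(Path(path).absolute()).isascii()


if __name__ == "__main__":
    print(is_ascii_path("."))
```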
Using in the pyvideotrans Project
First upgrade pyvideotrans to v1.62 or later. Then open the settings menu in the upper-left corner, choose zh_recogn Chinese Speech Recognition, and enter the address and port (default "http://127.0.0.1:9933"). Do not add /api at the end.
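The address entered in the settings and the API endpoint differ only by the /api suffix. The sketch below shows one way to keep the two apart in your own scripts. The helper names `normalize_base_url` and `api_endpoint` are hypothetical and are not taken from pyvideotrans or zh_recogn.

```python
def normalize_base_url(address):
    """Strip whitespace, trailing slashes, and a trailing /api from an address."""
    base = address.strip().rstrip("/")
    if base.lower().endswith("/api"):
        base = base[: -len("/api")]
    return base.rstrip("/")


def api_endpoint(address):
    """Build the full API URL from a base address."""
    return normalize_base_url(address) + "/api"


if __name__ == "__main__":
    print(normalize_base_url("http://127.0.0.1:9933/api/"))
    print(api_endpoint("http://127.0.0.1:9933"))
```

The first form is what goes into the settings field; the second is what a script would send requests to.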
API
API address: http://ip:port/api (default http://127.0.0.1:9933/api)
Python code example for requesting the API
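import requests

audio_file = "D:/audio/1.wav"
# Upload the audio as multipart form data under the "audio" field
with open(audio_file, "rb") as f:
    res = requests.post("http://127.0.0.1:9933/api", files={"audio": f}, timeout=1800)
# requests.Response has no .data attribute; print the parsed JSON body instead
print(res.json())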
[
  {
    "line": 1,
    "time": "00:00:01,100 --> 00:00:03,300",
    "text": "Subtitle content 1"
  },
  {
    "line": 2,
    "time": "00:00:04,100 --> 00:00:06,300",
    "text": "Subtitle content 2"
  }
]
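Each entry above carries a line number, a time range, and the text, which are the three parts of an SRT cue. The sketch below turns such a list into SRT text. The helper name `to_srt` is hypothetical, and it assumes every entry has exactly the `line`, `time`, and `text` fields shown in the example.

```python
def to_srt(items):
    """Convert a list of {"line", "time", "text"} dicts into SRT text."""
    blocks = []
    for item in items:
        # One cue: index, time range, then the subtitle text.
        blocks.append(f'{item["line"]}\n{item["time"]}\n{item["text"]}')
    # SRT separates cues with a blank line.
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    sample = [
        {"line": 1, "time": "00:00:01,100 --> 00:00:03,300", "text": "Subtitle content 1"},
        {"line": 2, "time": "00:00:04,100 --> 00:00:06,300", "text": "Subtitle content 2"},
    ]
    print(to_srt(sample))
```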
Do not add /api at the end when entering the address in pyvideotrans.
Web Interface
Precautions
- The model is downloaded automatically on first use, which may take a long time.
- Only Chinese speech recognition is supported.
- You can modify the binding address and port in the set.ini file.