Speech-to-Text Tool
Open-source repository: https://github.com/jianchang512/stt
This is an offline speech-to-text tool that runs entirely locally, built on the openai-whisper open-source model. It recognizes speech in video/audio files and converts it to text, with output as JSON, as SRT subtitles with timestamps, or as plain text. After self-deployment it can stand in for OpenAI's speech recognition API or Baidu speech recognition, with accuracy essentially on par with the official OpenAI API.
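For example, once the tool is deployed, other programs can call it over HTTP in place of a cloud API. Below is a minimal Python sketch; the port 9977, the /api path, and the field names language/model/response_format are assumptions for illustration, so check the project's API documentation for the actual interface.

import requests

# Send an audio file to the locally deployed service for recognition.
# NOTE: the URL, port, and form-field names are illustrative assumptions;
# consult the project's API docs for the real interface.
with open("sample.wav", "rb") as f:  # sample.wav is a placeholder file
    resp = requests.post(
        "http://127.0.0.1:9977/api",
        data={
            "language": "en",          # language spoken in the audio
            "model": "base",           # which whisper model to use
            "response_format": "srt",  # json / srt / text
        },
        files={"file": f},
    )
resp.raise_for_status()
print(resp.text)  # the recognition result in the requested format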
After deploying or downloading, double-click start.exe; the tool's web page opens automatically in your local browser.
Drag and drop, or click to select, the audio/video file to be recognized; then choose the spoken language, the output format, and the model to use (the base model is built in). Click Start Recognition; when recognition finishes, the result is shown on the page in the chosen format.
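The SRT output is possible because openai-whisper returns timestamped segments for each recognized chunk. The helper below is an illustrative sketch (not the tool's actual code) of how such segments map onto SRT entries:

def to_srt_time(seconds: float) -> str:
    # Format seconds as an SRT timestamp: HH:MM:SS,mmm
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    # Convert whisper's result["segments"] list into SRT subtitle text.
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)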
The whole process requires no internet connection and runs completely locally; it can be deployed on an intranet.
The openai-whisper project provides base, small, medium, large, and large-v3 models; the base model is built in. Recognition quality improves steadily from base up to large-v3, but so do the compute requirements. Download additional models as needed and place them in the models directory.
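For reference, openai-whisper itself can load a model by name from a local directory, which is presumably how the models folder is used here; a minimal sketch (sample.mp4 is a placeholder, and whether the tool calls the library exactly this way is an assumption):

import whisper

# Load a model from the local "models" directory; if the .pt file is
# already present, nothing is downloaded and this runs fully offline.
model = whisper.load_model("base", download_root="models")

# Transcribe an audio/video file (requires ffmpeg on the PATH).
result = model.transcribe("sample.mp4", language="zh")
print(result["text"])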
Pre-compiled Windows Version Usage / Linux and Mac Source Code Deployment
Open the Releases page and download the pre-compiled archive.
After downloading, extract it somewhere, such as E:/stt.
Double-click start.exe and wait for the browser window to open automatically
Click the upload area on the page and pick the audio or video file to recognize in the file dialog, or drag the file straight onto the upload area. Then choose the spoken language, the output format, and the model to use, and click "Start Recognition Immediately". After a short wait, the result appears in the text box at the bottom in the chosen format.
If the machine has an Nvidia GPU and the CUDA environment is configured correctly, CUDA acceleration is used automatically.
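To check whether PyTorch actually sees the GPU before starting the tool, a quick probe:

import torch

# True means whisper can run on the GPU; otherwise it falls back to CPU.
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))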
Source Code Deployment (Linux/Mac/Windows)
Requires Python 3.9 to 3.11.
Create an empty directory, such as E:/stt, and open a cmd window in it: type cmd in the directory's address bar and press Enter. Then use git to pull the source code into the current directory:
git clone [email protected]:jianchang512/stt.git .
Create a virtual environment
python -m venv venv
Activate the environment. On Windows the command is %cd%/venv/scripts/activate; on Linux and Mac it is source ./venv/bin/activate.
Install dependencies:
pip install -r requirements.txt
If a version-conflict error is reported, run pip install -r requirements.txt --no-deps instead.
On Windows, unzip ffmpeg.7z and put ffmpeg.exe and ffprobe.exe in the project directory. On Linux and Mac, download the matching build from the ffmpeg official website, extract the ffmpeg and ffprobe binaries, and put them in the project root directory.
Download the model archives as needed, then put the xx.pt file from each archive into the models folder in the project root.
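Before starting, a small sanity check can confirm that ffmpeg/ffprobe are reachable and that model weights are in place; a sketch (run from the project root):

import shutil
from pathlib import Path

# shutil.which searches the PATH for the ffmpeg/ffprobe binaries.
for tool in ("ffmpeg", "ffprobe"):
    found = shutil.which(tool)
    print(f"{tool}: {found or 'NOT FOUND - place the binary as described above'}")

# List any whisper model weights already placed in the models directory.
names = [p.name for p in Path("models").glob("*.pt")]
print(f"models present: {names or 'none'}")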
Execute:
python start.py
Then wait for the local browser window to open automatically.
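On a headless or intranet machine where no browser opens, you can probe the service instead; the port 9977 below is an assumption for illustration, so use whatever address start.py prints on launch:

import requests

# A 200 status means the local web page is being served.
resp = requests.get("http://127.0.0.1:9977", timeout=5)
print(resp.status_code)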