Parakeet-API: High-Performance Local Speech-to-Text Service
The parakeet-api project is a local speech-to-text service built on the NVIDIA Parakeet-tdt-0.6b model. It provides an OpenAI API-compatible interface and a clean Web UI, so you can quickly convert any audio or video file into high-accuracy SRT subtitles. It is also compatible with pyVideoTrans v3.72+.
Project Open Source Address: https://github.com/jianchang512/parakeet-api
✨ Key Advantages of Parakeet-API
- 🚀 Extreme Speed and Performance: The Parakeet model is highly optimized, especially with NVIDIA GPUs, providing incredibly fast transcription speeds, making it ideal for handling large or lengthy audio and video files.
- 🎯 Precise Timestamps: The model uses Token-and-Duration Transducer (TDT) technology, so the generated SRT subtitles carry very accurate timestamps that align closely with the audio stream, which makes them well suited to video subtitling.
- 💰 Completely Free, Unlimited Usage: Run on your own hardware without any API call costs or usage time restrictions.
- 🌐 Flexible Access Methods: Provides an intuitive Web interface and a standardized API interface that integrates easily into existing workflows such as pyVideoTrans.
🛠️ Installation and Configuration Guide
This project supports Windows, macOS, and Linux. Please follow these steps for installation and configuration.
Step 0: Configure a Python 3.10 Environment
If you don't have Python 3 installed, please follow this tutorial: https://pvt9.com/_posts/pythoninstall
Step 1: Prepare FFmpeg
This project uses ffmpeg for audio and video format preprocessing.
Windows (Recommended):
- Download from the FFmpeg GitHub repository. After extraction, you will get ffmpeg.exe.
- Place the downloaded ffmpeg.exe file directly in the root directory of the project (at the same level as the app.py file). The program will automatically detect and use it without configuring environment variables.
macOS (using Homebrew):
brew install ffmpeg
Linux (Debian/Ubuntu):
sudo apt update && sudo apt install ffmpeg
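Before moving on, it can help to confirm that an ffmpeg executable is actually reachable. The following is a minimal sketch, not the project's own detection logic: the helper name `find_ffmpeg` is hypothetical, and the lookup order (project root first, then `PATH`) is an assumption based on the setup described above.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_ffmpeg(project_root: str = ".") -> Optional[str]:
    """Return a path to an ffmpeg executable, or None if none is found."""
    root = Path(project_root)
    # Look for a copy placed next to app.py first (the Windows setup above).
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = root / name
        if candidate.is_file():
            return str(candidate.resolve())
    # Otherwise fall back to whatever is on PATH (Homebrew or apt installs).
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(f"ffmpeg found at: {path}" if path else "ffmpeg not found")
```

Run it from the project root. If it reports that ffmpeg was not found, revisit the instructions for your platform above.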
Step 2: Create a Python Virtual Environment and Install Dependencies
Download or clone the project code to your local computer (it is recommended to place it on a non-system drive, in a folder whose path contains only English letters or digits).
Open a terminal or command-line tool and go to the root directory of the project (on Windows, just enter cmd in the folder address bar and press Enter).
Create a virtual environment:
python -m venv venv
Activate the virtual environment:
- Windows (CMD/PowerShell):
.\venv\Scripts\activate
- macOS / Linux (Bash/Zsh):
source venv/bin/activate
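To confirm that activation worked, you can ask the interpreter itself. This is a small sketch using only the standard library; the function names `in_virtualenv` and `python_matches` are hypothetical helpers, not part of the project.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv created by `python -m venv`."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_matches(major: int, minor: int) -> bool:
    """True when the running interpreter is exactly major.minor."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Is Python 3.10:", python_matches(3, 10))
    print("Inside a virtual environment:", in_virtualenv())
```

If the last line prints `False`, the virtual environment is not active in the current terminal, and the packages in the next step would be installed globally instead.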
Install Dependencies:
If you don't have an NVIDIA graphics card (CPU only):
pip install -r requirements.txt
If you have an NVIDIA graphics card (GPU acceleration):
a. Make sure you have installed the latest NVIDIA driver and the corresponding CUDA Toolkit.
b. Uninstall any existing PyTorch versions:
pip uninstall -y torch
c. Install the PyTorch build that matches your CUDA version (using CUDA 12.6 as an example):
pip install torch --index-url https://download.pytorch.org/whl/cu126
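After installing, a quick check shows whether PyTorch can see the GPU. The sketch below relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the wrapper `describe_torch_device` is a hypothetical helper, and it deliberately tolerates PyTorch being absent so it can be run at any point during setup.

```python
def describe_torch_device() -> str:
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    try:
        import torch  # imported lazily so the check also works before installation
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"CUDA available: {name} (torch {torch.__version__})"
    return f"CUDA not available, running on CPU (torch {torch.__version__})"


if __name__ == "__main__":
    print(describe_torch_device())
```

If you have an NVIDIA card but the output says CUDA is not available, the installed PyTorch build most likely does not match your CUDA version, and steps b and c are worth repeating.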
Step 3: Start the Service
In the terminal with the virtual environment activated, run the following command:
python app.py
You will see a message that the service has started. The first run downloads the model (approximately 1.2GB), so please be patient.
A large amount of log output during startup is normal and can be ignored.
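Because the first start can take a while, a script that depends on the service may want to wait until it accepts connections. Here is a minimal sketch that polls the TCP port; it assumes the default address 127.0.0.1:5092 used in the usage section, and `wait_for_port` is a hypothetical helper name. A successful connection only shows that the port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 5092,
                  timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # not listening yet, retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port(timeout=900)` would block for up to 15 minutes and return `True` as soon as the port opens.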
(Screenshot: successful startup interface)
🚀 How to Use
Method 1: Using the Web Interface
- Open in your browser: http://127.0.0.1:5092
- Drag and drop or click to upload your audio or video file.
- Click "Start Transcription" and wait for the processing to complete. You can then see and download the SRT subtitles below.
Method 2: API Call (Python Example)
Use the openai library to easily call this service.
from openai import OpenAI

# Point the client at the local service; the API key can be any string.
client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",
)

# Upload the file in binary mode and request SRT output.
with open("your_audio.mp3", "rb") as audio_file:
    srt_result = client.audio.transcriptions.create(
        model="parakeet",
        file=audio_file,
        response_format="srt"
    )

print(srt_result)
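If you want to post-process the result rather than just print it, the SRT text can be split into individual cues. The following is a sketch under the assumption that the service returns standard SRT (an index line, a `start --> end` line, then one or more text lines, with cues separated by blank lines); `Cue` and `parse_srt` are hypothetical names introduced for illustration.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as 00:01:05,040 to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text: str) -> List[Cue]:
    """Split SRT text into cues, skipping blocks that are not well formed."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\r?\n\s*\r?\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0].strip()), _to_seconds(start),
                        _to_seconds(end), "\n".join(lines[2:])))
    return cues
```

With the example above, `parse_srt(srt_result)` would give a list you can filter, shift, or re-export, for instance to drop very short cues before saving the file.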
Method 3: Integrate with pyVideoTrans (Recommended)
Parakeet-API can be seamlessly integrated with the video translation tool pyVideoTrans (version 3.72 and above).
- Make sure your parakeet-api service is running locally.
- Open the pyVideoTrans software.
- In the menu bar, select Speech Recognition(R) -> Nvidia parakeet-tdt.
- In the pop-up configuration window, set the "http address" to:
http://127.0.0.1:5092/v1
- Click "Save" to start using it.