All-in-One Package Download Link 2: Download from HuggingFace.co
NVIDIA Parakeet Speech Transcription All-in-One Package User Guide
This all-in-one package integrates NVIDIA's open-source parakeet-ctc-1.1b (English) and parakeet-tdt_ctc-0.6b-ja (Japanese) speech recognition models. It is designed to transcribe audio and video files into SRT subtitle format.
Currently, there are few high-quality, open-source Japanese speech recognition models available. NVIDIA's parakeet-tdt_ctc-0.6b-ja offers a reliable option for transcribing Japanese content.
A key feature of this tool is that it runs entirely on the user's local machine. No environment setup is required: just download, unzip, and double-click to run.
Features
- Local Transcription: Supports transcribing English and Japanese audio/video files into text.
- SRT Subtitle Generation: The transcription results can be directly generated as SRT subtitle files with timestamps.
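For readers unfamiliar with the SRT format, the sketch below shows how timestamped segments map to SRT text in general. It is not the package's internal code. The function names and the (start, end, text) segment layout are assumptions made for illustration.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue is: a 1-based index, a timing line, the text, then a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical segments, used only to show the output shape.
print(build_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```

Each cue in the output consists of a sequence number, a start and end time separated by `-->`, and the subtitle text, with a blank line between cues.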
How to Use
Step 1: Download and Launch the Program
Download the all-in-one package and unzip it. In the extracted folder, you will find the following file structure.
To run the program, double-click the file named 启动.bat ("Launch").
Step 2: Wait for the Models to Download
On the first run, the program will automatically download the required speech recognition models. A black command-line window will appear, showing the download progress.
The download requires an internet connection, and because the model files are large, it may take some time depending on your network speed. Once the download is complete, the program will automatically open the user interface in your default web browser.
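If you plan to script against the program, it can help to wait until the local server accepts connections before sending requests. The sketch below polls a TCP port using only the Python standard library. It assumes the server listens on 127.0.0.1:5092, the address used in the API example in this guide. The helper names and the timeout values are illustrative, not part of the package.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def wait_for_server(host="127.0.0.1", port=5092, max_wait=600.0, interval=2.0) -> bool:
    """Poll until the port accepts connections or max_wait seconds pass.

    The default port is an assumption taken from the API example.
    """
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if is_port_open(host, port):
            return True
        time.sleep(interval)
    return False
```

A script could call `wait_for_server()` once after launching the program and proceed only if it returns `True`. An open port shows that something is listening, not that the models have finished loading, so treat it as a rough readiness signal.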
Step 3: Upload a File and Start Transcription
After the program starts successfully, your browser will display the following interface.
The workflow is as follows:
- Select File: Click inside the dashed box or drag and drop your audio/video file into this area.
- Select Language: Choose "English" or "Japanese" from the dropdown menu based on the source file's language.
- Start Transcription: Click the "Start Transcription" button.
Once the task is complete, the generated SRT subtitle content will be displayed in the text box below and will be available for download.
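If you need plain text rather than subtitles, the downloaded SRT content can be post-processed. The following is a minimal sketch that assumes standard SRT layout (index line, timing line, text lines, blank line between cues). The function names are hypothetical and the parser is deliberately simple, so it may not handle every SRT variant.

```python
import re

# Matches a timing line such as "00:00:01,500 --> 00:00:03,000".
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(srt_text):
    """Parse SRT text into a list of (start, end, text) tuples."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                # Everything after the timing line is subtitle text.
                text = " ".join(part.strip() for part in lines[i + 1:] if part.strip())
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


def srt_to_plain_text(srt_text):
    """Drop indices and timestamps, keeping one line of text per cue."""
    return "\n".join(text for _, _, text in parse_srt(srt_text))
```

For example, passing the contents of a downloaded .srt file to `srt_to_plain_text` returns the transcript with one cue per line.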
API Usage Guide
For developers, this package provides a local API endpoint compatible with the OpenAI Speech to Text API. You can call the transcription function programmatically.
Python Example:
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",  # api_key can be any string
)

# Read a local audio file
with open("your_audio.mp3", "rb") as audio_file:
    # Request transcription
    srt_result = client.audio.transcriptions.create(
        model="parakeet",  # The model name is fixed as "parakeet"
        file=audio_file,
        prompt="en",  # Specify language: "en" for English, "ja" for Japanese
        response_format="srt",  # Specify SRT as the response format
    )

print(srt_result)
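Building on the example above, the sketch below transcribes every media file in a folder and saves each result next to its source as a .srt file. It reuses the same request parameters as the example. The helper names are hypothetical, and the list of file extensions is an assumption, since this guide does not state which formats the package accepts.

```python
from pathlib import Path

# Assumed set of extensions; adjust to the formats your files actually use.
MEDIA_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac", ".mp4", ".mkv"}


def make_client(base_url="http://127.0.0.1:5092/v1"):
    """Create an OpenAI client pointed at the local endpoint."""
    from openai import OpenAI  # imported lazily; requires the openai package

    return OpenAI(base_url=base_url, api_key="any-key")


def transcribe_to_srt(client, media_path, language="en"):
    """Transcribe one file and write the result to <name>.srt beside it."""
    media_path = Path(media_path)
    with media_path.open("rb") as audio_file:
        srt_text = client.audio.transcriptions.create(
            model="parakeet",
            file=audio_file,
            prompt=language,  # "en" for English, "ja" for Japanese
            response_format="srt",
        )
    out_path = media_path.with_suffix(".srt")
    out_path.write_text(str(srt_text), encoding="utf-8")
    return out_path


def transcribe_folder(client, folder, language="en"):
    """Transcribe all files in folder whose suffix is in MEDIA_SUFFIXES."""
    outputs = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in MEDIA_SUFFIXES:
            outputs.append(transcribe_to_srt(client, path, language))
    return outputs
```

A typical call would be `transcribe_folder(make_client(), "recordings", language="ja")`, where `recordings` is a placeholder folder name. Files are processed one at a time, which keeps the load on the local machine predictable.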
Conclusion
This toolkit offers a local solution for English and Japanese speech transcription. By following the steps above, users can convert audio and video files into subtitles on their own computers.