Download Link 1: Download from Baidu Netdisk

Download Link 2: Download from HuggingFace.co

NVIDIA Parakeet Speech Transcription All-in-One Package User Guide

This package integrates two NVIDIA open-source speech recognition models: parakeet-ctc-1.1b (English) and parakeet-tdt_ctc-0.6b-ja (Japanese), designed to transcribe audio and video files into SRT subtitle format.

Few high-quality open-source speech recognition models are currently available for Japanese. NVIDIA's parakeet-tdt_ctc-0.6b-ja offers a reliable option for transcribing Japanese content.

This tool runs entirely on your local computer and requires no environment setup: download, extract, and double-click to run.

Features

  • Local Transcription: Supports transcribing English and Japanese audio and video files into text.
  • Generate SRT Subtitles: Directly produces SRT subtitle files with timestamps from the transcription results.

Steps to Use

Step 1: Download and Launch the Program

Download the package and extract it. Inside the extracted folder, you will find the following file structure.

To run the program, double-click the file named 启动.bat (the name means "launch").

Step 2: Wait for Model Download

On the first run, the program will automatically download the required speech recognition models. A black command-line window will appear, showing the download progress.

The model files are large, so the download requires an internet connection and may take some time, depending on your network speed. Once the download is complete, the program will automatically open the interface in your default browser.
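Because the first launch can take a while, a script that depends on the program may want to wait until the local service accepts connections. The following is a minimal sketch using only the Python standard library. It assumes the service listens on 127.0.0.1 port 5092, the address used by the API example in this guide; whether the browser interface shares that port is an assumption. `wait_for_port` is a hypothetical helper name, not part of the package.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 600.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # Not listening yet (refused or timed out); retry below.
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Example call (port 5092 is assumed from the API example in this guide):
# ready = wait_for_port("127.0.0.1", 5092)
```

An open port only shows that the process is listening. It does not prove that the models have finished loading, so the first request may still be slow.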

Step 3: Upload Files and Start Transcription

After the program launches successfully, your browser will display the following interface.

Follow these steps:

  1. Select File: Click the dashed box area or drag and drop your audio or video file into it.
  2. Choose Language: From the dropdown menu, select "English" or "Japanese" based on the source file's language.
  3. Start Transcription: Click the "Start Transcription" button.

Once the task is complete, the generated SRT subtitle content will appear in the text box below and be available for download.
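The downloaded file can be inspected or post-processed with a few lines of code. The sketch below assumes the output follows the standard SubRip layout: a numeric index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, one or more text lines, and a blank line between cues. `Cue` and `parse_srt` are hypothetical names chosen for illustration, and the parser skips any block that does not match this layout.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    index: int
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Split SRT text into cues; blocks that are not well-formed cues are skipped."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or "-->" not in lines[1]:
            continue
        start, end = lines[1].split("-->")
        cues.append(
            Cue(int(lines[0].strip()), _to_seconds(start), _to_seconds(end), "\n".join(lines[2:]))
        )
    return cues
```

For example, `parse_srt(open("your_audio.srt", encoding="utf-8").read())` returns a list of cues whose `start` and `end` values can be used to check durations or shift timings. The file name here is a placeholder.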

API Usage Instructions

For developers, this package also exposes a local interface compatible with the OpenAI Speech to Text API, so the transcription feature can be called from code.

Python Example:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",  # api_key can be any string
)

# Read a local audio file
with open("your_audio.mp3", "rb") as audio_file:
    # Request transcription
    srt_result = client.audio.transcriptions.create(
        model="parakeet",  # Model name is fixed as 'parakeet'
        file=audio_file,
        prompt="en",  # Specify language: 'en' for English, 'ja' for Japanese
        response_format="srt",  # Specify SRT format for the response
    )
print(srt_result)
```
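In practice the SRT text is usually saved next to the source file rather than printed. The following sketch wraps the same request in a helper that does this. `transcribe_to_srt_file` is a hypothetical name, and the `client` argument is assumed to be an `OpenAI` instance configured as in the example above. The helper relies on the response being plain SRT text when `response_format="srt"` is requested.

```python
from pathlib import Path


def transcribe_to_srt_file(client, media_path: str, language: str = "en") -> Path:
    """Transcribe one file through the local API and save the SRT beside it."""
    if language not in ("en", "ja"):
        raise ValueError("language must be 'en' or 'ja'")
    source = Path(media_path)
    with source.open("rb") as media_file:
        srt_text = client.audio.transcriptions.create(
            model="parakeet",       # fixed model name, as in the example above
            file=media_file,
            prompt=language,        # the language code is passed through 'prompt'
            response_format="srt",
        )
    # Write e.g. clip.mp3 -> clip.srt in the same folder.
    target = source.with_suffix(".srt")
    target.write_text(str(srt_text), encoding="utf-8")
    return target


# Example call, assuming `client` was created as shown earlier:
# transcribe_to_srt_file(client, "your_audio.mp3", "en")
```

Passing the client in as a parameter keeps the helper easy to reuse in a loop over several files and easy to test without a running service.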

Summary

This package provides a fully local solution for English and Japanese speech transcription. By following the steps above, users can convert audio and video files into SRT subtitles directly on their own computers.