This is a tool that uses Gemini AI to transcribe audio and video into SRT subtitles.
Pre-packaged Version Download Link
The pre-packaged version runs only on Windows 10/11. On macOS and Linux, deploy the tool from source code instead.
Baidu Netdisk Link: https://pan.baidu.com/s/10gJVMa5L3wnzlf1tFd9euw?pwd=dtpt
Audio and video have become essential media for acquiring knowledge and sharing perspectives. Converting them into text, and especially into subtitles with precise timestamps, is often done with OpenAI's open-source Whisper.
The emergence of Gemini AI offers a new solution. Leveraging its powerful natural language processing capabilities, Gemini AI can quickly and accurately transcribe audio and video content into text. Additionally, Gemini AI provides a generous daily free quota, sufficient for everyday transcription needs.
However, sending an entire audio or video file to Gemini AI in one request has a drawback: the SRT subtitles come back quickly, but their timestamps are often inaccurate. This is primarily because Gemini AI's timestamps tend to drift when it processes long audio.
To address this issue, this simple, user-friendly tool automatically performs the following steps:
- Smart Segmentation: Uses a VAD (Voice Activity Detection) model to intelligently split the audio or video file into small segments.
- Segment-by-Segment Transcription: Sends each segment individually to Gemini AI for transcription.
- Precise Assembly: Reassembles the transcription results in chronological order into a complete SRT subtitle file, ensuring accurate timestamps.
No complex setup is required. A few simple steps are enough to obtain SRT subtitles with precise timestamps!
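The segmentation step above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it assumes a VAD model has already produced a list of speech intervals in milliseconds, and the function name group_speech_intervals and the 30-second limit are hypothetical choices made for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_ms, end_ms) of detected speech


def group_speech_intervals(
    intervals: List[Interval], max_chunk_ms: int = 30_000
) -> List[Interval]:
    """Merge consecutive speech intervals into chunks of at most max_chunk_ms.

    Hypothetical helper: the limit and the merging rule are assumptions,
    not values taken from the tool.
    """
    chunks: List[Interval] = []
    for start, end in sorted(intervals):
        if chunks and end - chunks[-1][0] <= max_chunk_ms:
            # The interval still fits: extend the current chunk to cover it.
            chunks[-1] = (chunks[-1][0], end)
        else:
            # Start a new chunk. A single interval longer than the limit
            # is kept whole rather than cut in the middle of speech.
            chunks.append((start, end))
    return chunks


# Made-up VAD output, for illustration only.
speech = [
    (0, 4_000),
    (4_500, 12_000),
    (13_000, 29_000),
    (31_000, 40_000),
    (41_000, 65_000),
]
chunks = group_speech_intervals(speech)
for start, end in chunks:
    print(f"chunk {start} ms .. {end} ms")
```

Each chunk would then be cut from the source file and sent to Gemini AI on its own. Because every chunk starts at a known offset, the chunk boundaries fall in pauses rather than in the middle of a sentence.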
Advantages of Gemini AI:
- High Accuracy: Gemini AI is built on a powerful AI model with strong speech recognition, so it captures the content of audio and video accurately.
- Fast Speed: Thanks to Gemini AI's robust computational capabilities, transcription is very fast, saving you significant time.
- Free Quota: Gemini AI provides a substantial daily free quota, sufficient for everyday audio and video transcription needs, reducing usage costs.
- Supports Multiple Formats: This tool supports common audio and video formats, eliminating the need for additional format conversion.
- Precise Timestamps: Through smart segmentation and segment-by-segment transcription, the generated SRT subtitles have accurate and reliable timestamps.
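To illustrate why segment-by-segment transcription keeps timestamps reliable: the times returned for each segment are relative to that segment's start, so adding the segment's known offset places every cue on the timeline of the full file, and any drift stays confined to one short segment. The sketch below shows that arithmetic together with the standard SRT time format (HH:MM:SS,mmm). The function names and the sample data are hypothetical and are not taken from the tool.

```python
from typing import List, Tuple

Cue = Tuple[int, int, str]  # (start_ms, end_ms, text), relative to segment start


def format_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def assemble_srt(segments: List[Tuple[int, List[Cue]]]) -> str:
    """Build one SRT document from (segment_offset_ms, cues) pairs."""
    blocks = []
    index = 1
    # Sort by offset so cues appear in chronological order even if the
    # segments were transcribed out of order.
    for offset_ms, cues in sorted(segments, key=lambda seg: seg[0]):
        for start, end, text in cues:
            begin = format_srt_time(offset_ms + start)
            finish = format_srt_time(offset_ms + end)
            blocks.append(f"{index}\n{begin} --> {finish}\n{text}")
            index += 1
    return "\n\n".join(blocks) + "\n"


# Made-up transcription results for two segments, for illustration only.
segments = [
    (31_000, [(0, 2_500, "Second segment.")]),
    (0, [(0, 1_800, "Hello."), (2_000, 4_000, "Welcome back.")]),
]
srt_text = assemble_srt(segments)
print(srt_text)
```

In this example the cue from the second segment, which starts at 0 ms inside its own segment, ends up at 00:00:31,000 in the assembled file.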
How to Use
- Obtain a Gemini API Key: First, you need a Gemini API Key. If you don't have one, follow the instructions at the end of this article to get it.
- Enter the API Key: Paste your Gemini API Key into the "GeminiAI Key" input box in the tool.
- Select a Model: It is recommended to choose the gemini-2.0-flash-exp model, which performs well and has a sufficient daily free quota.
- Set Up a Proxy (Optional): If you are using the tool without a VPN, enter the HTTP proxy address and port.
- Select a File: Click on the large area at the top to select the audio or video file you wish to transcribe.
- Start Transcription: Click the "Start" button, and the tool will automatically handle segmentation, transcription, and subtitle assembly.
- View Results: Once transcription is complete, click "Open Result Folder" to find the generated SRT subtitle file.
How to Get a Gemini API Key
- Preparation: Ensure you have access to a VPN.
- Visit Google AI Studio: Go to https://aistudio.google.com/apikey.
- Register/Log In: If you don't have a Google account, create one first.
- Create an API Key: Click the "Create Key" button.
- Copy the API Key: Copy the automatically generated API Key.