This is a tool for transcribing audio and video to SRT subtitles using Gemini AI.
Pre-packaged Download Links
The pre-packaged version is only available for Windows 10/11, macOS, and Linux. On other systems, please deploy from source code.
Baidu Netdisk: https://pan.baidu.com/s/10gJVMa5L3wnzlf1tFd9euw?pwd=dtpt
Audio and video have become an important way for us to acquire knowledge and share opinions. Converting that content into text, especially into subtitles with precise timelines, is often done with OpenAI's open-source Whisper model.
The emergence of Gemini AI brings us a new solution. With its powerful natural language processing capabilities, it can quickly and accurately transcribe audio and video content into text. Furthermore, Gemini AI provides a considerable daily free quota, which is sufficient to meet daily audio and video transcription needs.
However, sending a complete audio or video file directly to Gemini AI has a drawback: SRT subtitles come back quickly, but their timelines are often inaccurate. This is mainly because Gemini AI may experience timeline drift when processing long audio files.
To solve this problem, a simple and easy-to-use tool has been developed that automatically completes the following steps:
- Intelligent Slicing: Uses a VAD (Voice Activity Detection) model to intelligently slice audio and video files into small segments.
- Segment-by-Segment Transcription: Sends each segment to Gemini AI for individual transcription.
- Precise Assembly: Reassembles the transcription results in chronological order into a complete SRT subtitle file, ensuring timeline accuracy.
No complex settings are required. A few simple operations are enough to obtain accurate SRT subtitles.
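The slicing and assembly steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the names Cue, group_spans, format_timestamp, and assemble_srt are hypothetical, the 30-second cap is an assumed value, the speech spans are assumed to arrive from a VAD model as sorted (start, end) pairs in seconds, and each segment's transcription is assumed to carry timestamps relative to that segment's own start.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds, relative to the start of its segment
    end: float
    text: str


def group_spans(spans, max_len=30.0):
    """Merge consecutive speech spans into segments of at most max_len seconds.

    A single span longer than max_len is kept as its own segment.
    """
    segments = []
    for start, end in spans:
        if segments and end - segments[-1][0] <= max_len:
            # Extend the current segment to cover this span.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def assemble_srt(segments, cues_per_segment):
    """Shift each segment's cues by the segment start and number them globally."""
    blocks = []
    index = 1
    for (seg_start, _), cues in zip(segments, cues_per_segment):
        for cue in cues:
            start = format_timestamp(seg_start + cue.start)
            end = format_timestamp(seg_start + cue.end)
            blocks.append(f"{index}\n{start} --> {end}\n{cue.text}\n")
            index += 1
    return "\n".join(blocks)


# Hypothetical VAD output and per-segment transcriptions.
segments = group_spans([(0.0, 10.0), (12.0, 25.0), (27.0, 40.0), (41.0, 50.0)])
srt_text = assemble_srt(
    segments,
    [[Cue(0.5, 2.0, "Hello")], [Cue(1.0, 3.25, "World")]],
)
```

Because every cue is shifted by the known start time of its segment, any drift inside one segment stays inside that segment instead of accumulating across the whole file.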
Advantages of Gemini AI:
- High Accuracy: Gemini AI is based on a powerful AI model and has extremely high speech recognition accuracy, enabling it to accurately capture the content in audio and video.
- Fast Speed: Thanks to Gemini AI's powerful computing capabilities, the transcription speed is very fast, greatly saving your time.
- Free Quota: Gemini AI provides a sufficient daily free quota, which is enough to meet daily audio and video transcription needs, reducing usage costs.
- Supports Multiple Formats: This tool supports common audio and video formats, eliminating the need for extra format conversions.
- Accurate Timeline: Because the file is sliced and each segment is transcribed separately, the generated SRT subtitles keep an accurate timeline.
How to Use
- Get a Gemini API Key: First, you need a Gemini API Key. If you don't have one yet, please follow the instructions at the end of this article to get one.
- Enter API Key: Paste your Gemini API Key into the "GeminiAI Key" input box of the tool.
- Select Model: It is recommended to select the gemini-2.0-flash-exp model, which has better performance and a sufficient daily free quota.
- Set Proxy (Optional): If you are using the tool in an environment without a VPN, please enter the HTTP proxy address and port.
- Select File: Click on the large area at the top to select the audio or video file you want to transcribe.
- Start Transcribing: Click the "Start" button, and the tool will automatically complete the process of slicing, transcribing, and assembling subtitles.
- View Results: After the transcription is complete, click "Open Result Folder" to find the generated SRT subtitle file.
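How the tool applies the proxy setting internally is not described here. As a rough sketch, assuming a Python client that honors the standard proxy environment variables (urllib and requests both read HTTP_PROXY and HTTPS_PROXY), the address and port from the "Set Proxy" step could be combined as follows. The helper names and the example address 127.0.0.1:7890 are hypothetical.

```python
import os


def build_proxy_url(host, port):
    """Combine a proxy host and port into an http:// proxy URL."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://{host.strip()}:{port}"


def apply_proxy(proxy_url, env=None):
    """Store the proxy URL in the variables that urllib and requests read."""
    env = os.environ if env is None else env
    for name in ("HTTP_PROXY", "HTTPS_PROXY"):
        env[name] = proxy_url
    return env


# Hypothetical local proxy; passing a plain dict keeps this demo free of side effects.
settings = apply_proxy(build_proxy_url("127.0.0.1", 7890), env={})
```

Calling apply_proxy without the env argument would write to the real process environment, which affects every later request made by the same process.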
How to Get a Gemini API Key
- Preparation: Ensure you have a VPN connection.
- Visit Google AI Studio: Open the URL https://aistudio.google.com/apikey.
- Register/Log in: Sign in with your Google account. If you don't have one, please register first.
- Create API Key: Click the "Create API key" button.
- Copy API Key: Copy the automatically generated API Key.
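Before pasting the key into the tool, it can be checked independently. The sketch below is not part of the tool; it assumes the Gemini API's public model-listing endpoint and the x-goog-api-key header, and the function names are hypothetical. It needs network access (and a proxy or VPN where required).

```python
import json
import urllib.request

MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_request(api_key):
    """Build a GET request for the model list, authenticated with the API key."""
    return urllib.request.Request(MODELS_URL, headers={"x-goog-api-key": api_key})


def list_model_names(api_key, timeout=15):
    """Return the model names visible to this key (first page of results only)."""
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]
```

Calling list_model_names with your key should return a list of model names if the key is accepted. A rejected key is expected to surface as urllib.error.HTTPError.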