This is a tool that uses Gemini AI to transcribe audio and video into SRT subtitles.
Pre-packaged Version Download
The pre-packaged version is only available for Win10/11. On MacOS and Linux systems, please use source code deployment.
Baidu Netdisk: https://pan.baidu.com/s/10gJVMa5L3wnzlf1tFd9euw?pwd=dtpt
Audio and video content has become an important medium for acquiring knowledge and sharing opinions. Converting that content into text, especially into subtitles with precise timelines, is usually done with OpenAI's open-source Whisper.
The emergence of Gemini AI brings us a new solution. With its powerful natural language processing capabilities, it can quickly and accurately transcribe audio and video content into text. Moreover, Gemini AI provides a considerable daily free quota, which is sufficient to meet daily audio and video transcription needs.
However, sending a complete audio or video file directly to Gemini AI quickly returns SRT format subtitles, but the timeline is often not accurate enough. This is mainly because Gemini AI may experience timeline drift when processing long audio.
To solve this problem, this simple, easy-to-use tool automatically completes the following steps:
- Intelligent Segmentation: Using a VAD (Voice Activity Detection) model, the audio or video file is intelligently segmented into small clips.
- Piece-by-piece Transcription: Each clip is sent to Gemini AI separately for transcription.
- Precise Assembly: The transcription results are reassembled into a complete SRT subtitle file in chronological order, ensuring the accuracy of the timeline.
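The segmentation and assembly steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: `Clip`, `group_speech`, `format_timestamp`, and `assemble_srt` are hypothetical names, the VAD model is assumed to have already produced speech intervals in milliseconds, the 10-second clip limit is an arbitrary choice, and each clip is assumed to become a single subtitle cue. The Gemini call itself is left out.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start_ms: int  # clip start in the original file (from the VAD step)
    end_ms: int    # clip end in the original file
    text: str      # transcription returned for this clip


def group_speech(intervals, max_clip_ms=10_000):
    """Merge consecutive (start, end) speech intervals into clips.

    A clip grows until adding the next interval would exceed max_clip_ms.
    A single interval longer than the limit is kept as its own clip.
    """
    clips = []
    for start, end in sorted(intervals):
        if clips and end - clips[-1][0] <= max_clip_ms:
            clips[-1] = (clips[-1][0], end)  # extend the current clip
        else:
            clips.append((start, end))       # start a new clip
    return clips


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def assemble_srt(clips: list[Clip]) -> str:
    """Sort clips chronologically and number them as SRT cues."""
    cues = []
    index = 0
    for clip in sorted(clips, key=lambda c: c.start_ms):
        text = clip.text.strip()
        if not text:
            continue  # skip clips with no recognized speech
        index += 1
        cues.append(
            f"{index}\n"
            f"{format_timestamp(clip.start_ms)} --> {format_timestamp(clip.end_ms)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)  # blank line between cues
```

In this sketch each cue takes its timestamps from the clip boundaries rather than from the model's output, so any timing error stays within a single clip instead of accumulating over the whole file.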
No complex settings are required. A few simple operations produce SRT subtitles with accurate timelines!
Advantages of Gemini AI:
- High Accuracy: Based on a powerful AI model, Gemini AI has extremely high speech recognition accuracy, accurately capturing the content in audio and video.
- Fast Speed: Thanks to Gemini AI's powerful computing capabilities, the transcription speed is very fast, saving you a lot of time.
- Free Quota: Gemini AI provides a daily free quota that is sufficient for everyday audio and video transcription needs, reducing usage costs.
- Supports Multiple Formats: This tool supports common audio and video formats, eliminating the need for additional format conversions.
- Precise Timeline: Intelligent segmentation and piece-by-piece transcription keep the generated SRT subtitle timeline accurate, avoiding the drift seen when long audio is transcribed in one pass.
How to use:
- Get a Gemini API Key: First, you need to have a Gemini API Key. If you don't have one yet, please follow the instructions at the end of the article to get it.
- Fill in the API Key: Paste your Gemini API Key into the "GeminiAI Key" input box of the tool.
- Select a Model: It is recommended to select the gemini-2.0-flash-exp model, which has better results and a sufficient daily free quota.
- Set up a Proxy (Optional): If you are using it in an environment without a VPN, please fill in the HTTP proxy address and port.
- Select a File: Click on the large area above to select the audio or video file you want to transcribe.
- Start Transcribing: Click the "Start" button, and the tool will automatically complete the process of slicing, transcribing, and assembling subtitles.
- View Results: After the transcription is complete, click "Open Result Folder" to find the generated SRT subtitle file.
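For the optional proxy step, the address and port combine into a single proxy URL. The sketch below shows one common way such a URL is handed to Python HTTP clients, through the `HTTP_PROXY` and `HTTPS_PROXY` environment variables. Whether the tool does this internally is an assumption, `proxy_env` is a hypothetical helper, and `127.0.0.1` and `7890` are placeholder values.

```python
import os


def proxy_env(address: str, port: int) -> dict[str, str]:
    """Build proxy environment variables from an address and port (hypothetical helper)."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    # Accept either a bare host or a value that already carries a scheme.
    host = address.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = f"http://{host}"
    url = f"{host}:{port}"
    return {"HTTP_PROXY": url, "HTTPS_PROXY": url}


if __name__ == "__main__":
    # Placeholder values; substitute your own proxy address and port.
    os.environ.update(proxy_env("127.0.0.1", 7890))
    print(os.environ["HTTPS_PROXY"])
```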
How to get a Gemini API Key
- Preparation: Make sure you have a VPN.
- Visit Google AI Studio: Open the website https://aistudio.google.com/apikey.
- Register/Login: If you don't have a Google account, please register first.
- Create an API Key: Click the "Create Key" button.
- Copy the API Key: Copy the automatically generated API Key.