This is a tool for transcribing audio and video into SRT subtitles using Gemini AI.

Pre-packaged Version Download

The pre-packaged version is available for Windows 10/11 only. On macOS and Linux, please deploy from source code.

Baidu Netdisk Link: https://pan.baidu.com/s/10gJVMa5L3wnzlf1tFd9euw?pwd=dtpt

Audio and video have become an important medium for acquiring knowledge and sharing opinions. Converting that content into text, especially into subtitles with precise timelines, is usually done with OpenAI's open-source Whisper.

Gemini AI offers a new option. Its natural language processing capabilities let it transcribe audio and video content into text quickly and accurately, and its daily free quota is generous enough for everyday transcription needs.

Sending a complete audio or video file directly to Gemini AI does return SRT subtitles quickly, but the timeline is often not accurate enough, mainly because Gemini AI's timestamps can drift when it processes long audio.

To solve this problem, a simple and easy-to-use tool has been developed that automatically completes the following steps:

  1. Intelligent Segmentation: Uses a VAD (Voice Activity Detection) model to intelligently segment audio and video files into small clips.
  2. Piece-by-Piece Transcription: Sends each clip individually to Gemini AI for transcription.
  3. Precise Assembly: Reassembles the transcribed results in chronological order into a complete SRT subtitle file, ensuring the accuracy of the timeline.
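The three steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function names (`group_speech`, `srt_time`, `build_srt`), the 30-second clip limit, and the `transcribe` callable standing in for the Gemini request are all assumptions. The idea it shows is that each subtitle's timing comes from the clip boundaries found by VAD, so the model only has to supply the text.

```python
from typing import Callable, List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def group_speech(intervals: List[Interval], max_len: float = 30.0) -> List[Interval]:
    """Merge consecutive VAD speech intervals into clips of at most max_len seconds.

    A single interval longer than max_len is kept as its own clip.
    """
    clips: List[Interval] = []
    for start, end in intervals:
        if clips and end - clips[-1][0] <= max_len:
            clips[-1] = (clips[-1][0], end)  # extend the current clip
        else:
            clips.append((start, end))       # start a new clip
    return clips


def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def build_srt(clips: List[Interval], transcribe: Callable[[float, float], str]) -> str:
    """Transcribe each clip and assemble the results into SRT text.

    `transcribe` is a hypothetical callable that returns the text spoken
    between the given start and end times (for example, by sending that
    clip to Gemini). Clips with no text are skipped.
    """
    blocks = []
    for start, end in sorted(clips):
        text = transcribe(start, end).strip()
        if not text:
            continue
        index = len(blocks) + 1
        blocks.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

Because the timestamps are taken from the clip boundaries rather than from the model's output, an error in one clip cannot shift the timing of later clips.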

No complicated settings are required: a few simple steps produce SRT subtitles with accurate timelines.

Advantages of Gemini AI:

  • High Accuracy: Gemini AI recognizes speech with high accuracy and captures the content of audio and video reliably.
  • Fast Speed: Transcription runs on Gemini AI's servers and finishes quickly, saving you a lot of time.
  • Free Quota: Gemini AI provides a daily free quota that is sufficient for everyday audio and video transcription, which keeps usage costs down.
  • Supports Multiple Formats: This tool supports common audio and video formats, so no separate format conversion is needed.
  • Precise Timeline: Through segmentation and piece-by-piece transcription, the generated SRT subtitles have an accurate timeline.

How to Use

  1. Get a Gemini API Key: First, you need to have a Gemini API Key. If you don't have one yet, please follow the instructions at the end of the article to obtain it.
  2. Fill in the API Key: Paste your Gemini API Key into the tool's GeminiAI Key input box.
  3. Select a Model: It is recommended to select the gemini-2.0-flash-exp model, which has better performance and a sufficient daily free quota.
  4. Set Proxy (Optional): If your network cannot reach Google's services directly and you are not already on a VPN, fill in the HTTP proxy address and port.
  5. Select File: Click the large area above to select the audio or video file you want to transcribe.
  6. Start Transcription: Click the "Start" button, and the tool will automatically complete the process of segmentation, transcription, and subtitle assembly.
  7. View Results: After the transcription is complete, click "Open Results Folder" to find the generated SRT subtitle file.
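For readers scripting a similar workflow, the proxy setting in step 4 can be approximated as below. How the tool itself applies the proxy is not described here, so this is only a sketch of one common approach: setting the standard proxy environment variables, which Python's `urllib` and many other HTTP clients honor. The function names and the example host and port are hypothetical.

```python
import os
from typing import MutableMapping


def proxy_url(host: str, port: int) -> str:
    """Build an HTTP proxy URL from an address and port, with basic validation."""
    host = host.strip()
    if not host:
        raise ValueError("proxy address must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("proxy port must be between 1 and 65535")
    return f"http://{host}:{port}"


def apply_proxy(url: str, env: MutableMapping[str, str] = os.environ) -> None:
    """Point the standard proxy environment variables at the given URL."""
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = url


# Example with a hypothetical local proxy:
# apply_proxy(proxy_url("127.0.0.1", 7890))
```

The variables must be set before the first network request is made, since clients typically read them when a connection is created.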

How to Obtain a Gemini API Key

  1. Preparation: Make sure your network can reach Google's services, for example through a VPN.
  2. Visit Google AI Studio: Open the website https://aistudio.google.com/apikey.
  3. Register/Login: Sign in with your Google account. If you don't have one, register first.
  4. Create API Key: Click the "Create API Key" button.
  5. Copy API Key: Copy the automatically generated API Key.
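Once the key is copied, a quick way to check that it works is to ask the Gemini API for its model list. The sketch below is not part of the tool; it assumes the public REST endpoint `https://generativelanguage.googleapis.com/v1beta/models` and the `x-goog-api-key` request header, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import List

MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for the model list, passing the key in a header."""
    return urllib.request.Request(MODELS_URL, headers={"x-goog-api-key": api_key})


def list_model_names(api_key: str, timeout: float = 15.0) -> List[str]:
    """Return the model names visible to this key.

    Raises urllib.error.HTTPError if the server rejects the request,
    and urllib.error.URLError if the service cannot be reached.
    """
    request = build_models_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [model["name"] for model in payload.get("models", [])]
```

Calling `list_model_names` with your key should return a non-empty list if the key is valid and the network can reach Google. Avoid printing or committing the key itself.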
