
Gemini AI is not only a capable large language model for chat; it also works well for speech recognition, transcribing audio and video to text. Its free tier allows over 1,500 requests per day, which is generally enough for everyday use.

How to Enable Gemini AI Service

First, visit the Gemini AI Studio page at https://aistudio.google.com/ and check whether it opens.

  1. A proxy or VPN is a prerequisite: This may be the only hurdle to using Gemini AI. Even with a proxy or VPN enabled, the address above may still show a "Country or region not supported" message.

In that case, switch VPN nodes until the page loads and shows the following interface:

image.png

  2. Get an API key: In the upper left corner of the page shown above, click the Get API Key button and create a new key.

    image.png

  3. Paste the API key: Copy the key into the pyVideoTrans software. Open the software's settings menu, find the "Gemini Pro Gemini Key" option, and paste the key there.

    image.png
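
Before pasting the key into the software, it can help to confirm that the key and the proxy node both work. The following is a minimal sketch, not part of pyVideoTrans: it builds a request against the Gemini API's public model-listing endpoint, optionally routes it through a proxy, and classifies the outcome. The function names and the proxy address in the usage note are hypothetical, and the region-error text matched in `describe_result` is an assumption about the wording the API returns.

```python
import urllib.error
import urllib.request

# Public REST endpoint of the Gemini API (v1beta) that lists available models.
MODELS_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists available models."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    # Send the key in a header instead of the URL so it stays out of logs.
    return urllib.request.Request(MODELS_URL, headers={"x-goog-api-key": key})


def build_opener(proxy_url=None) -> urllib.request.OpenerDirector:
    """Create an opener that uses proxy_url, or environment proxies if None."""
    if proxy_url:
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    else:
        handler = urllib.request.ProxyHandler()  # reads HTTP(S)_PROXY variables
    return urllib.request.build_opener(handler)


def describe_result(status: int, body: str) -> str:
    """Turn an HTTP status and response body into a short diagnosis."""
    if status == 200:
        return "ok"
    # Assumed wording of the region error; adjust if the API text differs.
    if "location is not supported" in body.lower():
        return "region blocked: try another proxy node"
    if status in (400, 401, 403):
        return "key rejected: check the API key"
    return f"unexpected status {status}"


def check_access(api_key: str, proxy_url=None) -> str:
    """Send the request and report whether the key and network path work."""
    opener = build_opener(proxy_url)
    try:
        with opener.open(build_models_request(api_key), timeout=15) as response:
            return describe_result(response.status, "")
    except urllib.error.HTTPError as error:
        return describe_result(error.code, error.read().decode("utf-8", "replace"))
    except urllib.error.URLError as error:
        return f"network error: {error.reason}"
```

Calling `check_access("YOUR_KEY", "http://127.0.0.1:7890")` would then print-friendly report `ok`, a region block, or a rejected key; the proxy address here is only a placeholder for whatever your local proxy client exposes.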

Using in Video Translation and Dubbing Software

Please upgrade to the v3.07 patch version first.

  1. Open Translation Settings > Gemini Pro from the menu bar, then enter your key and the model you want to use. You can also edit the transcription prompt here.

image.png

  2. Make sure the proxy/VPN is configured; otherwise requests will fail with errors.

image.png

  3. Select Gemini Large Model Recognition as the speech recognition channel, upload the audio or video, select the spoken language, and leave Chinese re-segmentation unselected. Gemini's own segmentation is already good, and enabling re-segmentation may make the result worse.

image.png

  4. Wait for the recognition result. If you are not satisfied, adjust the prompt and run recognition again.

image.png
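
For readers curious about what a transcription request to Gemini looks like outside the software, here is a rough sketch using the Gemini API's REST `generateContent` format, with the audio sent inline as base64 next to the prompt. This is an illustration under stated assumptions, not how pyVideoTrans is implemented: the helper names are hypothetical, and the default model name `gemini-1.5-flash` is a placeholder for whichever model you entered in the settings.

```python
import base64
import json

# Base URL of the Gemini API (v1beta) model endpoints.
API_BASE = "https://generativelanguage.googleapis.com/v1beta/models"


def build_transcription_request(audio_bytes, mime_type, prompt,
                                model="gemini-1.5-flash"):
    """Return (url, json_body) for a generateContent call with inline audio."""
    if not audio_bytes:
        raise ValueError("audio is empty")
    payload = {
        "contents": [{
            "parts": [
                # The prompt tells the model how to transcribe and segment.
                {"text": prompt},
                # The audio itself, base64-encoded inside the JSON body.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    return f"{API_BASE}/{model}:generateContent", json.dumps(payload)


def extract_text(response_json: str) -> str:
    """Join the text parts of the first candidate in a generateContent response."""
    data = json.loads(response_json)
    candidates = data.get("candidates") or []
    if not candidates:
        return ""
    parts = candidates[0].get("content", {}).get("parts", [])
    return "".join(part.get("text", "") for part in parts)
```

The URL and body would be sent as a POST with the API key in an `x-goog-api-key` header, through the same proxy the software uses. Inline uploads are size limited, so long recordings would likely need to be split or uploaded separately. Changing the `prompt` argument corresponds to adjusting the prompt in the settings when the result is not satisfactory.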