Gemini AI is not only an excellent large language model for chatting but also a powerful tool for speech recognition and audio/video transcription. It offers over 1,500 free API calls per day, which generally meets daily usage needs.
How to Activate Gemini AI Service
First, open the Gemini AI Studio page at https://aistudio.google.com/ and check whether you can access it.
- VPN is a Prerequisite: This might be the only barrier to using Gemini AI. Sometimes, even with a VPN enabled, you may still see an "unsupported country or region" message when opening the link.
In this case, try switching VPN nodes until the page displays correctly, as shown below:
Get Your API Key: In the top-left corner of the page shown above, you'll see a Get API Key button. Click it and create a new key.
Paste the API Key: Copy and paste your API key into the pyVideoTrans software. To do this, open the software's settings menu, locate the "Gemini Pro Gemini Key" option, and paste the key there.
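Before pasting the key into the software, it can help to confirm that the key works and that the API is reachable through your proxy. The following is a minimal sketch, not part of pyVideoTrans: it assumes the public Gemini REST endpoint at generativelanguage.googleapis.com and the x-goog-api-key header, and the helper names (build_opener, models_request, list_models) and the example proxy address are hypothetical.

```python
import json
import urllib.request

# Assumption: the public Gemini REST API root (v1beta).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_opener(proxy_url=None):
    """Return a urllib opener, routed through proxy_url when one is given."""
    handlers = []
    if proxy_url:
        # Send both plain and TLS traffic through the same local proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    return urllib.request.build_opener(*handlers)


def models_request(api_key):
    """Build (but do not send) a GET request that lists the available models."""
    return urllib.request.Request(
        f"{API_ROOT}/models",
        headers={"x-goog-api-key": api_key},
    )


def list_models(api_key, proxy_url=None):
    """Send the request and return model names; raises on HTTP or network errors."""
    opener = build_opener(proxy_url)
    with opener.open(models_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


# Example call (hypothetical key and proxy address, requires network access):
# print(list_models("YOUR_API_KEY", "http://127.0.0.1:7890"))
```

If the call returns a list of model names, the key is valid and the proxy route works. An HTTP error at this point usually indicates a bad key or a blocked region rather than a problem in the software.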
Using It in Video Translation and Dubbing Software
First, upgrade to the v3.07 patch version.
- Start by going to Menu Bar > Translation Settings > Gemini Pro. Enter your Key, select the model to use, and optionally modify the transcription prompt here.
- Don't forget to enable your proxy/VPN; otherwise, requests to Gemini will fail with errors.
- In the speech recognition channel, select "Gemini Large Model Recognition", upload your audio or video file, choose the spoken language, and do not check "Chinese Re-segmentation". Gemini's built-in segmentation is effective, and enabling this option may worsen the results.
- Wait for the recognition results. If unsatisfied, you can adjust the prompt and try again.
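To see how the prompt influences the result, here is a rough sketch of what a transcription request to Gemini might look like outside the software. It is not the pyVideoTrans implementation: it assumes the public generateContent REST endpoint with inline base64 audio, and the helper names, the model name in the usage comment, and the sample prompt are illustrative assumptions. Inline audio suits short clips only; longer files would need a separate upload step.

```python
import base64
import json
import urllib.request

# Assumption: the public Gemini REST API root (v1beta).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def transcription_request(api_key, model, prompt, audio_bytes, mime_type="audio/mp3"):
    """Build (but do not send) a generateContent request: a text prompt plus inline audio."""
    body = {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": mime_type,
                            "data": base64.b64encode(audio_bytes).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
    return urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response_json):
    """Join the text parts of the first candidate in a generateContent response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# Example call (hypothetical model name and prompt, requires network access):
# with open("clip.mp3", "rb") as f:
#     req = transcription_request("YOUR_API_KEY", "gemini-1.5-flash",
#                                 "Transcribe this audio into subtitles.", f.read())
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(extract_text(json.load(resp)))
```

Changing only the prompt argument and resending the same audio is the scripted equivalent of adjusting the prompt in the settings and trying again.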