Gemini AI is not only a capable large language model for chat; it also works well for speech recognition, transcribing audio and video to text. Its free tier allows more than 1,500 requests per day, which is usually enough for everyday use.
How to Enable Gemini AI Service
First, open the Gemini AI Studio page, https://aistudio.google.com/, and check whether it loads.
- A proxy or VPN is a prerequisite: This may be the only hurdle to using Gemini AI. Even with a proxy or VPN active, opening the address above may still show a "Country or region not supported" message.
If that happens, try switching VPN nodes until the page displays the following interface correctly:
Get API Key: In the upper left corner of the page shown above, you will see a Get API Key button. Click it and create a new key.
Paste API Key: Paste the API Key you obtained into the pyVideoTrans software. To do this, open the software's settings menu, find the "Gemini Pro Gemini Key" option, and paste the key in.
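Before pasting the key into the software, it can help to confirm that the key and the proxy both work. The following is a minimal sketch, not part of pyVideoTrans: it asks the public Gemini REST endpoint for the list of models available to the key, using only the Python standard library. The function names and the optional proxy parameter are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

# Public REST root of the Gemini API (v1beta).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def models_url(api_key: str) -> str:
    """Build the URL that lists the models this key can use."""
    return f"{API_ROOT}/models?" + urllib.parse.urlencode({"key": api_key})


def list_models(api_key: str, proxy: Optional[str] = None,
                timeout: float = 15.0) -> List[str]:
    """Return model names visible to the key; raises on HTTP or network errors.

    `proxy` is a hypothetical local proxy address such as
    "http://127.0.0.1:7890"; leave it as None to use the system settings.
    """
    handlers = []
    if proxy:
        # Route both HTTP and HTTPS traffic through the given proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(models_url(api_key), timeout=timeout) as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]
```

Calling list_models("YOUR_KEY", proxy="http://127.0.0.1:7890") should return a list of model names if the key is valid and the region is supported; an HTTP error usually points to a bad key or an unsupported node.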
Using in Video Translation and Dubbing Software
Please upgrade to the v3.07 patch version first.
- First, enter your key and the model you are using under Menu bar > Translation Settings > Gemini pro. You can also edit the prompt used for transcription there.
- Don't forget the proxy/VPN; without it, requests will fail.
- In the voice recognition channel, select Gemini Large Model Recognition, upload the audio or video, and choose the pronunciation language. Do not select Chinese re-segmentation: Gemini's own segmentation is already good, and enabling this option may make the result worse.
- Then wait for the recognition result. If you are not satisfied, adjust the prompt and run the recognition again.
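For readers curious about what such a recognition request looks like outside the software, here is a rough sketch of sending an audio file together with a prompt to the Gemini REST generateContent endpoint. It is an illustration under assumptions, not how pyVideoTrans implements the feature: the helper names are hypothetical, the model name must be supplied by the caller, and audio is sent inline as base64, which only suits short clips.

```python
import base64
import json
import urllib.request
from typing import Optional

API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_transcription_request(audio_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a generateContent body: one text part (the prompt) plus inline audio."""
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }


def transcribe(api_key: str, audio_path: str, model: str, prompt: str,
               mime_type: str = "audio/mp3", proxy: Optional[str] = None,
               timeout: float = 300.0) -> str:
    """Send the audio and prompt to Gemini and return the text of the first candidate."""
    with open(audio_path, "rb") as f:
        body = build_transcription_request(f.read(), mime_type, prompt)
    req = urllib.request.Request(
        f"{API_ROOT}/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    handlers = []
    if proxy:
        # Optional proxy, e.g. a local VPN client's HTTP port (assumed address).
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply["candidates"][0]["content"]["parts"][0]["text"]
```

Changing the prompt argument is the programmatic counterpart of adjusting the prompt in the settings: the same audio can be resubmitted with different instructions until the output is acceptable.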