Speech Recognition with Gemini AI | pyVideoTrans: Open-Source Video Translation Tool (pyvideotrans.com, github.com/jianchang512/pyvideotrans)

Gemini AI is not only an excellent large language model for chat but also a capable tool for speech recognition and for converting audio and video to text. Its free tier offers over 1,500 requests per day, which is generally enough for everyday use.

How to Enable Gemini AI Service

First, visit the Gemini AI Studio page at https://aistudio.google.com/ and check whether it opens.

A VPN is required: This may be the only barrier to using Gemini AI. Even when you are connected to a VPN, the address above may still show a "Country or region not supported" message.

If that happens, try switching VPN nodes until the page loads correctly and shows the AI Studio interface.

Get API Key: In the upper left corner of the AI Studio page, click the Get API Key button, then create a new key.
Paste API Key: Paste the key into pyVideoTrans: open the software's settings menu, find the "Gemini Pro Gemini Key" option, and paste the key there.
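
To check a new key outside pyVideoTrans, one option is the public Gemini REST API's model-listing endpoint, which only succeeds for a valid key. The sketch below uses only the Python standard library; the `GEMINI_API_KEY` environment variable and the helper name `build_list_models_request` are hypothetical, and this is not how pyVideoTrans itself validates keys.

```python
import os
import urllib.parse
import urllib.request

# Base URL of the public Gemini REST API (v1beta).
API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_list_models_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a models.list request authenticated by an API key."""
    key = api_key.strip()  # pasted keys often carry stray whitespace
    if not key:
        raise ValueError("API key is empty")
    query = urllib.parse.urlencode({"key": key})
    return urllib.request.Request(f"{API_BASE}/models?{query}", method="GET")


if __name__ == "__main__":
    # GEMINI_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        # Needs network access; an invalid key raises urllib.error.HTTPError.
        with urllib.request.urlopen(build_list_models_request(key), timeout=30) as resp:
            print("Key accepted, HTTP status:", resp.status)
    else:
        print("Set GEMINI_API_KEY to check a real key.")
```

Building the request separately from sending it keeps the key handling testable offline; only the `urlopen` call needs the network (and, where required, the VPN).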

Using it in Video Translation and Dubbing Software

Before you begin, upgrade pyVideoTrans to the v3.07 patch version.

First, in the menu bar, open Translation Settings > Gemini Pro and fill in your key and the model to use. You can also modify the transcription prompt here.
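
The three settings above (key, model, and transcription prompt) map naturally onto a single Gemini `generateContent` REST call that carries the prompt and the audio together. The following is a minimal sketch assuming the public v1beta REST endpoint and base64-encoded inline audio; the model name `gemini-1.5-flash`, the `audio/mp3` MIME type, and the helper names are illustrative assumptions, not pyVideoTrans's actual code.

```python
import base64
import json
import urllib.parse
import urllib.request

API_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_transcription_request(api_key: str, model: str, prompt: str,
                                audio_bytes: bytes,
                                mime_type: str = "audio/mp3") -> urllib.request.Request:
    """Build (but do not send) a generateContent request: prompt text plus inline audio."""
    body = {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {
                    "mime_type": mime_type,
                    # Inline binary data must be base64-encoded inside the JSON body.
                    "data": base64.b64encode(audio_bytes).decode("ascii"),
                }},
            ]
        }]
    }
    url = (f"{API_BASE}/models/{urllib.parse.quote(model)}:generateContent?"
           + urllib.parse.urlencode({"key": api_key}))
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Concatenate the text parts of the first candidate in a generateContent response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)
```

To send it, pass the request to `urllib.request.urlopen` and hand `json.load(response)` to `extract_text`. Inline data suits short clips; for long recordings, Google's File API is the usual alternative.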

Don't forget to set up the proxy/VPN; otherwise, requests to Gemini will fail with errors.
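
In a Python HTTP client, "using the proxy" means routing API requests through the local port that your VPN client exposes. A minimal standard-library sketch follows; the address `http://127.0.0.1:7890` is a hypothetical placeholder, and this is not pyVideoTrans's own proxy code.

```python
import urllib.request

# Hypothetical local proxy address; replace it with the HTTP proxy your VPN client exposes.
PROXY_URL = "http://127.0.0.1:7890"


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends both HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # build_opener keeps this explicit ProxyHandler instead of the default one.
    return urllib.request.build_opener(handler)


# Usage: opener.open(request, timeout=30) routes a Gemini API call via the proxy.
opener = make_proxy_opener(PROXY_URL)
```

By default, `urllib` also honors the `HTTP_PROXY` and `HTTPS_PROXY` environment variables, so setting those is an alternative to an explicit handler.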

In the speech recognition channel, select Gemini large model recognition, upload your audio or video, and select the spoken language. Do not select Chinese re-segmentation: Gemini's own segmentation works well, and enabling re-segmentation may make the results worse.
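
Keeping Gemini's own segmentation means the timed blocks the model returns are used as they are. Assuming the transcription prompt asks for SRT-formatted output (an assumption; check the prompt in your settings), a hypothetical parser for those blocks might look like this.

```python
import re
from dataclasses import dataclass

# Assumption: the model was prompted to reply in SRT format.
# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" (a period before the milliseconds is also accepted).
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert hour, minute, second, and millisecond strings to total milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text: str) -> list[Segment]:
    """Parse SRT-style model output into segments, skipping malformed blocks."""
    segments = []
    # SRT blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = TIME_RE.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is subtitle text; keep its line breaks.
                text = "\n".join(lines[i + 1:])
                if text:
                    segments.append(Segment(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return segments
```

Malformed blocks are skipped rather than re-split, so the segment boundaries stay exactly as the model produced them.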

Then wait for the recognition results. If you are not satisfied with them, adjust the prompt and run recognition again.
