
How to Use OpenAI's Newly Launched Speech Recognition and Speech Synthesis Models in Video Translation

This audio was dubbed using OpenAI's new speech synthesis model.

New Speech Transcription Model

OpenAI has just launched a new speech transcription model that is more accurate than the previous whisper-1. It comes in two versions: the cheaper gpt-4o-mini-transcribe and the more expensive gpt-4o-transcribe. If you need higher-quality recognition or your audio/video has significant background noise, consider the latter.
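If you want to call the API directly rather than through the software, a minimal sketch with the official openai Python SDK might look like the following. The file name audio.mp3 is just a placeholder, and the API key is assumed to be set in the environment:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Transcribe a local audio file with the new model;
# swap in "gpt-4o-mini-transcribe" for the cheaper variant.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```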

It's simple to use. If you are using the official OpenAI API, just enter the two model names under Menu -- Speech Recognition Settings -- OpenAI Speech Recognition and Compatible API -- Fill in all models. Then select the model you want to use, save, and return to the speech recognition channel selection.

Fill in these two models, select the one to use, and save

Back to the main interface, select OpenAI for the speech recognition channel

New Speech Synthesis (Text-to-Speech) Model

The new speech synthesis model gpt-4o-mini-tts performs much better than the previous tts-1. It also accepts a prompt that sets the speaker's style, such as "Please speak in an excited tone" or "Please imitate a news anchor's delivery."

You can try it out on OpenAI's free demo site:

https://www.openai.fm/
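For direct API use, a minimal sketch with the openai Python SDK might look like this. The voice name, output path, and style prompt are example values; the instructions field is, as I understand it, what carries the speaking-style prompt for gpt-4o-mini-tts:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Synthesize speech with the new model; the optional `instructions`
# field sets the speaking style described above.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Hello, this line was dubbed with gpt-4o-mini-tts.",
    instructions="Please speak in an excited tone.",
)

# Write the returned audio to an MP3 file (example output path).
with open("dubbed.mp3", "wb") as f:
    f.write(response.content)
```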

It's also very simple to use. In the software's Menu -- TTS Settings -- OpenAI TTS -- Fill in all models, enter gpt-4o-mini-tts, then select it as the model to use.

Fill in and select the model to use

After saving, you can use it in the main interface.

Select OpenAI TTS in the main interface dubbing channel

Why is there no field for entering a style prompt? Because this is a brand-new model, and the translation software has not yet been updated to support it.

What if I'm using a third-party relay API?

The usage is the same; just change the API address to the one provided by your third-party relay. Note, however, that as of now most third-party relays do not yet support these new models.
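If you are calling the API yourself, switching to a relay is typically just a matter of pointing the client at a different base URL. A sketch, with the relay endpoint and key as placeholders:

```python
from openai import OpenAI

# Hypothetical relay endpoint and key -- replace with your provider's values.
client = OpenAI(
    base_url="https://your-relay.example.com/v1",
    api_key="YOUR_RELAY_API_KEY",
)

# The calls themselves are unchanged; only the endpoint differs.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
    )
print(transcript.text)
```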