
How to Use OpenAI's Newly Launched Speech Recognition and Speech Synthesis Models in Video Translation

This audio was dubbed using OpenAI's new speech synthesis model.

New Speech Transcription Model

OpenAI has just launched a new speech transcription model that is more accurate than the previous whisper-1. It comes in two versions: the cheaper gpt-4o-mini-transcribe and the more expensive gpt-4o-transcribe. If you need higher-quality recognition or your audio/video has significant background noise, consider the latter.
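If you want to call the API directly rather than through the software, a minimal sketch with the official openai Python SDK might look like the following. The file name audio.mp3 is just a placeholder, and the API key is assumed to be set in the environment:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Transcribe a local audio file with the new model;
# swap in "gpt-4o-mini-transcribe" for the cheaper variant.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",
        file=audio_file,
    )

print(transcript.text)
```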

It's simple to use. If you are using the official OpenAI API, just enter the two model names under Menu -- Speech Recognition Settings -- OpenAI Speech Recognition and Compatible API -- Fill in all models. Then select the model you want to use, save, and return to the speech recognition channel selection.

Fill in these two models, select the one to use, and save

Back to the main interface, select OpenAI for the speech recognition channel

New Speech Synthesis (Text-to-Speech) Model

The new speech synthesis model gpt-4o-mini-tts performs much better than the previous tts-1. It also accepts a prompt that sets the speaker's style, such as "Please speak in an excited tone" or "Please imitate a news anchor's delivery."

You can try it out on OpenAI's free demo site:

https://www.openai.fm/
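For direct API use, a minimal sketch with the openai Python SDK might look like this. The voice name, output path, and style prompt are example values; the instructions field is, as I understand it, what carries the speaking-style prompt for gpt-4o-mini-tts:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# Synthesize speech with the new model; the optional `instructions`
# field sets the speaking style described above.
response = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Hello, this line was dubbed with gpt-4o-mini-tts.",
    instructions="Please speak in an excited tone.",
)

# Write the returned audio to an MP3 file (example output path).
with open("dubbed.mp3", "wb") as f:
    f.write(response.content)
```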

It's also very simple to use. In the software's Menu -- TTS Settings -- OpenAI TTS -- Fill in all models, enter gpt-4o-mini-tts, then select it as the model to use.

Fill in and select the model to use

After saving, you can use it in the main interface.

Select OpenAI TTS in the main interface dubbing channel

Why is there no field for entering a style prompt? Because this is a brand-new model, and the translation software has not yet been updated to support it.

What if I'm using a third-party relay API?

The usage is the same; just change the API address to the one provided by your third-party relay. Note, however, that as of now most third-party relays do not yet support these new models.
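If you are calling the API yourself, switching to a relay is typically just a matter of pointing the client at a different base URL. A sketch, with the relay endpoint and key as placeholders:

```python
from openai import OpenAI

# Hypothetical relay endpoint and key -- replace with your provider's values.
client = OpenAI(
    base_url="https://your-relay.example.com/v1",
    api_key="YOUR_RELAY_API_KEY",
)

# The calls themselves are unchanged; only the endpoint differs.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=audio_file,
    )
print(transcript.text)
```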