How to use OpenAI's new speech recognition and text-to-speech models in video translation.
This audio is dubbed using OpenAI's new speech model.
New Speech Transcription Model
OpenAI has just launched a new speech transcription model that is more accurate than the previous whisper-1. It comes in two versions: the cheaper gpt-4o-mini-transcribe and the more expensive gpt-4o-transcribe. If you need high-quality recognition or have significant background noise in your audio or video, consider trying the latter.
It's simple to use. If you're using the official OpenAI API, simply enter these two model names under Menu -- Speech Recognition Settings -- OpenAI Speech Recognition and Compatible API -- Fill in all models. Then select the model you want to use, save, and choose it as the speech recognition channel.
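For reference, if you want to call the new transcription models directly rather than through the software, a minimal sketch using the official openai Python SDK might look like the following; the file name audio.mp3 is just a placeholder for audio extracted from your video:

```python
from openai import OpenAI

# Reads the API key from the OPENAI_API_KEY environment variable by default.
client = OpenAI()

# "audio.mp3" is a placeholder; use audio extracted from your video.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",  # or "gpt-4o-transcribe" for noisy audio
        file=f,
    )

print(transcript.text)
```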
New Text-to-Speech (Voice Dubbing) Model
The new text-to-speech model gpt-4o-mini-tts is significantly better than the previous tts-1. It also supports input prompts that set the speaker's speaking style, such as "Please speak in an excited tone" or "Please imitate the emphasis of a news anchor".
You can try it out on OpenAI's free trial website.
The usage is also very simple. In the software, go to Menu -- TTS Settings -- OpenAI TTS -- Fill in all models, enter gpt-4o-mini-tts, and then select the model you want to use. After saving, you can use it in the main interface.
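If you want to try the style prompt directly against the API (outside the software), a minimal sketch with the official openai Python SDK might look like this; the voice, text, and style prompt below are only example values:

```python
from openai import OpenAI

client = OpenAI()

# Stream the synthesized speech straight to an mp3 file.
with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",                                   # example voice
    input="Welcome back, and thanks for watching.",  # example line to dub
    instructions="Please speak in an excited tone",  # speaking-style prompt
) as response:
    response.stream_to_file("dubbed_line.mp3")
```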
Why isn't there a field for entering style prompts? Because this is a new model, and the translation software has not yet been updated to support it.
What if I use a third-party relay API?
The usage is the same: just change the API address to the one provided by your third-party relay. Note, however, that as of now most third-party relays do not yet support the new models.
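For reference, switching to a relay usually only means pointing the SDK at a different base URL; the address and key below are placeholders, and whether the new model names work depends on what your relay has enabled:

```python
from openai import OpenAI

# Placeholder relay address and key -- replace with your relay's values.
client = OpenAI(
    api_key="your-relay-key",
    base_url="https://relay.example.com/v1",
)

# The calls themselves are unchanged; only the endpoint differs.
with open("audio.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-mini-transcribe",
        file=f,
    )
print(transcript.text)
```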