How to Use OpenAI's Newly Launched Speech Recognition and Speech Synthesis Models in Video Translation
This audio is dubbed using OpenAI's new speech synthesis model.
New Speech Transcription Model
OpenAI has just launched new speech transcription models that are more accurate than the previous whisper-1. They come in two versions: the cheaper gpt-4o-mini-transcribe and the more expensive gpt-4o-transcribe. If you need higher-quality recognition, or your audio or video has heavy background noise, consider the latter.
It's simple to use. If you are using the official OpenAI API, enter these two model names under Menu -- Speech Recognition Settings -- OpenAI Speech Recognition and Compatible API -- Fill in all models, then select the model you want, save, and return to the speech recognition channel selection.
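If you prefer to call the API directly rather than through the software, the same model names go to the standard transcription endpoint. A minimal sketch using the official Python SDK (assumes the openai package is installed and the OPENAI_API_KEY environment variable is set; the file path is a placeholder):

```python
def transcribe(audio_path, model="gpt-4o-transcribe"):
    """Transcribe an audio file with one of OpenAI's new transcription models.

    Assumes the official `openai` Python SDK is installed and
    OPENAI_API_KEY is set in the environment.
    """
    from openai import OpenAI  # imported lazily so defining the function is cheap

    client = OpenAI()
    with open(audio_path, "rb") as f:
        # Same endpoint as whisper-1; only the model name changes.
        result = client.audio.transcriptions.create(model=model, file=f)
    return result.text

# Hypothetical usage:
# print(transcribe("dubbed_clip.mp3", model="gpt-4o-mini-transcribe"))
```

Swapping between the cheap and expensive models is just a matter of changing the model string.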
New Speech Synthesis (Text-to-Speech) Model
The new speech synthesis model gpt-4o-mini-tts performs much better than the previous tts-1. It also supports input prompts that set the speaker's style, such as "Please speak in an excited tone" or "Please imitate a news anchor's emphasis".
You can also try it out on OpenAI's free demo site.
It's also very simple to use. In the software's Menu -- TTS Settings -- OpenAI TTS -- Fill in all models, enter gpt-4o-mini-tts, select the model you want, and save. You can then use it from the main interface.
Why is there no place to enter prompts? Because this is a brand new model, and the translation software has not been updated yet.
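If you want the style prompts right away, you can call the API yourself: the new model accepts an instructions field alongside the text. A minimal sketch with the official Python SDK (the voice name and output path are illustrative; assumes the openai package is installed and OPENAI_API_KEY is set):

```python
def synthesize(text, out_path="speech.mp3",
               style="Please speak in an excited tone"):
    """Generate speech with gpt-4o-mini-tts, passing a style prompt.

    Assumes the official `openai` SDK is installed and OPENAI_API_KEY
    is set; "alloy" is just one of the built-in voices.
    """
    from openai import OpenAI  # lazy import keeps the definition self-contained

    client = OpenAI()
    resp = client.audio.speech.create(
        model="gpt-4o-mini-tts",
        voice="alloy",
        input=text,
        instructions=style,  # the speaking-style prompt
    )
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw audio bytes (mp3 by default)
    return out_path

# Hypothetical usage:
# synthesize("Breaking news tonight...", style="Please imitate a news anchor's emphasis")
```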
What if I'm using a third-party relay model?
The usage is the same; just change the API address to the one provided by your third-party relay. Note, however, that as of this writing most third-party relays do not yet support these new models.
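For reference, pointing the SDK at a relay is only a matter of changing the base URL; a sketch (the relay address below is a placeholder):

```python
def make_client(api_key, base_url="https://api.openai.com/v1"):
    """Build an OpenAI client; pass your relay's address as base_url
    to route requests through a third-party relay (the default points
    at the official API)."""
    from openai import OpenAI  # assumes the official `openai` SDK is installed

    return OpenAI(api_key=api_key, base_url=base_url)

# Hypothetical relay:
# client = make_client("sk-...", base_url="https://relay.example.com/v1")
```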