
Integrating Minimaxi Text-to-Speech (TTS) into a Custom TTS API Interface

Since edge-tts has become less reliable, producing voiceovers has become more troublesome. The remaining free options are all local deployments, such as GPT-SoVITS, CosyVoice, F5-TTS, Kokoro, and ChatTTS.

Online services such as OpenAI TTS have a noticeable lisp when synthesizing Chinese. Currently, the best online options for Chinese are AzureTTS, ByteDance TTS, and 302.AI.

In the v3.62 patch, support for text-to-speech from Minimaxi (the parent company of Hailuo AI) has been built into the Custom TTS API interface. It offers dozens of voice roles and 17 languages, and lets you set emotion, tone, and so on, making it a fairly usable option.

Introduction to Integration Methods

There are two integration methods:

  1. Integrate via 302.AI. This is simpler: it is ready to use upon registration, requires no real-name authentication, and has fewer restrictions. Recommended.
  2. Integrate natively from Minimaxi.com. This is slightly more complex, has a lower request frequency limit (3 requests per minute), and requires real-name authentication with a bank card and the mobile phone number reserved with the bank.

1. Integration via 302.AI

This essentially also uses Minimaxi's voiceover, but it's routed through 302.AI, making it slightly more convenient to use. 302.AI registration address (register through this link to receive a $1 credit): https://share.302.ai/pyvideo

  1. First, upgrade pyVideoTrans to v3.62 (upgrade address: https://pvt9.com/downpackage)
  2. Then, find Menu -- TTS Settings -- Custom TTS API. As shown in the figure below, enter https://api.302.ai/minimaxi/v1/t2a_v2 in the API field. Paste the following roles in Voiceover Role Name. The voiceover roles are the same for both integration methods.
青涩青年音色:male-qn-qingse,
精英青年音色:male-qn-jingying,
霸道青年音色:male-qn-badao,
青年大学生音色:male-qn-daxuesheng,
少女音色:female-shaonv,
御姐音色:female-yujie,
成熟女性音色:female-chengshu,
甜美女性音色:female-tianmei,
男性主持人:presenter_male,
女性主持人:presenter_female,
男性有声书1:audiobook_male_1,
男性有声书2:audiobook_male_2,
女性有声书1:audiobook_female_1,
女性有声书2:audiobook_female_2,
青涩青年音色-beta:male-qn-qingse-jingpin,
精英青年音色-beta:male-qn-jingying-jingpin,
霸道青年音色-beta:male-qn-badao-jingpin,
青年大学生音色-beta:male-qn-daxuesheng-jingpin,
少女音色-beta:female-shaonv-jingpin,
御姐音色-beta:female-yujie-jingpin,
成熟女性音色-beta:female-chengshu-jingpin,
甜美女性音色-beta:female-tianmei-jingpin,
聪明男童:clever_boy,
可爱男童:cute_boy,
萌萌女童:lovely_girl,
卡通猪小琪:cartoon_pig,
病娇弟弟:bingjiao_didi,
俊朗男友:junlang_nanyou,
纯真学弟:chunzhen_xuedi,
冷淡学长:lengdan_xiongzhang,
霸道少爷:badao_shaoye,
甜心小玲:tianxin_xiaoling,
俏皮萌妹:qiaopi_mengmei,
妩媚御姐:wumei_yujie,
嗲嗲学妹:diadia_xuemei,
淡雅学姐:danya_xuejie,
Santa Claus:Santa_Claus,
Grinch:Grinch,
Rudolph:Rudolph,
Arnold:Arnold,
Charming Santa:Charming_Santa,
Charming Lady:Charming_Lady,
Sweet Girl:Sweet_Girl,
Cute Elf:Cute_Elf,
Attractive Girl:Attractive_Girl,
Serene Woman:Serene_Woman

Copy the API KEY from the 302.AI backend and paste it into the SK field in the software.

The final configuration should look like the image below. Test it. If the audio plays normally, the configuration is correct, and you can save and use it.
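The role list above uses one `display name:voice_id` pair per line, separated by commas. As a minimal sketch, assuming you want to sanity-check the list before pasting it, the following parses it into a lookup table. `parse_roles` is a hypothetical helper for illustration and is not part of pyVideoTrans.

```python
def parse_roles(text: str) -> dict[str, str]:
    """Parse 'display name:voice_id,' lines into {display name: voice_id}."""
    roles = {}
    for line in text.splitlines():
        # Drop surrounding whitespace and the trailing comma separator.
        entry = line.strip().rstrip(",").strip()
        if not entry:
            continue
        # Split on the last colon; the voice ID is always the final field.
        name, sep, voice_id = entry.rpartition(":")
        if not sep or not name.strip() or not voice_id.strip():
            raise ValueError(f"malformed role entry: {line!r}")
        roles[name.strip()] = voice_id.strip()
    return roles


# A three-line excerpt of the role list, used only as sample input.
sample = """少女音色:female-shaonv,
男性主持人:presenter_male,
Serene Woman:Serene_Woman"""

print(parse_roles(sample))
```

A malformed line (for example, one missing the colon) raises `ValueError`, which makes copy-paste mistakes visible before the list reaches the software.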

2. Native Minimaxi Integration

Registration and login address: https://platform.minimaxi.com/login

After logging in, you need to verify your identity with your bank card number and the mobile phone number reserved with the bank. Once verification has passed, open https://platform.minimaxi.com/user-center/basic-information and copy the groupID.

Then open the software Menu -- TTS Settings -- Custom TTS API, and fill in the API address below. Note that you must replace your_copied_groupID at the end with your own groupID: https://api.minimax.chat/v1/t2a_v2?GroupId=your_copied_groupID
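The native API address is simply the endpoint plus a `GroupId` query parameter. A small sketch of assembling it, so the groupID is not mistyped or left with stray spaces; `build_api_url` is a hypothetical helper and the sample ID is a placeholder, not a real groupID.

```python
from urllib.parse import urlencode

# Endpoint exactly as quoted in the text above.
BASE_URL = "https://api.minimax.chat/v1/t2a_v2"


def build_api_url(group_id: str) -> str:
    """Return the API address with the given groupID appended as GroupId."""
    group_id = group_id.strip()
    if not group_id:
        raise ValueError("groupID must not be empty")
    return f"{BASE_URL}?{urlencode({'GroupId': group_id})}"


# Placeholder groupID for illustration only.
print(build_api_url("1234567890"))
# -> https://api.minimax.chat/v1/t2a_v2?GroupId=1234567890
```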

Enter the interface key in the SK field. You can create one at this address: https://platform.minimaxi.com/user-center/basic-information/interface-key

Fill in the voiceover roles the same way as for 302.AI. Once everything is filled in, it should look like the image below.

Note that the test may fail if you have not passed real-name authentication. In addition, when using this method, open Menu -- Tools/Options -- Advanced Options -- Voiceover Adjustment, set the number of simultaneous voiceovers to 1, and set the pause time after voiceover to a value greater than 25. Otherwise, you may exceed the frequency limit and the request will fail: ordinary users are allowed only 3 requests per minute, i.e. one request every 20 seconds.
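To get a feel for what this limit means in practice, here is a rough back-of-the-envelope sketch. It assumes one request per subtitle line, strictly sequential dubbing, and that the pause setting is measured in seconds; none of these are confirmed by the software's documentation, and synthesis time itself is ignored.

```python
def min_interval_seconds(requests_per_minute: int) -> float:
    """Smallest spacing between requests that stays within the limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60 / requests_per_minute


def estimated_minutes(subtitle_count: int, pause_seconds: float) -> float:
    """Rough lower bound on dubbing time: one pause per subtitle line."""
    return subtitle_count * pause_seconds / 60


print(min_interval_seconds(3))     # 20.0 seconds between requests
print(estimated_minutes(300, 25))  # 125.0 minutes for 300 subtitle lines
```

Under these assumptions, a video with 300 subtitle lines would need more than two hours of waiting alone, which is why the native route is slow for ordinary accounts.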

Pronunciation Language Selection

Supports 17 languages: Chinese, Cantonese, English, Spanish, French, Russian, German, Portuguese, Arabic, Italian, Japanese, Korean, Indonesian, Vietnamese, Turkish, Dutch, Ukrainian

When dubbing in the software interface, select the language of the subtitles; it must be one of the languages listed above. Only if you need Cantonese pronunciation should you open the Custom TTS API interface and set the language to Chinese,Yue. In all other cases, make sure auto is selected there.
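The rule above can be summarized as a small decision function. This is only an illustrative sketch: `language_setting` is a hypothetical helper, and the language names are the ones from the list above rather than identifiers used by the software.

```python
# Languages named in the list above.
SUPPORTED_LANGUAGES = {
    "Chinese", "Cantonese", "English", "Spanish", "French", "Russian",
    "German", "Portuguese", "Arabic", "Italian", "Japanese", "Korean",
    "Indonesian", "Vietnamese", "Turkish", "Dutch", "Ukrainian",
}


def language_setting(subtitle_language: str) -> str:
    """Return the value to select in the Custom TTS API language field."""
    if subtitle_language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {subtitle_language}")
    # Only Cantonese needs an explicit setting; everything else uses auto.
    return "Chinese,Yue" if subtitle_language == "Cantonese" else "auto"


print(language_setting("Cantonese"))  # Chinese,Yue
print(language_setting("Japanese"))   # auto
```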

Pronunciation Emotion Selection

Minimaxi supports 7 emotions: Happy, Sad, Angry, Fearful, Disgusted, Surprised, Neutral. However, tests have shown that the differences between them are not significant. If needed, you can set the emotion in the Custom TTS API interface.
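For readers curious how an emotion might travel in a request, the sketch below builds a request body without sending it. The field names (`model`, `voice_setting`, `emotion`, `language_boost`), the lowercase emotion values, and the default model name are assumptions about the t2a_v2 request format and should be checked against the official Minimaxi API documentation; `build_payload` is a hypothetical helper, and pyVideoTrans may construct its requests differently.

```python
import json

# The seven emotions listed above, lowercased (assumed wire format).
EMOTIONS = ("happy", "sad", "angry", "fearful", "disgusted", "surprised", "neutral")


def build_payload(text, voice_id, emotion=None, language="auto",
                  model="speech-01-turbo"):
    """Build a request body; the model name and field names are assumptions."""
    voice_setting = {"voice_id": voice_id}
    if emotion is not None:
        emotion = emotion.lower()
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion}")
        voice_setting["emotion"] = emotion
    return {
        "model": model,
        "text": text,
        "voice_setting": voice_setting,
        "language_boost": language,
    }


payload = build_payload("你好", "female-shaonv", emotion="Happy")
print(json.dumps(payload, ensure_ascii=False, indent=2))
```

Leaving `emotion` unset omits the field entirely, so the service falls back to whatever its default behavior is.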

Finally, unless you have an enterprise account with Minimaxi at a high tier, it is recommended to use the 302.AI integration method. Otherwise, at 3 requests per minute, subtitle voiceover will either be unacceptably slow or frequently fail with rate limit errors. 302.AI registration address ($1 trial credit): https://share.302.ai/pyvideo