
GPT-SoVITS API Usage Guide

First, update your video translation and dubbing tool to the latest version, then open the Settings menu and find GPT-SoVITS API.

Fill in the corresponding information in the following text boxes:

GPT-SoVITS API: Enter the address and port of the GPT-SoVITS API here. The built-in api.py listens on http://127.0.0.1:9880 by default. If GPT-SoVITS is not deployed on the local machine, replace the IP with that machine's address and make sure it accepts connections from other machines. If you change the API address or port, update this field to match.
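Before entering the address, it can help to confirm that something is actually listening on it. The following is a minimal sketch using only the Python standard library; the helper name api_reachable is hypothetical and not part of pyvideotrans or GPT-SoVITS. It only checks that a TCP connection can be opened, not that the service behind the port is GPT-SoVITS.

```python
import socket
from urllib.parse import urlparse


def api_reachable(api_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    Hypothetical helper for illustration; it does not send any HTTP request.
    """
    parsed = urlparse(api_url)
    host = parsed.hostname
    if host is None:
        # The string did not contain a scheme and host, e.g. "127.0.0.1:9880".
        return False
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling api_reachable("http://127.0.0.1:9880") should return True only while the API service is running on the default address. For a remote deployment, pass that machine's address instead.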

Extra Parameters: Currently unused and reserved for future use. It is intended for passing additional information to the API, such as which software is calling it. The default value is pyvideotrans.

Reference Audio#Audio Text Content#Language Code: This is the most important parameter; it determines the timbre of the synthesized voice.

api_v2?: If you plan to use the api_v2 interface, you must select this option.

If you specified the default reference audio, audio text content, and language code when starting api.py of GPT-SoVITS, you do not need to specify them here. For example, if you start it with a command like the following:

python api.py -dr 1.wav -dt "Hello, my dear friends, I hope you have a wonderful and pleasant day" -dl zh

In that case you do not need to specify anything here; the voice in 1.wav is used directly as the timbre to clone.

If you have not specified it, or are using api_v2.py, you must specify the reference audio.

Next, we will focus on how to fill in the reference audio.

Reference Audio Filling Format

Each line is divided into three parts by the symbol "#": Reference Audio Path#Reference Audio Text Content#Language Code

The first part is the path of the reference audio relative to the GPT-SoVITS root directory. If you place the reference audio 1.wav directly in the root directory of GPT-SoVITS, i.e., in the same directory as api.py, enter 1.wav. If you place it in the audio directory under the root directory, enter audio/1.wav.

Note: The reference audio is placed in the GPT-SoVITS software directory, not in the video translation software.

The second part is the text content of the audio, i.e., the exact words spoken in the reference audio.

The third part is the language code, i.e., the language spoken in the reference audio. Currently only Chinese, English, and Japanese are supported, so the code must be one of zh, en, or ja.

For example, if the content of the audio 1.wav is "Hello, my dear friends, I hope you have a wonderful and pleasant day", then the line to enter is:

1.wav#Hello, my dear friends, I hope you have a wonderful and pleasant day#zh

You can fill in multiple lines, one per line, as in the example below:

5.wav#Why does Brother Yu Di willingly guard the lonely lamp#zh

d.wav#My university was actually in Xi'an, which is the Foreign Language University. There were 32 people in our class at that time, only 2 boys#zh

mayun.wav#I remember when I was a freshman, I taught myself English since I was a child. I learned English by grabbing foreigners by the West Lake#zh
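To make the format concrete, here is a minimal sketch of how such lines could be split and validated. It is an illustration only, not the parser pyvideotrans actually uses; the names ReferenceAudio, SUPPORTED_LANGS, and parse_reference_lines are hypothetical, and it assumes the spoken text itself contains no "#" character.

```python
from typing import NamedTuple

# Language codes listed in this guide.
SUPPORTED_LANGS = {"zh", "en", "ja"}


class ReferenceAudio(NamedTuple):
    path: str  # path relative to the GPT-SoVITS root directory
    text: str  # words spoken in the reference audio
    lang: str  # one of zh, en, ja


def parse_reference_lines(raw: str) -> list[ReferenceAudio]:
    """Parse 'path#text#lang' lines, skipping blank lines."""
    entries = []
    for number, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = [part.strip() for part in line.split("#")]
        if len(parts) != 3:
            raise ValueError(
                f"line {number}: expected 3 '#'-separated parts, got {len(parts)}"
            )
        path, text, lang = parts
        if not path or not text:
            raise ValueError(f"line {number}: audio path and text must not be empty")
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"line {number}: unsupported language code {lang!r}")
        entries.append(ReferenceAudio(path, text, lang))
    return entries
```

Each returned entry corresponds to one selectable voice. A line with a missing part or an unsupported language code raises ValueError with the offending line number.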

The overall effect after filling in is shown in the figure:

After filling in the fields, you can test whether the configuration works. Then go to the main interface, select "GPT-SoVITS" as the TTS type, and choose one of the reference audio entries you filled in from the role list.

Of course, this requires that the GPT-SoVITS API service has been started correctly.

Starting the GPT-SoVITS API Service

Starting api.py

If you are using the Windows pre-packaged version, open the GPT-SoVITS root directory, type cmd in the folder's address bar, and press Enter. In the command window that opens, run .\runtime\python api.py and wait for the message indicating that the service has started successfully.

Starting api_v2.py

python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml

To use api_v2.py, you must select the api_v2 checkbox at the bottom.
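For readers who want to check the api_v2 service by hand, the sketch below builds a request from one reference-audio line without sending it. The /tts endpoint and the JSON field names (text, text_lang, ref_audio_path, prompt_text, prompt_lang) are assumptions based on the comments in GPT-SoVITS's api_v2.py and are not confirmed by this guide; verify them against your GPT-SoVITS version. The helper name build_tts_request is hypothetical.

```python
import json
from urllib.request import Request


def build_tts_request(api_url: str, text: str, text_lang: str, ref_line: str) -> Request:
    """Build (but do not send) a POST request for an api_v2-style /tts endpoint.

    ref_line uses the 'path#text#lang' format described in this guide.
    Endpoint and field names are assumptions; check your api_v2.py.
    """
    ref_path, prompt_text, prompt_lang = (part.strip() for part in ref_line.split("#"))
    payload = {
        "text": text,                # text to synthesize
        "text_lang": text_lang,      # language of the text to synthesize
        "ref_audio_path": ref_path,  # path relative to the GPT-SoVITS root
        "prompt_text": prompt_text,  # words spoken in the reference audio
        "prompt_lang": prompt_lang,  # language of the reference audio
    }
    return Request(
        api_url.rstrip("/") + "/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the service running, passing the result to urllib.request.urlopen and writing the response body to a file should produce the synthesized audio, provided the assumed endpoint and fields match your installation.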

For other usage questions, see the GPT-SoVITS documentation:

https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e