How to Use the GPT-SoVITS API

First, upgrade your video translation and dubbing tool to the latest version. Then, open the Settings menu and find the GPT-SoVITS API section.

Fill in the corresponding information in the following text boxes:

GPT-SoVITS API: Enter the address and port of the GPT-SoVITS API service. The api.py included with GPT-SoVITS listens on http://127.0.0.1:9880 by default. If the service is not deployed on the local machine, replace 127.0.0.1 with the IP address of the machine running it, and make sure that machine accepts connections from other machines. If you have changed the port or the interface, update the address here to match.

Extra Parameters: Currently unused. This field is reserved for passing additional information if it is needed later, such as the name of the calling software. The default value is pyvideotrans.

Reference Audio#Audio Text Content#Language Code: This is the most important parameter, because it determines the timbre of the synthesized voice.

api_v2?: Check this box if you are using the api_v2 interface (that is, if you started the service with api_v2.py).

If you have already specified the default "Reference Audio, Audio Text Content, Language Code" when starting api.py for GPT-SoVITS, you don't need to specify them here. For example, if you executed a command like this:

python api.py -dr 1.wav -dt "Hello, my dear friends, I hope you have a wonderful and joyful day every day." -dl zh

Then you don't need to specify it here, as it will directly use the voice timbre from the 1.wav audio.

If you haven't specified it, or if you're using api_v2.py, you must specify the reference audio.

Now, let's focus on how to fill in the reference audio information.

Reference Audio Filling Format

Each line is divided into three parts using the "#" symbol: Reference Audio Path#Reference Audio Text Content#Language Code

The first part is the path to the reference audio relative to the GPT-SoVITS directory. If you directly placed the reference audio 1.wav in the root directory of the GPT-SoVITS software (i.e., in the same directory as api.py), then enter 1.wav as the reference audio. If you placed it in the audio directory under the root directory, then enter audio/1.wav.

Note: The reference audio must be placed in the GPT-SoVITS software directory, not in the directory of the video translation software.

The second part is the text content of the audio, i.e., a transcript of what the speaker says in the reference audio.

The third part is the language code, i.e., the language spoken in the reference audio. Currently, only Chinese, English, and Japanese are supported, so the code must be one of zh, en, or ja.

For example, if the content of my audio 1.wav is "Hello, my dear friends, I hope you have a wonderful and joyful day every day," then the entry would be:

1.wav#Hello, my dear friends, I hope you have a wonderful and joyful day every day.#zh

You can enter multiple reference audios, one per line, as in the example below:

5.wav#Why does brother Yu Di willingly guard the lonely lamp#zh

d.wav#My university was actually in Xi'an, which is the Foreign Languages University. There were 32 people in our class at that time, only 2 boys#zh

mayun.wav#I remember when I was a freshman in college, I self-taught English since I was a child. I learned English by grabbing foreigners by the West Lake#zh
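
To make the format concrete, here is a minimal Python sketch of how such lines could be split and validated before you paste them into the text box. It is not the code the video translation software actually uses; the names ReferenceAudio and parse_reference_lines are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple

# Language codes listed in this guide.
ALLOWED_LANGS = {"zh", "en", "ja"}


class ReferenceAudio(NamedTuple):
    """One reference entry (hypothetical helper type for this sketch)."""
    path: str  # path relative to the GPT-SoVITS directory
    text: str  # what the speaker says in the audio
    lang: str  # zh, en, or ja


def parse_reference_lines(raw: str) -> List[ReferenceAudio]:
    """Split each non-empty line into path#text#lang and validate it."""
    entries = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines between entries are ignored
        parts = line.split("#")
        if len(parts) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 parts separated by '#', "
                f"got {len(parts)}"
            )
        path, text, lang = (part.strip() for part in parts)
        if lang not in ALLOWED_LANGS:
            raise ValueError(f"line {line_no}: unsupported language code {lang!r}")
        entries.append(ReferenceAudio(path, text, lang))
    return entries


if __name__ == "__main__":
    sample = "1.wav#Hello, my dear friends.#zh\n\naudio/2.wav#Good morning.#en\n"
    for entry in parse_reference_lines(sample):
        print(entry.path, entry.lang)
```

Because "#" is the separator, this sketch rejects any line whose text content itself contains a "#" character, so it is safest to keep that character out of the transcript.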

The overall effect after filling it in is shown in the figure:

After filling in the information, you can test whether it works. If the test succeeds, go to the main interface, select "GPT-SoVITS" as the TTS type, and then select one of the reference audios you entered from the role list.

Of course, this assumes that you have correctly started the GPT-SoVITS API service.
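
If the test fails, it can help to first confirm that something is listening at the address you entered. The following Python sketch only checks that a TCP connection can be opened; it does not verify that the listener is actually GPT-SoVITS. The function name api_is_reachable is hypothetical, and the URL is the default mentioned earlier in this guide.

```python
import socket
from urllib.parse import urlsplit


def api_is_reachable(api_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(api_url)
    if parts.hostname is None:
        raise ValueError(f"cannot find a host in {api_url!r}; include http://")
    # Fall back to the scheme's standard port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    url = "http://127.0.0.1:9880"
    print(url, "reachable" if api_is_reachable(url) else "not reachable")
```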

Starting the GPT-SoVITS API Service

Starting api.py

If you are using the pre-packaged Windows version, open the GPT-SoVITS root directory in File Explorer, type cmd in the address bar, and press Enter. In the command prompt window that opens, run the command .\runtime\python api.py and wait for the success message.

Starting api_v2.py

python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml

When using api_v2.py, you must check the api_v2 checkbox at the bottom.
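
The checkbox matters because the two scripts do not accept the same request. The Python sketch below illustrates the difference as a rough guide only: the "/tts" path and all field names are assumptions based on the usage notes in the upstream api.py and api_v2.py scripts and may differ in your GPT-SoVITS version, and the helper names build_tts_request and synthesize are hypothetical. It is not the code the video translation software uses.

```python
import json
import urllib.request
from typing import Dict, Tuple


def build_tts_request(
    api_url: str,
    text: str,
    text_lang: str,
    ref_path: str,
    ref_text: str,
    ref_lang: str,
    use_api_v2: bool,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, JSON payload) for api.py or api_v2.py.

    Field names are assumptions; check them against your GPT-SoVITS version.
    """
    base = api_url.rstrip("/")
    if use_api_v2:
        # Assumed api_v2.py layout: POST to /tts.
        return base + "/tts", {
            "text": text,
            "text_lang": text_lang,
            "ref_audio_path": ref_path,
            "prompt_text": ref_text,
            "prompt_lang": ref_lang,
        }
    # Assumed api.py layout: POST to the root path.
    return base + "/", {
        "refer_wav_path": ref_path,
        "prompt_text": ref_text,
        "prompt_language": ref_lang,
        "text": text,
        "text_language": text_lang,
    }


def synthesize(url: str, payload: Dict[str, str], timeout: float = 60.0) -> bytes:
    """POST the payload as JSON and return the raw response body (audio bytes)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    url, payload = build_tts_request(
        "http://127.0.0.1:9880", "Good morning.", "en",
        "1.wav", "Hello, my dear friends.", "zh", use_api_v2=True,
    )
    print(url)
    print(json.dumps(payload, ensure_ascii=False, indent=2))
```

The example only prints the request; call synthesize(url, payload) yourself once the service is running and you have confirmed the field names.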

For other usage questions, please refer to the GPT-SoVITS documentation:

https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e