GPT-SoVITS API Usage Guide
First, update the video translation and dubbing tool to the latest version, then open Settings > GPT-SoVITS API from the menu.
Fill in the corresponding content in the following text boxes:
GPT-SoVITS API: Enter the GPT-SoVITS API address and port here. The default address of the built-in api.py is http://127.0.0.1:9880. If GPT-SoVITS is not deployed on the local machine, replace the IP with that machine's address and make sure the service accepts connections from other machines. If you have changed the API endpoint, update this value to match.
Extra Parameters: Not currently used. The field is reserved for users who want to pass additional information to the service, such as which software is making the call. The default value is pyvideotrans.
Reference Audio#Audio Text Content#Language Code: This is the most important parameter. It determines the voice that the synthesized speech will imitate.
api_v2?: If you plan to use the api_v2 interface, you must select this option.
If you already specified the default "Reference Audio, Audio Text Content, Language Code" when starting GPT-SoVITS's api.py, you don't need to specify them here. For example, if you started it with a command like the following:
python api.py -dr 1.wav -dt "Hello, my dear friends, I hope you have a wonderful and happy day every day" -dl zh
Then nothing needs to be filled in here, and synthesis will clone the voice of the audio 1.wav directly.
If you have not specified it, or are using api_v2.py, you must specify the reference audio.
Next, we will focus on how to fill in the reference audio.
Reference Audio Filling Format
Each line is divided into 3 parts by the "#" symbol: Reference Audio Path#Text Content of Reference Audio#Language Code
The first part is the path of the reference audio relative to the GPT-SoVITS root directory. If you put the reference audio 1.wav directly in the root directory of the GPT-SoVITS software, i.e., in the same directory as api.py, then fill in 1.wav for the reference audio. If you put it in the audio directory under the root directory, then fill in audio/1.wav.
Note: The reference audio must be placed in the GPT-SoVITS software directory, not in the video translation software's directory.
The second part is the text content of the audio, i.e., exactly what the speaker says in it.
The third part is the language code, i.e., the language the speaker is speaking. Currently, only Chinese, English, and Japanese are supported, so the code must be one of three values: zh, en, or ja.
For example, if the content of my audio 1.wav is "Hello, my dear friends, I hope you have a wonderful and happy day every day", then the line I fill in is:
1.wav#Hello, my dear friends, I hope you have a wonderful and happy day every day#zh
You can fill in multiple lines, one per line, as in the example below:
5.wav#Why does Brother Yu Di willingly guard the lonely lamp#zh
d.wav#My university was actually in Xi'an, which is the Foreign Languages University. There were 32 people in our class at that time, only 2 boys#zh
mayun.wav#I remember when I was a freshman in college, I, I learned English by myself since I was a child. My English was learned by grabbing foreigners by the West Lake#zh
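The three-part format above can be checked before pasting it into the settings box. The following is a minimal sketch, not the tool's own parser: the names ReferenceAudio and parse_reference_lines are hypothetical, and it assumes that the "#" symbol never appears inside the text content itself.

```python
from dataclasses import dataclass

# Language codes listed in this guide as supported.
SUPPORTED_LANGS = {"zh", "en", "ja"}


@dataclass
class ReferenceAudio:
    """One parsed line (hypothetical helper type, not part of the tool)."""
    path: str   # path relative to the GPT-SoVITS root directory
    text: str   # what the speaker says in the audio
    lang: str   # one of zh, en, ja


def parse_reference_lines(raw: str) -> list[ReferenceAudio]:
    """Parse 'path#text#lang' lines, skipping blank lines.

    Raises ValueError on a malformed line. Assumes '#' is not used
    inside the text content.
    """
    entries = []
    for lineno, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        parts = line.split("#")
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 parts, got {len(parts)}")
        path, text, lang = (p.strip() for p in parts)
        if not path or not text:
            raise ValueError(f"line {lineno}: path and text must not be empty")
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"line {lineno}: unsupported language code {lang!r}")
        entries.append(ReferenceAudio(path, text, lang))
    return entries
```

Calling parse_reference_lines on the contents of the text box returns one entry per non-empty line, and a malformed line is reported with its line number.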
The overall effect after filling in is shown in the figure:
After filling in, you can test whether synthesis works. Once you have confirmed that there are no problems, go to the main interface, select "GPT-SoVITS" as the TTS type, and choose one of the reference audio files you filled in from the role list.
Of course, the prerequisite is that the GPT-SoVITS API service must be started correctly.
Starting the GPT-SoVITS API Service
Start api.py
If you are using the Windows pre-packaged version, open the GPT-SoVITS root directory, type cmd in the address bar and press Enter, then run the command .\runtime\python api.py in the window that opens and wait for the message that startup succeeded.
Start api_v2.py
python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml
To use api_v2.py, you must select the api_v2 checkbox at the bottom.
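If the test in the settings window fails, it can help to confirm first that something is listening at the configured address. The sketch below only attempts a TCP connection with Python's standard library. It does not call any GPT-SoVITS endpoint, so a successful result shows that the port is open, not that synthesis works. The function name api_is_reachable is hypothetical.

```python
import socket
from urllib.parse import urlparse


def api_is_reachable(api_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds.

    This is only a port check (an assumption-free sanity test); it sends
    no HTTP request and knows nothing about the GPT-SoVITS API itself.
    """
    parsed = urlparse(api_url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's usual port when none is given in the URL.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the default local setup, api_is_reachable("http://127.0.0.1:9880") should return True once api.py or api_v2.py has finished starting.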