GPT-SoVITS is an excellent open-source text-to-speech (TTS) project that supports multiple languages, including Chinese, English, Japanese, and Korean. Its main features include:
- Zero-Shot TTS: Quickly generate speech from just a 5-second voice sample.
- Few-Shot TTS: Fine-tune the model with only 1 minute of training data to improve timbre similarity and naturalness.
- Cross-Lingual Support: Synthesize speech in a language different from that of the training data. English, Japanese, Korean, Cantonese, and Chinese are currently supported.
GPT-SoVITS has been upgraded to version v2, with the following new features:
- Added support for Korean and Cantonese
- Optimized text front-end processing
- Expanded the underlying model training data to 5000 hours
- Generates higher-quality synthesized audio from low-quality reference audio (such as online audio with high-frequency loss or a muffled sound)
GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e
The video translation software has integrated GPT-SoVITS v2. This article will briefly introduce how to download the GPT-SoVITS integrated package and use it in the video translation software.
Downloading the Integrated Package
Download the official GPT-SoVITS integrated package to ensure compatibility. The APIs of third-party packages are not compatible with the official one and may cause errors in the video translation software.
Download address: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4
Starting the API Service
In the address bar of the GPT-SoVITS folder, enter cmd and press Enter. In the terminal window that opens, enter .\runtime\python api_v2.py to start the API service.
The default port is 9880. In the video translation software, you need to fill in http://127.0.0.1:9880.
You must start the API service to use it in the translation software.
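Before opening the translation software, it can help to confirm that something is actually listening on the API port. The following is a minimal sketch, not part of either project: the function name api_is_reachable is hypothetical, and it only checks that a TCP connection to 127.0.0.1:9880 can be opened, not that the service behind it is GPT-SoVITS.

```python
import socket


def api_is_reachable(host: str = "127.0.0.1", port: int = 9880,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. The defaults are the address
    and port named in this article.
    """
    try:
        # create_connection raises OSError (e.g. connection refused,
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling api_is_reachable() with no arguments checks the default address. A False result suggests the service has not been started or is being blocked.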
Configuring in the Video Translation Voiceover Software
1. Fill in the API Address
Start the software, click Menu -> TTS Settings -> GPT-SoVITS in sequence, and fill in http://127.0.0.1:9880 in the API text box.
Note: The default port is 9880. If you change the port, change the API address accordingly. For a local deployment, fill in the address as 127.0.0.1, not 0.0.0.0.
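The address rule in the note can be sketched as a small normalizing function. This is an illustrative helper only: the name normalize_api_address is hypothetical, the video translation software is not known to do this for you, and the sketch assumes a plain IPv4 host with an http address.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_api_address(address: str, default_port: int = 9880) -> str:
    """Return an API address suitable for a local deployment.

    Hypothetical helper: replaces 0.0.0.0 with 127.0.0.1, adds the
    scheme and the default port when they are missing, and drops any
    path so only scheme://host:port remains.
    """
    address = address.strip()
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    host = parts.hostname or "127.0.0.1"
    if host == "0.0.0.0":
        # 0.0.0.0 is a bind address for servers, not an address to connect to.
        host = "127.0.0.1"
    port = parts.port or default_port
    return urlunsplit((parts.scheme, f"{host}:{port}", "", "", ""))
```

For example, normalize_api_address("0.0.0.0:9880") gives http://127.0.0.1:9880, which is the value expected in the API text box.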
2. Fill in the Reference Audio
The reference audio is the audio whose timbre GPT-SoVITS will imitate during speech synthesis. Suppose you have an audio file 1.wav (5 seconds long, containing "Today is a good day, it's raining cats and dogs"). Copy this file to the GPT-SoVITS folder, in the same location as the api_v2.py file, and fill in the Reference Audio text box of the software in the form audio path#text spoken in the audio#language code.
Language codes: zh represents Chinese, en represents English, ja represents Japanese, and ko represents Korean.
If you store the reference audio files together in the wavs folder within the GPT-SoVITS directory, the reference audio path should be wavs/1.wav#Today is a good day, it's raining cats and dogs#zh.
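To make the path#text#language layout of a reference audio entry concrete, here is a minimal sketch that splits and validates one entry. The names ReferenceAudio, parse_reference, and LISTED_LANGS are hypothetical, and how the software itself parses the field is an assumption; the sketch accepts only the four language codes listed in this article.

```python
from typing import NamedTuple

# Only the codes this article lists; the software may accept others.
LISTED_LANGS = {"zh", "en", "ja", "ko"}


class ReferenceAudio(NamedTuple):
    path: str  # audio path relative to the GPT-SoVITS folder
    text: str  # the words spoken in the audio
    lang: str  # language code of the spoken text


def parse_reference(entry: str) -> ReferenceAudio:
    """Split 'path#text#lang' into its three fields (illustrative only)."""
    # Take the path before the first '#' and the language after the
    # last '#', so a '#' inside the spoken text does not break parsing.
    path, sep1, rest = entry.partition("#")
    text, sep2, lang = rest.rpartition("#")
    path, text, lang = path.strip(), text.strip(), lang.strip()
    if not (sep1 and sep2 and path and text and lang):
        raise ValueError("expected the form path#text#language code")
    if lang not in LISTED_LANGS:
        raise ValueError(f"language code not listed in this article: {lang}")
    return ReferenceAudio(path, text, lang)
```

Checking an entry this way before pasting it into the Reference Audio text box catches a missing # separator or a mistyped language code early.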
3. Check api_v2?
If you started the API service with the api_v2.py file, make sure the api_v2? option is selected.
4. Test Connection
Click Test. If no error is reported, the configuration is successful.
Common Issues
A 404 error is displayed during testing
This is due to the use of a third-party integrated package. The API of the third-party package is not compatible with the official one. Please download and use the official package.
"The remote computer actively refused" or "Please check whether the API service is started" is displayed
The API service may not be started or may be blocked by the firewall. Please ensure that the API is started or turn off the