GPT-SoVITS is an excellent open-source multilingual text-to-speech (TTS) project, supporting multiple languages such as Chinese, English, Japanese, and Korean. Its main features include:

Zero-shot Text-to-Speech (TTS): Generate speech quickly with just a 5-second voice sample.

Few-shot TTS: Fine-tune the model with only 1 minute of training data to improve voice similarity and naturalness.

Cross-language Support: Synthesize speech in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese, and Chinese.

GPT-SoVITS has been upgraded to v2, with the following new features:

  1. Added support for Korean and Cantonese.
  2. Optimized text front-end processing.
  3. Expanded the underlying model training data to 5,000 hours.
  4. Improved synthesis quality for low-quality reference audio (e.g., internet audio with missing high frequencies or muffled sound).

GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e

The video translation software has integrated GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integration package and use it in the video translation software.

Download the Integration Package

Download the official GPT-SoVITS integration package to ensure compatibility. The APIs of third-party integration packages are not compatible with the official version and may cause errors in the video translation software.

Download link: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4

Start the API Service

Open the GPT-SoVITS folder in the file manager, type cmd in the address bar, and press Enter. In the terminal window that opens, enter .\runtime\python api_v2.py to start the API service.

The default port is 9880. In the video translation software, enter http://127.0.0.1:9880.

The API service must be started to use it in the translation software.
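
Before opening the translation software, it can help to confirm that something is actually listening on the port. The following is a minimal sketch, not part of GPT-SoVITS or the translation software; the function name is_api_listening is hypothetical, and it only checks that a TCP connection to 127.0.0.1:9880 succeeds, not that the service behind it is GPT-SoVITS.

```python
import socket


def is_api_listening(host: str = "127.0.0.1", port: int = 9880,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (for example
        # ConnectionRefusedError) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_api_listening() with no arguments checks the default address from this article. A False result usually means api_v2.py has not been started or has not finished loading.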

Configure in the Video Translation and Dubbing Software

1. Enter the API Address

Start the software, click Menu -> TTS Settings -> GPT-SoVITS, and enter http://127.0.0.1:9880 in the API Text Box.

Note: The default port is 9880. If you change the port, the API address must be updated accordingly. Also, when deploying locally, ensure the address is 127.0.0.1, not 0.0.0.0.
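
The address rule in the note above can be sketched as a small helper. This is an illustration only: client_api_address is a hypothetical name, and filling in port 9880 when no port is given is an assumption based on the default stated in this article.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORT = 9880  # default port stated in this article


def client_api_address(address: str) -> str:
    """Return an address suitable for the API text box.

    Replaces the bind-all host 0.0.0.0 with 127.0.0.1 and keeps any
    explicit port. Any path, query, or fragment is dropped.
    """
    if "://" not in address:
        address = "http://" + address
    parts = urlsplit(address)
    host = parts.hostname or "127.0.0.1"
    if host == "0.0.0.0":
        host = "127.0.0.1"
    # Assumption: a missing port means the default 9880.
    port = parts.port or DEFAULT_PORT
    return urlunsplit((parts.scheme, f"{host}:{port}", "", "", ""))
```

For example, if the service was started on a different port, passing "http://0.0.0.0:9881" yields an address on 127.0.0.1 with port 9881 preserved.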

2. Enter the Reference Audio

Note: The reference audio must be in WAV format and between 5 and 10 seconds long; otherwise, a 400 Client Error will occur.
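
A reference file can be checked against this rule before it is used. The sketch below relies only on Python's standard wave module; the function name reference_audio_problem is hypothetical, and the check is an approximation of the rule stated above, not the validation GPT-SoVITS itself performs. Note that the wave module reads uncompressed PCM WAV files; other encodings are reported as unreadable.

```python
import wave


def reference_audio_problem(path: str, min_s: float = 5.0,
                            max_s: float = 10.0):
    """Return None if the file looks usable, else a short description."""
    try:
        with wave.open(path, "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            duration = wav.getnframes() / wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return f"not a readable WAV file: {exc}"
    if not (min_s <= duration <= max_s):
        return f"duration is {duration:.2f}s, expected {min_s:g}-{max_s:g}s"
    return None
```

A return value of None means the file is a readable WAV within the 5 to 10 second range; any other value is a message describing what to fix.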

GPT-SoVITS uses the reference audio to synthesize speech with the same voice characteristics. For example, if you have an audio file 1.wav (5 seconds long, with the content "Today is a good day, with heavy rain pouring down"), copy it to the GPT-SoVITS folder, next to the api_v2.py file. Then fill in the Reference Audio Text Box in the software using the form path#text#language code, for example 1.wav#Today is a good day, with heavy rain pouring down#zh.

Language codes: zh for Chinese, en for English, ja for Japanese, ko for Korean.

If you store all reference audio files in the wavs folder within the GPT-SoVITS directory, the reference audio path should be wavs/1.wav#Today is a good day, with heavy rain pouring down#zh.
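
To make the three parts of such an entry explicit, here is a minimal parsing sketch. The names ReferenceAudio and parse_reference are hypothetical, and the set of accepted language codes is taken from the list in this article; extend it if your setup accepts others.

```python
from typing import NamedTuple

# Language codes listed in this article.
KNOWN_LANGS = {"zh", "en", "ja", "ko"}


class ReferenceAudio(NamedTuple):
    path: str
    text: str
    lang: str


def parse_reference(entry: str) -> ReferenceAudio:
    """Split 'path#text#lang' into its three parts."""
    if entry.count("#") < 2:
        raise ValueError("expected the form path#text#lang")
    # Split the path off the front and the language off the back, so a
    # '#' inside the spoken text does not break the parse.
    path, rest = entry.strip().split("#", 1)
    text, lang = rest.rsplit("#", 1)
    path, text, lang = path.strip(), text.strip(), lang.strip()
    if lang not in KNOWN_LANGS:
        raise ValueError(f"unknown language code: {lang!r}")
    return ReferenceAudio(path, text, lang)
```

Parsing the example above gives the path wavs/1.wav, the spoken text, and the language code zh as separate fields.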

3. Check the api_v2? Option

If you started the service using the api_v2.py file, ensure the api_v2? option is selected.

4. Test the Connection

Click the test button. If no error occurs, the configuration is successful.

Common Issues

  1. 404 Error During Testing

    This is caused by using a third-party integration package, as third-party APIs are not compatible with the official version. Please download and use the official package.

  2. "Remote Computer Actively Refused" or "Please Check if the API Service is Started"

    This usually means the API service has not been started or is being blocked by a firewall. Confirm that the API service is running; if it is, allow its port (9880 by default) through the firewall or temporarily disable the firewall.
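
The issues described in this article can be summarized as a lookup from error to likely cause. The sketch below assumes the failure surfaces as one of Python's urllib exceptions; the video translation software's own client may raise different types, and explain_error is a hypothetical helper, not part of either project.

```python
from urllib.error import HTTPError, URLError


def explain_error(exc: Exception) -> str:
    """Map an error to the likely cause described in this article."""
    # HTTPError is a subclass of URLError, so it is checked first.
    if isinstance(exc, HTTPError):
        if exc.code == 404:
            return "Third-party package in use; install the official package."
        if exc.code == 400:
            return "Reference audio must be WAV and 5 to 10 seconds long."
        return f"Unexpected HTTP status {exc.code}."
    # URLError wraps the underlying socket error in .reason.
    reason = exc.reason if isinstance(exc, URLError) else exc
    if isinstance(reason, ConnectionRefusedError):
        return "API service not started, or blocked by a firewall."
    return "Unrecognized error; check the API service output."
```

The 400 case comes from the reference audio note earlier in this article rather than from the list above.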