Use GPT-SoVITS for Voiceovers | pyVideoTrans, Open Source Video Translation Tool | pyvideotrans.com | github.com/jianchang512/pyvideotrans

GPT-SoVITS is an outstanding open-source multilingual Text-to-Speech (TTS) project. It supports multiple languages, including Chinese, English, Japanese, and Korean. Key features include:

Zero-Shot Text-to-Speech (TTS): Generates speech quickly with just a 5-second voice sample.

Few-Shot TTS: Fine-tunes the model with only 1 minute of training data to improve timbre similarity and naturalness.

Cross-Lingual Support: Synthesizes speech in a language different from that of the training data. The currently supported languages are English, Japanese, Korean, Cantonese, and Chinese.

GPT-SoVITS has been upgraded to version 2, with the following new features:

Added support for Korean and Cantonese
Optimized text front-end processing
Expanded the underlying model training data to 5000 hours
Generates higher-quality synthesized audio for low-quality reference audio (such as network audio with high-frequency loss or muffled sound quality)

GPT-SoVITS User Manual: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e

The video translation software integrates GPT-SoVITS v2. This article briefly explains how to download the GPT-SoVITS integration package and use it from the video translation software.

Downloading the Integration Package

Download the official GPT-SoVITS integration package to ensure compatibility. The APIs of third-party packages are incompatible with the official one and may cause errors in the video translation software.

Download address: https://www.yuque.com/baicaigongchang1145haoyuangong/ib3g1e/dkxgpiy9zb96hob4

Starting the API Service

Open the GPT-SoVITS folder in the file manager, type cmd in the address bar, and press Enter. In the terminal window that opens, run .\runtime\python api_v2.py to start the API service.

The default port is 9880. In the video translation software, you need to fill in http://127.0.0.1:9880.

You must start the API service to use it in the translation software.
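Before switching to the translation software, it can help to confirm that something is actually listening on the port. The following is a minimal sketch using only the Python standard library; the function name api_port_open is hypothetical, and the check only proves that a TCP connection is accepted, not that the listener is GPT-SoVITS.

```python
import socket


def api_port_open(host: str = "127.0.0.1", port: int = 9880, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted.

    The defaults match the address used in this article (127.0.0.1:9880).
    This is a hypothetical helper, not part of GPT-SoVITS or pyVideoTrans.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling api_port_open() returns False while the API service is stopped and True once api_v2.py is running on the default port.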

Configuration in the Video Translation Dubbing Software

1. Fill in the API Address

Start the software, click Menu -> TTS Settings -> GPT-SoVITS in sequence, and fill in http://127.0.0.1:9880 in the API Text Box.

Note: The default port is 9880. If you change the port, change the API address to match. When the service is deployed locally, enter the address as 127.0.0.1, not 0.0.0.0.
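The rule in the note can be sketched as a small normalizing function. This is an illustration only: client_api_address is a hypothetical name, it assumes plain http when no scheme is given, and it drops any path from the input.

```python
from urllib.parse import urlsplit, urlunsplit


def client_api_address(address: str, default_port: int = 9880) -> str:
    """Turn a user-entered address into the form the API Text Box expects.

    Hypothetical helper: replaces the bind address 0.0.0.0 with 127.0.0.1
    and fills in the default port 9880 when none is given.
    """
    if "://" not in address:
        address = "http://" + address  # assumption: plain http for local use
    parts = urlsplit(address)
    host = parts.hostname or "127.0.0.1"
    if host == "0.0.0.0":
        # 0.0.0.0 is an address a server listens on, not one a client connects to.
        host = "127.0.0.1"
    port = parts.port or default_port
    return urlunsplit((parts.scheme, f"{host}:{port}", "", "", ""))
```

For example, client_api_address("http://0.0.0.0:9880") gives http://127.0.0.1:9880, and an address with a custom port such as 127.0.0.1:9881 keeps that port.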

2. Fill in the Reference Audio

The reference audio is the clip whose timbre GPT-SoVITS imitates during speech synthesis. Suppose you have an audio file 1.wav (5 seconds long, in which the speaker says "Today is a good day, it's pouring rain"). Copy this file into the GPT-SoVITS folder, next to the api_v2.py file, and fill in the Reference Audio Text Box of the software in the form audio path#text spoken in the audio#language code, for example 1.wav#Today is a good day, it's pouring rain#zh.

Language code: zh represents Chinese, en represents English, ja represents Japanese, and ko represents Korean.

If you keep all reference audio files in a wavs folder inside the GPT-SoVITS directory, the reference audio entry should be wavs/1.wav#Today is a good day, it's pouring rain#zh.
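The entry format shown above (audio path, spoken text, and language code, separated by #) can be checked before pasting it into the software. The sketch below is an assumption-laden illustration: the names ReferenceAudio, parse_reference, and format_reference are hypothetical, and the accepted codes are limited to the four listed in this article.

```python
from typing import NamedTuple

# Language codes listed in this article; other codes are rejected here
# only because the article does not mention them.
LISTED_LANG_CODES = {"zh", "en", "ja", "ko"}


class ReferenceAudio(NamedTuple):
    path: str  # path relative to the GPT-SoVITS folder, e.g. wavs/1.wav
    text: str  # the words spoken in the reference audio
    lang: str  # language code of the spoken text


def parse_reference(entry: str) -> ReferenceAudio:
    """Split a 'path#text#lang' entry and validate its three fields."""
    parts = [p.strip() for p in entry.split("#")]
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected the form path#text#language code")
    path, text, lang = parts
    if lang not in LISTED_LANG_CODES:
        raise ValueError(f"unlisted language code: {lang}")
    return ReferenceAudio(path, text, lang)


def format_reference(ref: ReferenceAudio) -> str:
    """Join the three fields back into a single entry."""
    return "#".join(ref)
```

Because # is the separator, this sketch cannot handle spoken text that itself contains a # character.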

3. Check api_v2?

If you started the service with the api_v2.py file, make sure the api_v2? option is checked.

4. Test Connection

Click Test. If no error is reported, the configuration is successful.
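The same kind of check can be made outside the software by sending one synthesis request directly to the API service. The sketch below assumes the request shape documented in the official api_v2.py (a POST to /tts with the JSON fields text, text_lang, ref_audio_path, prompt_text, and prompt_lang); verify this against the api_v2.py in your package, since the software's own test may work differently. The function names are hypothetical.

```python
import json
import urllib.request


def build_tts_payload(text: str, text_lang: str, reference: str) -> dict:
    """Build a request body from a 'path#text#lang' reference entry.

    Field names are an assumption based on the official api_v2.py.
    """
    ref_path, prompt_text, prompt_lang = reference.split("#")
    return {
        "text": text,                # text to synthesize
        "text_lang": text_lang,      # language of that text
        "ref_audio_path": ref_path,  # relative to the GPT-SoVITS folder
        "prompt_text": prompt_text,  # words spoken in the reference audio
        "prompt_lang": prompt_lang,  # language of the reference audio
    }


def synthesize(api_address: str, payload: dict, timeout: float = 60.0) -> bytes:
    """POST the payload to <api_address>/tts and return the audio bytes."""
    request = urllib.request.Request(
        api_address.rstrip("/") + "/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

With the service running, synthesize("http://127.0.0.1:9880", build_tts_payload("Hello", "en", "wavs/1.wav#Today is a good day, it's pouring rain#zh")) should return audio data that can be written to a file; an exception indicates one of the problems described under Common Issues.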

Common Issues

A 404 error is reported during testing
This happens when a third-party integration package is used, because its API is incompatible with the official one. Download and use the official package.
"Remote computer actively refused" or "Please check if the API service is started" is reported
The API service may not have been started, or it may be blocked by the firewall. Make sure the API service is running, or turn off the firewall.
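For requests made from a script with urllib, the two issues above map onto two different exception types. The following sketch shows one way to translate them into the advice given here; explain_failure is a hypothetical helper, and the message wording is illustrative rather than what the software prints.

```python
import urllib.error


def explain_failure(exc: Exception) -> str:
    """Map a urllib exception to the matching advice from Common Issues."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 404:
            return "404: likely a third-party package; use the official package."
        return f"HTTP {exc.code}: the service answered with an error."
    if isinstance(exc, (urllib.error.URLError, ConnectionError)):
        return "Connection failed: the API service is not started or is blocked."
    return f"Unexpected error: {exc!r}"
```

A typical use is to wrap the request in try/except Exception as exc and print explain_failure(exc).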
