clone-voice: Voice Cloning Tool

The model used in this project is xtts_v2 from coqui.ai, and the model's open-source license is Coqui Public Model License 1.0.0. Please comply with this license when using this project. The full text of the agreement can be found at https://coqui.ai/cpml.txt

This is a voice cloning tool. It can clone any human voice and then either synthesize text into speech in that voice or convert an existing recording into that voice.

It's very simple to use, even without an NVIDIA GPU. Download the pre-compiled version, double-click app.exe to open a web interface, and start cloning with a few clicks.

Supports 16 languages including Chinese, English, Japanese, Korean, French, German, and Italian. You can also record your voice online from the microphone.

For the best synthesis results, it's recommended to record for 5 to 20 seconds, with clear and accurate pronunciation and no background noise.
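The 5 to 20 second recommendation can be checked before a sample is uploaded. The following is a minimal sketch, assuming the sample is an uncompressed PCM WAV file; the function names are hypothetical and are not part of clone-voice.

```python
import wave

MIN_SECONDS = 5.0   # lower bound of the recommended range
MAX_SECONDS = 20.0  # upper bound of the recommended range


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path):
    """True if the clip falls inside the 5-20 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

MP3 and FLAC files would need a separate decoder; the standard-library wave module reads WAV only.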

English results are excellent, and Chinese results are decent.

Using the Windows Pre-compiled Version (Source Code Deployment for Other Systems)

Open the Releases download page and download the pre-compiled main file (1.7GB) and the model (3GB).
After downloading, extract the files to a location, such as E:/clone-voice.
Double-click app.exe and wait for the web window to open automatically. Please read the text prompts in the CMD window carefully. Any errors will be displayed there.
After downloading the model, extract it into the tts folder in the software directory. After extraction, the tts folder should contain 3 folders.

Conversion steps:
- Select the Text -> Voice button, enter text in the text box, or click to import an SRT subtitle file, and then click "Start Now".
- Select the Voice -> Voice button, click or drag in the audio file you want to convert (MP3/WAV/FLAC), and then select the voice you want to clone from the "Voice File to Use" drop-down box. If none of the listed voices suits you, click the "Upload Local" button to select a recorded WAV/MP3/FLAC audio file (5-20 seconds), or click the "Start Recording" button to record your own voice online for 5-20 seconds and click "Use" when finished. Then click the "Start Now" button.
If your machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.
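To see in advance whether CUDA acceleration is likely to be picked up, PyTorch can be queried directly. This is a sketch that assumes the tool relies on the usual torch.cuda.is_available() check; the actual detection logic inside clone-voice is not shown here, and detect_device is a hypothetical helper.

```python
def detect_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # only present in an environment where PyTorch is installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(detect_device())
```

A result of "cpu" on a machine with an NVIDIA GPU usually points to a CPU-only torch build or a driver or CUDA installation problem.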

Source Code Deployment (Linux, macOS, Windows)

For the source code version, you need to set a proxy in .env using HTTP_PROXY= (e.g., http://127.0.0.1:7890). Models are downloaded from https://huggingface.co and https://github.com, which may be inaccessible in some regions. Make sure your proxy is stable and reliable; otherwise, large model downloads may fail.
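As an illustration of what the HTTP_PROXY entry amounts to, the sketch below reads a KEY=VALUE style .env file and exports the proxy to the process environment. How clone-voice itself reads .env is not shown in this document, so load_proxy_from_env_file is a hypothetical helper, and reusing the same address for HTTPS_PROXY is an assumption.

```python
import os


def load_proxy_from_env_file(path=".env"):
    """Read HTTP_PROXY from a simple KEY=VALUE file and export it."""
    proxy = None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "HTTP_PROXY":
                proxy = value.strip().strip('"').strip("'") or None
    if proxy:
        os.environ["HTTP_PROXY"] = proxy
        os.environ["HTTPS_PROXY"] = proxy  # assumption: same proxy for HTTPS
    return proxy
```

With a .env file containing HTTP_PROXY=http://127.0.0.1:7890, the call returns that address; with an empty value it returns None and leaves the environment untouched.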

Requires Python 3.9 to 3.11 and the git-cmd tool, both installed beforehand.
Create an empty directory, such as E:/clone-voice. Open a CMD window in this directory by typing cmd in the address bar and pressing Enter. Use Git to pull the source code to the current directory: git clone git@github.com:jianchang512/clone-voice.git .
Create a virtual environment: python -m venv venv
Activate the environment: E:/clone-voice/venv/scripts/activate (Windows), or source venv/bin/activate (Linux and macOS).
Install dependencies: pip install -r requirements.txt --no-deps. To enable CUDA acceleration on Windows and Linux, uninstall torch by executing pip uninstall -y torch and then install torch and torchaudio with CUDA support: pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121. (Requires an NVIDIA GPU and a properly configured CUDA environment)
On Windows, extract ffmpeg.7z and place ffmpeg.exe in the same directory as app.py. On Linux and macOS, download the corresponding version of FFmpeg from the FFmpeg official website, extract the ffmpeg program, and place it in the root directory. The executable binary file ffmpeg must be in the same directory as app.py.
First, run python code_dev.py. Enter y when prompted to accept the license agreement, and then wait for the model to download.
Downloading the model requires a global proxy. The model is very large, and if the proxy is not stable enough, you may encounter many errors. Most errors are caused by proxy issues.
If the output shows that several models have been downloaded successfully but a "Downloading WavLM model" error still appears, edit the library file \venv\Lib\site-packages\aiohttp\client.py. Around line 535, directly above the line that reads if proxy is not None:, add your proxy address, for example, proxy="http://127.0.0.1:10809".
After the download is complete, start python app.py.
[Training Instructions] If you want to train the model, execute python train.py. Training parameters can be adjusted in param.json. After adjusting, re-execute the training script: python train.py.
Each startup will connect to a server outside the firewall to check for or update the model. Please be patient. If you don't want to check for or update every time you start, you need to manually modify the file under the dependency package. Open \venv\Lib\site-packages\TTS\utils\manage.py, and in the download_model method around line 389, comment out the following code:

if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
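To make clear what is being disabled, the decision in the snippet above can be restated as a standalone function. This is only an illustrative sketch: cache_is_current is a hypothetical name, not a function of the TTS library, and it covers only the comparison against hash.md5, not the download itself.

```python
import os


def cache_is_current(output_path, md5sum):
    """True only if hash.md5 exists in output_path and matches md5sum.

    In the library snippet, any other outcome triggers a fresh download.
    """
    md5sum_file = os.path.join(output_path, "hash.md5")
    if not os.path.isfile(md5sum_file):
        return False
    with open(md5sum_file, mode="r") as f:
        return f.read() == md5sum
```

Commenting out the library code skips this comparison entirely, so a model that is already on disk is used as-is and a changed upstream model is no longer detected.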

The source code version may frequently encounter errors when starting, which are mainly caused by proxy problems that prevent the model from being downloaded completely from outside the firewall. It is recommended to use a stable proxy and enable it globally. If you still cannot download completely, it is recommended to use the pre-compiled version.
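Two of the requirements from the steps above, the Python version range and the location of the ffmpeg binary, can be verified before running code_dev.py. The following is a minimal sketch; both function names are hypothetical, and the binary is assumed to be named ffmpeg.exe on Windows and ffmpeg elsewhere.

```python
import sys
from pathlib import Path


def python_version_ok(version=None):
    """True for Python 3.9, 3.10 or 3.11."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 9) <= (major, minor) <= (3, 11)


def ffmpeg_beside_app(project_dir):
    """True if app.py and an ffmpeg binary sit in the same directory."""
    root = Path(project_dir)
    has_ffmpeg = (root / "ffmpeg.exe").is_file() or (root / "ffmpeg").is_file()
    return (root / "app.py").is_file() and has_ffmpeg
```

For example, python_version_ok() checks the running interpreter, and ffmpeg_beside_app("E:/clone-voice") checks the example directory used in the steps above.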

Frequently Asked Questions

The xtts model can only be used for learning and research and cannot be used for commercial purposes.

The model needs to be cold-loaded after startup, which will take some time. Please wait patiently until http://127.0.0.1:9988 is displayed and the browser page opens automatically. Wait for two or three minutes before performing the conversion.
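A script that drives the tool can wait for the web interface instead of guessing when the cold load has finished. The sketch below polls the address mentioned above until it answers; wait_for_server and its default timeout and interval are assumptions for illustration. Proxies are bypassed explicitly, because an HTTP_PROXY setting would otherwise also be applied to requests for 127.0.0.1.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://127.0.0.1:9988", timeout=180.0, interval=2.0):
    """Poll url until it answers or timeout seconds have passed."""
    # An empty ProxyHandler ignores HTTP_PROXY for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server is up, even if it returned an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)
    return False
```

Calling wait_for_server() returns True as soon as the page responds and False if nothing answers within the timeout. A response only shows that the web server is listening, so the advice to wait a few minutes before converting still applies.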
Available functions:
- Text to Speech: Input text and generate speech using the selected voice.
- Voice to Voice: Select an audio file from your local drive and generate another audio file using the selected voice.
If the CMD window shows no new output for a long time, press Enter in it to resume the output. To prevent this from happening again, click the icon in the upper-left corner of the CMD window, select "Properties", and then uncheck the "Quick Edit Mode" and "Insert Mode" checkboxes.
Pre-compiled version: Voice-to-voice thread fails to start
First, confirm that the model has been downloaded and placed correctly: the tts folder should contain 3 folders.
If the model is placed correctly but the thread still fails to start, download extra-to-tts_cache.zip, extract it, and copy the 2 extracted files to the tts_cache folder in the software root directory.
If that does not help, fill in the proxy address after HTTP_PROXY in the .env file, for example, HTTP_PROXY=http://127.0.0.1:7890. Make sure that the proxy is stable and that the port is entered correctly.
Prompt: "The text length exceeds the character limit of 182/82 for language"
This happens when a single sentence, that is, the text between two periods, is too long. Break long sentences up with periods rather than chaining many clauses with commas. Alternatively, you can open the clone/character.json file and manually modify the limit.
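Long input can also be split before it is pasted into the text box. The following sketch breaks text at sentence-ending punctuation first, then at commas, and only cuts mid-clause as a last resort. split_for_tts is a hypothetical helper, and the limit must be supplied by the caller, since the applicable value (182 or 82 in the prompt above) depends on the language.

```python
import re


def split_for_tts(text, limit):
    """Split text into pieces of at most `limit` characters."""
    pieces = []
    # First pass: break after sentence-ending punctuation (Latin and CJK).
    for sentence in re.split(r"(?<=[.!?。！？])\s*", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        if len(sentence) <= limit:
            pieces.append(sentence)
            continue
        # Second pass: pack comma-separated clauses into pieces under the limit.
        current = ""
        for clause in re.split(r"(?<=[,;，；])", sentence):
            if len((current + clause).strip()) <= limit:
                current += clause
                continue
            if current.strip():
                pieces.append(current.strip())
            clause = clause.strip()
            # Last resort: hard-split a clause that is itself too long.
            while len(clause) > limit:
                pieces.append(clause[:limit])
                clause = clause[limit:]
            current = clause
        if current.strip():
            pieces.append(current.strip())
    return pieces
```

Pieces produced by the comma pass do not end in a period, so a period may need to be appended to each one before it is submitted. The sentence pass is deliberately naive and will also split at the dot in a number such as 3.14.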
Prompt: "symbol not found __svml_cosf8_ha"

Open the website https://www.dll-files.com/svml_dispmd.dll.html, click the red "Download" text, download and extract it, copy and paste the DLL file inside to "C:\Windows\System32".

CUDA Acceleration Support

Install CUDA Tools

If your computer has an NVIDIA graphics card, first upgrade the graphics card driver to the latest version, and then install the corresponding CUDA Toolkit 11.8 and cuDNN for CUDA 11.x.

After the installation is complete, press Win + R, enter cmd and press Enter. In the window that opens, enter nvcc --version and confirm that version information is displayed.

Then enter nvidia-smi and confirm that it prints output that includes the CUDA version number.

This indicates that the installation is correct and CUDA acceleration can be used; otherwise, you need to reinstall.
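The same two checks can be scripted. This sketch only tests whether nvcc and nvidia-smi are on the PATH and exit successfully; it does not verify that the installed versions match each other. tool_output and cuda_tools_report are hypothetical helper names.

```python
import shutil
import subprocess


def tool_output(name, *args):
    """Run a command-line tool and return its output, or None if unavailable."""
    if shutil.which(name) is None:
        return None
    try:
        result = subprocess.run(
            [name, *args], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


def cuda_tools_report():
    """Report which of the two CUDA command-line tools respond."""
    return {
        "nvcc": tool_output("nvcc", "--version") is not None,
        "nvidia-smi": tool_output("nvidia-smi") is not None,
    }


print(cuda_tools_report())
```

If either entry is False, the corresponding tool is missing from the PATH or failed to run, which matches the case above where a reinstall is needed.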
