
clone-voice Voice Cloning Tool

The model used in this project is xtts_v2 from coqui.ai, licensed under the Coqui Public Model License 1.0.0. Please comply with this license when using the project. Full license text available at https://coqui.ai/cpml.txt.

This is a voice cloning tool: given a sample of any human voice, it can synthesize text into speech in that voice, or convert an existing recording so that it is spoken in that voice.

It's very easy to use and works without an NVIDIA GPU. Download the pre-compiled version, double-click app.exe to open a web interface, and use it with simple clicks.

Supports 16 languages including Chinese, English, Japanese, Korean, French, German, Italian, and allows online voice recording from a microphone.

For best synthesis results, record a clear and accurate voice clip of 5 to 20 seconds without background noise.

English results are excellent, while Chinese results are acceptable.

How to Use the Pre-compiled Windows Version (Other Systems Can Deploy from Source)

  1. Click here to open the Releases download page, then download the pre-compiled main file (1.7GB) and the model (3GB).

  2. Extract the downloaded files to a location, e.g., E:/clone-voice.

  3. Double-click app.exe and wait for the web window to open automatically. Carefully read the text prompts in the cmd window; any errors will be displayed there.

  4. After downloading the model, extract it to the tts folder in the software directory. The result after extraction should look like this:

[image: contents of the tts folder after extracting the model]

  5. Conversion Steps:

    • Select the 【Text->Voice】 button, enter text in the text box or import an SRT subtitle file, then click "Start Now".

    • Select the 【Voice->Voice】 button, click or drag the audio file to convert (mp3/wav/flac), then choose the voice to clone from the "Voice File to Use" dropdown. If none are satisfactory, click "Upload Local" to select a pre-recorded 5-20s wav/mp3/flac voice file, or click "Start Recording" to record your own voice online for 5-20s, then click "Use" after recording. Finally, click "Start Now".

  6. If the machine has an NVIDIA GPU and the CUDA environment is correctly configured, CUDA acceleration will be used automatically.

Source Code Deployment (Linux, Mac, Windows)

The source code version requires setting HTTP_PROXY in the .env file (e.g., HTTP_PROXY=http://127.0.0.1:7890). Models are downloaded from https://huggingface.co and https://github.com, which are inaccessible in some regions. Ensure the proxy is stable and reliable; otherwise, large model downloads may fail midway.
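
As a concrete sketch, the .env file (assumed to sit in the project root, alongside app.py) needs only the single proxy line, with the port adjusted to match your local proxy software:

# .env: route model downloads through a local proxy
HTTP_PROXY=http://127.0.0.1:7890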

  1. Requirements: Python 3.9–3.11. Install the git-cmd tool in advance (download here).

  2. Create an empty directory, e.g., E:/clone-voice, open a cmd window in this directory by typing cmd in the address bar and pressing Enter. Use git to pull the source code to the current directory: git clone git@github.com:jianchang512/clone-voice.git .

  3. Create a virtual environment: python -m venv venv

  4. Activate the environment: On Windows, run E:/clone-voice/venv/scripts/activate; on Linux and Mac, run source venv/bin/activate.

  5. Install dependencies: pip install -r requirements.txt --no-deps. For CUDA acceleration on Windows and Linux, continue by executing pip uninstall -y torch to uninstall, then pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121. (Requires an NVIDIA GPU and properly configured CUDA environment.)
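
    To confirm the CUDA build of torch is actually usable, the following quick check can be run inside the activated virtual environment (a minimal sketch using standard torch calls; it only reports what torch sees and changes nothing):

# Quick CUDA check: run with `python` inside the activated venv
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    # Print the name of the first GPU that torch detects
    print("GPU:", torch.cuda.get_device_name(0))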

  6. On Windows, extract ffmpeg.7z and place ffmpeg.exe in the same directory as app.py. On Linux and Mac, download the corresponding ffmpeg version from the ffmpeg official website, extract the ffmpeg binary, and place it in the root directory. Ensure the executable ffmpeg is in the same directory as app.py.

    [image: the ffmpeg executable placed in the same directory as app.py]

  7. First run python code_dev.py. When prompted to agree to the license, enter y, then wait for the model download to complete.

    Downloading the model requires a global proxy. The model is very large, and if the proxy is unstable, many errors may occur; most errors are due to proxy issues.

    If multiple models show as successfully downloaded but it still prompts "Downloading WavLM model" error, modify the library file \venv\Lib\site-packages\aiohttp\client.py. Around line 535, above if proxy is not None:, add your proxy address, e.g., proxy="http://127.0.0.1:10809".
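
    A rough sketch of that edit is shown below. The line number and proxy port are just the examples from above, and the exact surrounding code may differ slightly between aiohttp versions:

# venv\Lib\site-packages\aiohttp\client.py, around line 535 (exact line varies by aiohttp version)
proxy = "http://127.0.0.1:10809"   # added line: hard-code your local proxy address here
if proxy is not None:              # existing aiohttp line, leave unchanged
    proxy = URL(proxy)             # existing aiohttp line, leave unchanged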

  8. After download completes, start with python app.py.

  9. 【Training Instructions】 To train, run python train.py. To change the training parameters, edit param.json and then rerun python train.py.

  10. Each startup connects to external servers to check or update models; please wait patiently. To avoid this, manually modify the dependency file: Open \venv\Lib\site-packages\TTS\utils\manage.py, around line 389, in the def download_model method, comment out the following code:

if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
  11. Source code deployment may frequently encounter errors, mostly due to proxy issues preventing model downloads or incomplete downloads. Use a stable, globally enabled proxy. If downloads remain incomplete, use the pre-compiled version.

Common Issues

The xtts model is for learning and research only, not for commercial use.

  1. The source code version requires setting HTTP_PROXY in the .env file (e.g., HTTP_PROXY=http://127.0.0.1:7890). Models are downloaded from https://huggingface.co and https://github.com, which are inaccessible in some regions. Ensure the proxy is stable and reliable; otherwise, large model downloads may fail midway.

  2. After startup, the model needs cold loading, which takes some time. Wait patiently until http://127.0.0.1:9988 is displayed and the browser page opens automatically, then wait 2–3 minutes before converting.

  3. Features include:

     Text to Speech: Input text and generate speech with the selected voice.
     
     Voice to Voice: Select a local audio file and generate another audio file with the selected voice.
    
  4. If the cmd window remains unresponsive for a long time and requires pressing Enter to continue output, click the icon in the top-left corner of the cmd window, select "Properties", and uncheck "Quick Edit Mode" and "Insert Mode".

  5. Pre-compiled Version: Voice-to-voice thread fails to start.

    First, confirm the model is correctly downloaded and placed. The tts folder should contain 3 folders, as shown below: [image: the tts folder containing 3 model folders]

    If correctly placed but errors persist, click to download extra-to-tts_cache.zip, extract the two files, and copy them to the tts_cache folder in the software root directory.

    If the above doesn't work, set the proxy address in the .env file, e.g., HTTP_PROXY=http://127.0.0.1:7890, to resolve the issue. Ensure the proxy is stable and the port is correct.

  6. Prompt: "The text length exceeds the character limit of 182/82 for language"

    This happens when an individual sentence (the text between two periods) is too long. Break long passages up with periods instead of stringing many clauses together with commas, or manually raise the limit in clone/character.json.

  7. Prompt: "symbol not found __svml_cosf8_ha"

    Open the webpage https://www.dll-files.com/svml_dispmd.dll.html, click the red "Download" text, download and extract, then copy the dll file to "C:\Windows\System32".

CUDA Acceleration Support

Install CUDA Tools (see the Detailed Installation Guide).

If your computer has an NVIDIA graphics card, first update the graphics driver to the latest version, then install the corresponding CUDA Toolkit 11.8 and cudnn for CUDA11.X.

After installation, press Win + R, type cmd, and press Enter. In the opened window, type nvcc --version and confirm that version information is displayed.

Then type nvidia-smi and confirm that it outputs information showing the CUDA version.

If correct, CUDA acceleration is enabled; otherwise, reinstall.