
Voice Cloning Tool

clone-voice Voice Cloning Tool: open-source project address

The models used in this project all come from https://github.com/coqui-ai/TTS. They are released under the CPML (Coqui Public Model License), which permits use for learning and research only, not for commercial purposes.

This is a voice cloning tool. Given a sample of any human voice, it can synthesize text into speech in that voice's tone (timbre), or convert an existing voice recording so that it speaks with that tone.

It is very simple to use and does not require an NVIDIA GPU. Download the pre-compiled version and double-click app.exe. A web interface opens, and everything else is done with mouse clicks.

It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian, and it can record audio online from the microphone.

For good synthesis quality, record a 5 to 20 second sample with clear, accurate pronunciation and no background noise.

Results are excellent for English and acceptable for Chinese.
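You can run a quick check on a sample before uploading it. The sketch below measures the length of a WAV file and tests it against the recommended 5 to 20 second window. It is not part of clone-voice, and the function names are hypothetical. It uses only Python's standard wave module, so it handles uncompressed PCM WAV files but not mp3 or flac.

```python
import wave
from typing import BinaryIO, Union

# Recommended sample length from the guide: 5 to 20 seconds.
MIN_SECONDS = 5.0
MAX_SECONDS = 20.0


def wav_duration(source: Union[str, BinaryIO]) -> float:
    """Return the duration in seconds of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference(source: Union[str, BinaryIO]) -> bool:
    """True if the recording length falls inside the recommended 5-20 s window."""
    return MIN_SECONDS <= wav_duration(source) <= MAX_SECONDS
```

For example, is_good_reference("my_voice.wav") returns False for a 3-second clip, which is a signal to record a longer sample.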

How to use the Windows pre-compiled version (on other systems, deploy from source)

  1. Open the Releases download page and download the pre-compiled main file (1.7G) and the model (3G).

  2. After downloading, extract the main file to a directory, such as E:/clone-voice.

  3. Double-click app.exe and wait for the web window to open automatically. Read the text prompts in the cmd window carefully; any errors will be displayed there.

  4. After the model is downloaded, extract it to the tts folder under the software directory.

  5. Conversion steps:

    • Text to voice: select the [Text -> Voice] button, enter text in the text box or click to import an srt subtitle file, and then click "Start Now".

    • Voice to voice: select the [Voice -> Voice] button and click or drag in the audio file to convert (mp3/wav/flac). Then choose the tone to clone from the "Voice file to use" drop-down box. If none of the options is satisfactory, click "Local Upload" to select a 5-20 s wav/mp3/flac recording, or click "Start Recording" to record 5-20 s of your own voice online and click "Use" when you finish. Finally, click "Start Now".

  6. If the machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.
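If app.exe starts but cannot find the model, you can confirm steps 2 and 4 by checking the software directory. The sketch below is a hypothetical helper that does not ship with the release. It assumes only the layout described above: app.exe sits in the extracted directory and the model sits inside its tts folder. The guide does not list the model's file names, so the helper only checks that tts is not empty.

```python
from pathlib import Path
from typing import List


def check_install(app_dir: str) -> List[str]:
    """Return a list of problems found in the extracted software directory (empty if none)."""
    root = Path(app_dir)
    problems = []
    if not (root / "app.exe").is_file():
        problems.append("app.exe not found: extract the main archive into this directory")
    tts_dir = root / "tts"
    # The model archive should be extracted into the tts folder; its exact contents are not listed.
    if not tts_dir.is_dir() or not any(tts_dir.iterdir()):
        problems.append("tts folder missing or empty: extract the model archive into it")
    return problems
```

For example, print(check_install("E:/clone-voice") or "looks ready") reports anything missing before you launch app.exe.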

Source Code Deployment (Linux, macOS, Windows)

The source code version requires a global proxy because it downloads models from https://huggingface.co, which is not accessible from mainland China.

  1. Requires Python 3.9 to 3.11.
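A common way to route downloads through a proxy without editing library files is to set the standard proxy environment variables for the process that downloads the models. The sketch below builds such an environment. The helper name is hypothetical, and http://127.0.0.1:10809 is a placeholder for your own proxy. Libraries built on requests honor these variables. aiohttp reads them only when a session is created with trust_env=True, which could be one reason the WavLM download mentioned below can still need the manual client.py edit.

```python
import os
from typing import Dict, Mapping, Optional

# Placeholder proxy address: replace it with your own proxy's host and port.
PROXY_URL = "http://127.0.0.1:10809"

# Upper- and lower-case spellings are both set because tools differ in which one they read.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy")


def with_proxy(proxy_url: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of an environment mapping with the standard proxy variables set."""
    env = dict(os.environ if base is None else base)
    for name in PROXY_VARS:
        env[name] = proxy_url
    return env
```

For example, subprocess.run([sys.executable, "code_dev.py"], env=with_proxy(PROXY_URL)) would run the download script with the proxy applied. The original environment is left unchanged.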

  2. Create an empty directory, such as E:/clone-voice, and open a cmd window in it: type cmd in the File Explorer address bar and press Enter. Then use git to pull the source code into the current directory: git clone git@github.com:jianchang512/clone-voice.git .

  3. Create a virtual environment: python -m venv venv

  4. Activate the environment. On Windows: E:/clone-voice/venv/scripts/activate; on Linux and macOS: source ./venv/bin/activate

  5. Install dependencies: pip install -r requirements.txt

  6. On Windows, extract ffmpeg.7z and put ffmpeg.exe in the same directory as app.py. On Linux and macOS, download the matching ffmpeg build from the official ffmpeg website and extract the ffmpeg program to the project root directory. In every case, the ffmpeg executable must be in the same directory as app.py.

    First run python code_dev.py. Enter y when prompted to accept the license agreement, then wait for the models to download. Downloading requires a global proxy. The models are very large, so an unstable or unreliable proxy may cause many errors. Most download errors are caused by proxy problems.

    If several models are reported as downloaded successfully but the "Downloading WavLM model" error still appears, edit the library file \venv\Lib\site-packages\aiohttp\client.py. Around line 535, above the line if proxy is not None:, add your proxy address, for example proxy="http://127.0.0.1:10809".

  7. After the download is complete, start the app with python app.py.

  8. Every startup connects to the internet to check for or update the models, so please wait patiently. If you don't want this check on every startup, edit the dependency package manually. Open \venv\Lib\site-packages\TTS\utils\manage.py and, in the def download_model method around line 389, comment out the following code:

if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
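Before you run code_dev.py, you can confirm the prerequisites from steps 1, 4 and 6 in one pass. The sketch below is a hypothetical pre-flight check and is not a script shipped with clone-voice. It tests the Python version against the 3.9 to 3.11 range and detects whether a virtual environment is active (inside a venv, sys.prefix differs from sys.base_prefix). It also looks for the ffmpeg binary next to app.py.

```python
import sys
from pathlib import Path
from typing import List, Optional, Sequence


def python_supported(version: Optional[Sequence[int]] = None) -> bool:
    """True for Python 3.9, 3.10 or 3.11, the range this guide requires."""
    major_minor = tuple(version[:2]) if version is not None else sys.version_info[:2]
    return (3, 9) <= major_minor <= (3, 11)


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix then differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


def ffmpeg_next_to_app(project_dir: str) -> bool:
    """True if app.py and the ffmpeg executable sit in the same directory."""
    root = Path(project_dir)
    binary = "ffmpeg.exe" if sys.platform.startswith("win") else "ffmpeg"
    return (root / "app.py").is_file() and (root / binary).is_file()


def preflight(project_dir: str) -> List[str]:
    """Collect human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_supported():
        problems.append("Python %d.%d is outside the 3.9-3.11 range" % sys.version_info[:2])
    if not in_virtualenv():
        problems.append("no virtual environment is active (see step 4)")
    if not ffmpeg_next_to_app(project_dir):
        problems.append("ffmpeg is not in the same directory as app.py (see step 6)")
    return problems
```

Run it from the activated environment, for example print(preflight("E:/clone-voice")). An empty list means these particular checks passed. It does not verify that the models are downloaded.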

CUDA Acceleration Support

Detailed instructions for installing the CUDA tools
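To check whether CUDA acceleration can be picked up, you can ask PyTorch directly, since Coqui TTS runs on PyTorch. The sketch below assumes only that. This guide does not document how clone-voice itself chooses a device, so treat the result as an indication rather than a guarantee. Finding nvidia-smi on the PATH suggests the NVIDIA driver is installed. torch.version.cuda is None on CPU-only PyTorch builds. The function names are hypothetical.

```python
import shutil
from typing import Any, Dict


def cuda_report() -> Dict[str, Any]:
    """Summarize what this Python environment can see of the NVIDIA/CUDA setup."""
    info: Dict[str, Any] = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,  # hints the driver is installed
        "torch_installed": False,
        "torch_cuda_build": None,  # CUDA version PyTorch was built with; None for CPU-only builds
        "gpu_available": False,
        "gpu_name": None,
    }
    try:
        import torch
    except ImportError:
        return info
    info["torch_installed"] = True
    info["torch_cuda_build"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu_available"] = True
        info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


def pick_device() -> str:
    """Return "cuda" when PyTorch can use an NVIDIA GPU, otherwise "cpu"."""
    return "cuda" if cuda_report()["gpu_available"] else "cpu"
```

Run it inside the project's virtual environment, for example with for key, value in cuda_report().items(): print(key, value). If torch_cuda_build is None, the installed PyTorch is a CPU-only build and will not use the GPU.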

Precautions

The xtts model may be used only for learning and research, not for commercial purposes.

  1. The source code version requires a global proxy because it downloads models from https://huggingface.co, which is not accessible from mainland China. Startup errors are common with the source code version. They are almost always caused by proxy problems that leave the model download incomplete or interrupted. Use a stable proxy in global mode; if you still cannot complete the download, use the pre-compiled version.

  2. After startup, the model takes some time to cold-load. Wait until http://127.0.0.1:9988 is displayed and the browser page opens automatically, then wait another two or three minutes before starting a conversion.

  3. Functions include:

    Text to speech: enter text and generate speech in the selected tone.

    Voice to voice: select a local audio file and generate a new audio file in the selected tone.

  4. If the cmd window appears frozen for a long time, press Enter in it to resume output. To prevent this, click the icon in the upper-left corner of the cmd window, select "Properties", and uncheck the "Quick Edit" and "Insert Mode" check boxes.
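Instead of watching the console, you can poll the local address until the web interface answers. The sketch below is a hypothetical helper built on urllib from the standard library. The 300-second timeout and 2-second interval are arbitrary choices. It bypasses proxy settings because a globally configured proxy could otherwise intercept requests to 127.0.0.1. A response only means the server is up, so you may still need the two or three minute wait mentioned above before converting.

```python
import time
import urllib.error
import urllib.request

# Address that app.py prints once the web interface is ready.
APP_URL = "http://127.0.0.1:9988"


def wait_for_app(url: str = APP_URL, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll url until the server answers or timeout seconds elapse; return True if it answered."""
    # Local address: use an opener with an empty ProxyHandler so no proxy is used.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # Any HTTP status, even an error page, means the server is listening.
            err.close()
            return True
        except OSError:
            # Connection refused, reset or timed out (URLError subclasses OSError): retry shortly.
            time.sleep(interval)
    return False
```

For example, if wait_for_app(): print("web interface is up") can be run in a second terminal after starting python app.py.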