clone-voice: Voice Cloning Tool
The model used in this project is xtts_v2 from coqui.ai. The model's open-source license is Coqui Public Model License 1.0.0. Please abide by this license when using this project. The full text of the agreement can be found at https://coqui.ai/cpml.txt.
This is a voice cloning tool that can synthesize speech from text in any human voice, or convert a recording from one voice into another.
It is very easy to use, and you don't need an NVIDIA GPU. Download the pre-compiled version and double-click app.exe to open a web interface, then just point and click.
It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian. It can record sound from a microphone online.
To ensure the synthesis effect, it is recommended to record for 5 to 20 seconds with clear and accurate pronunciation and no background noise.
The English effect is excellent, and the Chinese effect is passable.
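The recommended 5 to 20 second length for a reference recording can be checked before uploading. The following is a minimal sketch using only the Python standard library; it assumes an uncompressed WAV file, and the function names are illustrative and not part of this project.

```python
import wave

MIN_SECONDS = 5.0   # lower bound recommended in the text
MAX_SECONDS = 20.0  # upper bound recommended in the text


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def is_recommended_length(seconds):
    """True if the duration lies within the recommended 5 to 20 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For mp3 or flac recordings, a different reader would be needed, since the wave module only handles WAV files.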
How to Use the Windows Pre-compiled Version (Source Code Deployment for Other Systems)
Open the Releases download page and download the pre-compiled main file (1.7G) and model (3G).
After downloading, extract them to a location, such as E:/clone-voice.
Double-click app.exe and wait for the web window to open automatically. Please read the text prompts in the CMD window carefully; any errors will be displayed there.
After the model is downloaded, extract it to the tts folder in the software directory. The extracted result is shown in the figure.
Conversion steps:
Select the Text -> Voice button, enter text in the text box, or click to import an SRT subtitle file, and then click Start Now.
Select the Voice -> Voice button, click or drag in the audio file (mp3/wav/flac) to be converted, and then select the voice to clone from the Voice File to Use drop-down box. If none of the listed voices is satisfactory, click the Upload Local button to select a recorded 5-20s wav/mp3/flac file, or click the Start Recording button to record your own voice online for 5-20s and then click Use. Finally, click the Start Now button.
If the machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.
Source Code Deployment (Linux, Mac, Windows)
For the source code version, you need to set the proxy in .env as HTTP_PROXY=your_proxy_address (e.g., http://127.0.0.1:7890). You need to download the model from https://huggingface.co and https://github.com, but these websites cannot be accessed in China. You must ensure that the proxy is stable and reliable; otherwise, the large model download may fail.
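Because a mistyped proxy address is an easy mistake to make, it can help to sanity-check the value before starting a long download. The sketch below parses .env-style text and checks that HTTP_PROXY looks like a URL with a scheme, host, and port. How the project itself reads .env is not shown here, so the parsing rules and function names are assumptions for illustration only.

```python
from urllib.parse import urlparse


def read_http_proxy(env_text):
    """Return the HTTP_PROXY value from .env-style text, or None if unset."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without a key=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "HTTP_PROXY":
            value = value.strip().strip("'\"")
            return value or None
    return None


def looks_like_proxy_url(value):
    """Basic shape check: http(s) scheme, a host, and an explicit port."""
    parsed = urlparse(value)
    try:
        port = parsed.port
    except ValueError:  # raised for a non-numeric or out-of-range port
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname) and port is not None
```

This only checks the shape of the address. It does not confirm that the proxy is running or stable.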
Requires Python 3.9 to 3.11, and the git-cmd tool must be installed in advance.
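The two prerequisites can be verified with a short script before going further. This is a rough sketch using only the standard library; the function names are made up for this example.

```python
import shutil
import sys


def python_version_supported(version_info=sys.version_info):
    """True for Python 3.9, 3.10, or 3.11, the range stated above."""
    return (3, 9) <= tuple(version_info[:2]) <= (3, 11)


def git_available():
    """True if a git executable can be found on PATH."""
    return shutil.which("git") is not None


if __name__ == "__main__":
    print("Python version OK:", python_version_supported())
    print("git found on PATH:", git_available())
```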
Create an empty directory, such as E:/clone-voice. Open a CMD window in this directory by typing cmd in the address bar and pressing Enter. Use git to pull the source code to the current directory:
git clone git@github.com:jianchang512/clone-voice.git .
Create a virtual environment:
python -m venv venv
Activate the environment. In Windows:
E:/clone-voice/venv/scripts/activate
Install dependencies:
pip install -r requirements.txt --no-deps
If you want to enable CUDA acceleration in Windows and Linux, continue by executing pip uninstall -y torch to uninstall, and then execute pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121. (You must have an NVIDIA card and have configured the CUDA environment.)
In Windows, extract ffmpeg.7z and place ffmpeg.exe in the same directory as app.py. For Linux and Mac, go to the FFmpeg official website to download the corresponding version of FFmpeg, extract it, and place the ffmpeg executable in the same directory as app.py.
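To confirm that FFmpeg ended up where the instructions above expect it, something like the following sketch could be used. It looks next to app.py first and then falls back to PATH. The fallback is an assumption of this example; whether the application itself searches PATH is not stated here.

```python
import os
import shutil


def find_ffmpeg(app_dir):
    """Return a path to ffmpeg next to app.py, else on PATH, else None."""
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = os.path.join(app_dir, name)
        if os.path.isfile(candidate):
            return candidate
    # Hypothetical fallback: accept an ffmpeg found on PATH.
    return shutil.which("ffmpeg")
```

Calling find_ffmpeg with the directory that contains app.py returns the path that was found, or None if FFmpeg is missing.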
First run python code_dev.py. When prompted to agree to the agreement, enter y and wait for the model to download.
Downloading the model requires a global proxy. The model is very large. If the proxy is not stable and reliable enough, you may encounter many errors. Most errors are caused by proxy issues.
If it shows that multiple models have been downloaded successfully, but the error "Downloading WavLM model" is still displayed, you need to modify the library package file \venv\Lib\site-packages\aiohttp\client.py. Add your proxy address above the line if proxy is not None: (around line 535), for example, proxy="http://127.0.0.1:10809".
After the download is complete, start python app.py again.
[Training Instructions] If you want to train, execute python train.py. The training parameters are adjusted in param.json. After the adjustment, re-execute the training script python train.py.
Each startup will connect to the outside to check for or update the model. Please be patient. If you do not want this check on every startup, you need to manually modify a file in the dependency package: open \venv\Lib\site-packages\TTS\utils\manage.py and, around line 389, in the def download_model method, comment out the following code:
if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
- The source code version may frequently encounter errors at startup, almost always because proxy problems prevent the model from being downloaded completely from servers outside China. It is recommended to use a stable proxy and enable it globally. If you still cannot download the model completely, use the pre-compiled version instead.
Common Issues
The model xtts can only be used for learning and research, and cannot be used for commercial purposes.
For the source code version, you need to set the proxy in .env as HTTP_PROXY=your_proxy_address (e.g., http://127.0.0.1:7890). You need to download the model from https://huggingface.co and https://github.com, but these websites cannot be accessed in China. You must ensure that the proxy is stable and reliable; otherwise, the large model download may fail.
After starting, the model needs to be cold-loaded, which will take some time. Please wait patiently until http://127.0.0.1:9988 is displayed and the browser page is automatically opened, and then wait for two or three minutes before performing the conversion.
Functions include:
- Text-to-speech: Enter text and generate sound with the selected voice.
- Voice-to-voice: Select an audio file from the local machine and generate another audio file with the selected voice.
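When scripting around the cold start described above, one option is to poll the local address until it accepts connections instead of watching the CMD window. The sketch below is a generic TCP readiness check, not something the project provides; the function name and default timings are assumptions.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet: give up after the deadline, otherwise retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("127.0.0.1", 9988) returns True once the web server is listening. An open port only shows that the server has started; the model may still need the extra two or three minutes mentioned above.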
If the opened CMD window does not move for a long time, you need to press Enter on it to continue outputting. Click the icon in the upper left corner of the CMD, select "Properties", and then uncheck the "Quick Edit" and "Insert Mode" checkboxes.
Pre-compiled version: Voice-to-voice thread failed to start.
First confirm that the model has been downloaded and placed correctly. There are 3 folders in the tts folder, as shown in the figure.
If it has been placed correctly, but the error still occurs, click to download extra-to-tts_cache.zip, copy the 2 files obtained after decompression to the tts_cache folder in the software root directory.
If the above method does not work, fill in the proxy address after HTTP_PROXY in the .env file, such as HTTP_PROXY=http://127.0.0.1:7890, which can solve the problem. You must ensure that the proxy is stable and the port is correct.
Prompt "The text length exceeds the character limit of 182/82 for language"
This happens because a sentence (the text between two periods) is too long. Break overly long sentences up with periods instead of chaining many commas, or open the clone/character.json file and manually modify the limit.
Prompt "symbol not found __svml_cosf8_ha"
Open the webpage https://www.dll-files.com/svml_dispmd.dll.html, click the red "Download" button, decompress the download, and copy the DLL file inside to "C:\Windows\System32".
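For the character-limit prompt described earlier in this section, long input can also be pre-split before it is pasted into the text box. The following is a minimal sketch of one way to do that: it splits at sentence-ending punctuation first, then at commas, and only hard-cuts when neither is available. The function name is hypothetical, and the limit is passed in by the caller because the exact per-language limits are defined by the tool, not by this example.

```python
import re


def split_for_tts(text, limit):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries, then commas, then a hard cut as a last resort.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Western punctuation must be followed by whitespace; CJK punctuation need not be.
    pattern = r"(?<=[.!?。！？])\s+|(?<=[。！？])"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]
    chunks = []
    for sentence in sentences:
        while len(sentence) > limit:
            # Last comma (Western or CJK) that keeps the piece within the limit.
            cut = max(sentence.rfind(",", 0, limit), sentence.rfind("，", 0, limit))
            cut = cut + 1 if cut > 0 else limit  # no usable comma: hard cut
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if sentence:
            chunks.append(sentence)
    return chunks
```

Each returned chunk could then be ended with a period so that it is treated as its own sentence.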
CUDA Acceleration Support
Install CUDA tools Detailed installation method
If your computer has an Nvidia graphics card, first upgrade the graphics card driver to the latest version, and then install the corresponding CUDA Toolkit 11.8 and cudnn for CUDA11.X.
After the installation is complete, press Win + R, enter cmd, and press Enter. In the window that opens, enter nvcc --version to confirm that version information is displayed, similar to this figure.
Then enter nvidia-smi to confirm that there is output and that you can see the CUDA version number, similar to this figure.
This shows that the installation is correct and cuda acceleration can be used; otherwise, you need to reinstall it.
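The same two checks can be run from Python, which may be convenient on machines where several people need to verify their setup. This is only a sketch: it reports whether nvcc --version and nvidia-smi can be run successfully and does not interpret their output. The helper name is an assumption.

```python
import shutil
import subprocess


def tool_output(command):
    """Run a command and return its text output, or None if unavailable or failing."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    for command in (["nvcc", "--version"], ["nvidia-smi"]):
        output = tool_output(command)
        status = "OK" if output else "not found or failed"
        print(" ".join(command), "->", status)
```

Inside the project's virtual environment, torch.cuda.is_available() additionally reports whether the installed PyTorch build can actually see the GPU.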