Voice Cloning Tool
clone-voice Open Source Project Address
The models used in this project are all from https://github.com/coqui-ai/TTS, and the model agreement is CPML. It can only be used for learning and research, and cannot be used for commercial purposes.
This is a voice cloning tool: given a sample of any human voice, it can synthesize speech from text in that voice, or convert an existing recording so that it sounds like that voice.
It is very simple to use and does not require an NVIDIA GPU. Download the pre-compiled version, double-click app.exe to open a web interface, and operate it with the mouse.
It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian, and can record voices online from the microphone.
To ensure good synthesis quality, record 5 to 20 seconds of audio with clear, accurate pronunciation and no background noise.
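If you want to check a clip programmatically before using it, here is a minimal sketch (not part of the tool, standard library only, WAV files only; the file name sample.wav is just an illustration):

```python
# Minimal sketch (not part of the tool): check that a reference WAV clip
# falls in the recommended 5-20 second range. "sample.wav" is a placeholder name.
import wave

with wave.open("sample.wav", "rb") as w:
    duration = w.getnframes() / w.getframerate()

if 5 <= duration <= 20:
    print(f"OK: {duration:.1f}s is within the recommended range")
else:
    print(f"Warning: {duration:.1f}s is outside the recommended 5-20 second range")
```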
The English results are very good; the Chinese results are acceptable.
How to use the Windows pre-compiled version (other systems can deploy the source code)
Click here to open the Releases download page, download the pre-compiled main file (1.7G) and model (3G)
After downloading, extract it somewhere, such as E:/clone-voice.
Double-click app.exe and wait for the web page to open automatically. Read the text prompts in the cmd window carefully; any errors will be displayed there.
After the model is downloaded, extract it to the tts folder in the software directory.
Conversion operation steps
Select the [Text -> Voice] button, enter text in the text box, or click to import the srt subtitle file, and then click "Start Now".
Select the [Voice -> Voice] button, then click or drag in the audio file (mp3/wav/flac) to be converted. Select the voice to clone from the "Voice file to use" drop-down box; if none is satisfactory, click the "Local Upload" button to select a recorded 5-20 s wav/mp3/flac file, or click the "Start Recording" button to record your own voice online for 5-20 s and click "Use" when finished. Then click the "Start Now" button.
If the machine has an NVIDIA GPU and the CUDA environment is configured correctly, CUDA acceleration will be used automatically.
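If you are unsure whether acceleration will be used, you can check with PyTorch (a minimal sketch; the tool performs an equivalent detection on its own):

```python
# Minimal sketch: check whether PyTorch can see a usable CUDA device.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected; the tool will run on the CPU")
```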
Source code deployment (Linux, macOS, Windows)
The source code version requires a global proxy because it needs to download models from https://huggingface.co, which cannot be accessed from within China.
Requires Python 3.9 to 3.11.
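To confirm the interpreter version before creating the virtual environment, a minimal sketch:

```python
# Minimal sketch: verify the Python version is in the supported 3.9-3.11 range.
import sys

if (3, 9) <= sys.version_info[:2] <= (3, 11):
    print("Supported Python version:", sys.version.split()[0])
else:
    print("Unsupported Python version:", sys.version.split()[0])
```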
Create an empty directory, such as E:/clone-voice, and open a cmd window in it (type cmd in the folder's address bar and press Enter). Then use git to pull the source code into the current directory:
git clone [email protected]:jianchang512/clone-voice.git .
Create a virtual environment
python -m venv venv
Activate the environment. On Windows:
E:/clone-voice/venv/scripts/activate
Install the dependencies:
pip install -r requirements.txt
On Windows, unzip ffmpeg.7z and put ffmpeg.exe in the same directory as app.py. On Linux and macOS, download the corresponding build of ffmpeg from the official ffmpeg website and unzip it; the ffmpeg executable and app.py must be in the same directory.
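Before the first run you can confirm that ffmpeg is where the tool expects it (a minimal sketch, assuming it is run from the project directory next to app.py):

```python
# Minimal sketch: confirm an ffmpeg executable sits next to app.py,
# falling back to a PATH lookup. Assumes this is run from the project directory.
import os
import shutil
import sys

name = "ffmpeg.exe" if sys.platform == "win32" else "ffmpeg"
if os.path.isfile(name):
    print("Found", name, "in the current directory")
elif shutil.which("ffmpeg"):
    print("ffmpeg found on PATH:", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; place it in the same directory as app.py")
```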
For the first run, execute
python code_dev.py
and enter y when prompted to accept the license agreement, then wait for the models to download. Downloading the models requires a global proxy and the models are large; if the proxy is unstable you may encounter many errors, most of which are caused by proxy problems.
If several models have downloaded successfully but a "Downloading WavLM model" error still appears, modify the library file \venv\Lib\site-packages\aiohttp\client.py: around line 535, on the line above if proxy is not None:, add your proxy address, for example proxy="http://127.0.0.1:10809".
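The edit looks roughly like this (illustration only; the surrounding lines in aiohttp/client.py are abbreviated and will differ between aiohttp versions):

```python
# inside venv\Lib\site-packages\aiohttp\client.py, around line 535 (illustration only)
proxy = "http://127.0.0.1:10809"   # <- added line: hard-code your local proxy address
if proxy is not None:              # existing check; everything below it stays unchanged
    ...
```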
After the download is complete, start the app again with
python app.py
Each startup connects to the internet to check for and update the models; please be patient. If you do not want this check on every startup, manually modify the dependency file: open \venv\Lib\site-packages\TTS\utils\manage.py and, around line 389 in the def download_model method, comment out the following code:
if md5sum is not None:
    md5sum_file = os.path.join(output_path, "hash.md5")
    if os.path.isfile(md5sum_file):
        with open(md5sum_file, mode="r") as f:
            if not f.read() == md5sum:
                print(f" > {model_name} has been updated, clearing model cache...")
                self.create_dir_and_download_model(model_name, model_item, output_path)
            else:
                print(f" > {model_name} is already downloaded.")
    else:
        print(f" > {model_name} has been updated, clearing model cache...")
        self.create_dir_and_download_model(model_name, model_item, output_path)
The source code version may frequently fail on startup; this is almost always because proxy problems prevent the models from being downloaded from outside China, or leave the download interrupted and incomplete. Use a stable proxy and enable it globally. If you still cannot download the models completely, use the pre-compiled version.
CUDA acceleration support
Detailed installation method of CUDA tools
Precautions
The xtts model can only be used for learning and research, and cannot be used for commercial purposes.
The source code version requires a global proxy because it needs to download models from https://huggingface.co, which cannot be accessed from within China. It may frequently fail on startup, almost always because proxy problems prevent the models from being downloaded or leave the download interrupted and incomplete. Use a stable proxy and enable it globally; if you still cannot download the models completely, use the pre-compiled version.
After startup, the model is cold-loaded, which takes some time. Wait patiently until http://127.0.0.1:9988 is displayed and the browser page opens automatically, then wait another two or three minutes before converting.
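If you prefer not to watch the cmd window, a minimal sketch (not part of the project) that polls the local address until the service responds:

```python
# Minimal sketch (not part of the project): poll the local web UI until it answers,
# instead of guessing when the cold-load has finished.
import time
import urllib.request

url = "http://127.0.0.1:9988"
for _ in range(60):                      # retry for up to ~5 minutes
    try:
        urllib.request.urlopen(url, timeout=3)
        print("Web UI is up:", url)
        break
    except OSError:
        time.sleep(5)
else:
    print("Web UI did not respond; check the cmd window for errors")
```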
Functions include:
Text to speech: input text and generate audio in the selected voice.
Voice to voice: select a local audio file and generate another audio file in the selected voice.
If the open cmd window appears stuck for a long time, press Enter in it to resume output. To prevent this, click the icon in the upper-left corner of the cmd window, select "Properties", and uncheck the "Quick Edit" and "Insert Mode" checkboxes.