MegaTTS3 is a Chinese and English voice cloning project open-sourced by ByteDance, and its results are quite impressive. However, the official installation documentation is somewhat brief, especially on Windows systems, where many users report installation difficulties. This tutorial aims to help everyone solve these problems and successfully install and use MegaTTS3 on Windows.
Before we begin, let's understand a few basic concepts that will be used in the tutorial:
- CMD Console (Command Prompt):
  - How to open: In the address bar of the folder you want to work in (e.g., D:/python/megatts3), delete the original path, type cmd, and press Enter.
  - Purpose: A black window will pop up, which is the CMD console. All commands mentioned in this tutorial are entered and executed by pressing Enter here.
- Executing Commands:
  - Enter a specified line of text (i.e., a "command") in the CMD console, and then press Enter.
First-Time Installation and Configuration
Strongly Recommended: Use Miniconda to deploy MegaTTS3 on Windows systems to avoid unnecessary trouble. The rest of this tutorial is based on Miniconda.
Example Path: This tutorial assumes that your working directory (where you install MegaTTS3) is D:/python/megatts3. If your path is different, please modify the paths in the commands accordingly.
Step 1: Install Miniconda
Download Miniconda:
- Visit in your browser: https://www.anaconda.com/download/success#miniconda
- Find the Miniconda Installers section on the page and click the download link.
Install Miniconda:
- Double-click the downloaded .exe installation file.
- Click Next all the way, and click I Agree on the license agreement page.
- Crucial Step: When selecting installation options, be sure to check the second checkbox, "Add Miniconda3 to my PATH environment variable". Ignore the red warning text next to it; this option must be checked.
- Continue clicking Next or Install until the installation is complete.
Step 2: Download MegaTTS3 Source Code
Visit the Official Repository:
- Open the website https://github.com/bytedance/MegaTTS3
Download the Code:
- Click the green <> Code button and then select Download ZIP.
Unzip and Place the Files:
- Unzip the downloaded MegaTTS3-main.zip file.
- Copy all files and subfolders inside the unzipped MegaTTS3-main folder to your prepared working directory, such as D:/python/megatts3.
- After copying, the D:/python/megatts3 folder should contain folders like assets, checkpoints, and tts.
Step 3: Create and Activate a Virtual Environment
Open CMD Console:
- Go to your working directory D:/python/megatts3.
- Enter cmd in the address bar and press Enter.
Create a Virtual Environment:
- Enter the following command in the CMD console to create an environment named megatts3env that uses Python 3.10:
conda create -n megatts3env python=3.10
- During the installation process, if prompted with Proceed ([y]/n)?, enter y and press Enter.
Activate the Virtual Environment:
- After creation, enter the following command to activate the environment (you need to execute this step to activate the virtual environment every time you run MegaTTS3 in the future):
conda activate megatts3env
- After successful activation, (megatts3env) will be displayed before the command prompt.
Note: All the following installation and running commands must be executed in the CMD console with the (megatts3env) environment activated!
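To confirm that the activated environment is the one created above, a small Python check can help. This is only a sketch: the function name describe_environment is made up for illustration, and it assumes that conda exposes the active environment name in the CONDA_DEFAULT_ENV variable, which is conda's usual behavior after activation.

```python
import os
import sys


def describe_environment(version_info=sys.version_info, env_name=None):
    """Return a list of problems; an empty list means the environment looks right."""
    if env_name is None:
        # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
        env_name = os.environ.get("CONDA_DEFAULT_ENV", "")
    problems = []
    if tuple(version_info[:2]) != (3, 10):
        problems.append(
            f"expected Python 3.10, found {version_info[0]}.{version_info[1]}"
        )
    if env_name != "megatts3env":
        problems.append(
            f"expected conda environment 'megatts3env', found {env_name!r}"
        )
    return problems


issues = describe_environment()
print("Environment looks correct." if not issues else "\n".join(issues))
```

Save it as a .py file in the working directory and run it with python from the activated CMD console.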
Step 4: Install Dependencies
Special Note: Installing directly according to the official repository documentation on Windows usually fails. Be sure to strictly follow the order below to execute the commands.
Install pynini:
- Enter and execute in the activated CMD console:
conda install -y -c conda-forge pynini==2.1.5
- Wait for the command to complete execution.
Install WeTextProcessing 1.0.3:
- Continue to enter and execute in the CMD console:
pip install WeTextProcessing==1.0.3
- Wait for the command to complete execution.
Modify requirements.txt and Install Remaining Dependencies:
- Open the requirements.txt file in your working directory (D:/python/megatts3) using Notepad or another text editor.
- Find and delete the line containing WeTextProcessing==1.0.4.1.
- Save and close the file.
- Return to the CMD console and execute the following command to install the remaining dependencies:
pip install -r requirements.txt
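If you prefer not to edit requirements.txt by hand, the deletion can also be scripted. The following is a minimal sketch, not part of the official project: the helper names drop_requirement and rewrite_requirements are hypothetical, and it assumes the file lists one requirement per line.

```python
import re
from pathlib import Path


def drop_requirement(requirements_text, package_name):
    """Return the requirements text without lines for `package_name`."""
    kept = []
    for line in requirements_text.splitlines():
        # Take the package name in front of any version specifier or extras marker.
        name = re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0]
        if name.lower() == package_name.lower():
            continue  # skip the line we want to remove
        kept.append(line)
    return "\n".join(kept) + "\n"


def rewrite_requirements(path, package_name="WeTextProcessing"):
    """Rewrite the requirements file in place without the given package."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    file_path.write_text(drop_requirement(original, package_name), encoding="utf-8")
```

Calling rewrite_requirements("requirements.txt") from the working directory removes the WeTextProcessing line; after that, pip install -r requirements.txt can be run as described.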
Set Environment Variables:
- Copy and paste the following command completely into the CMD console, and then press Enter to execute. Note: If your installation directory is not D:/python/megatts3, please modify the path in the command to your actual path.
conda env config vars set PYTHONPATH="D:/python/megatts3;%PYTHONPATH%"
- After it is set successfully, close the current CMD window, open a new CMD window, and reactivate the environment with conda activate megatts3env so that the environment variable takes effect.
Check: If all the above steps complete without errors (yellow WARN messages can be ignored), the dependency environment is installed successfully. If you encounter red errors, carefully check whether you followed the order of execution strictly, especially whether you modified the requirements.txt file correctly.
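After reopening the console and reactivating the environment, you can verify that the PYTHONPATH setting took effect. This is a sketch under assumptions: pythonpath_contains is a hypothetical helper, entries are split on the Windows separator (;), and paths are compared case-insensitively with slashes normalized.

```python
import os


def pythonpath_contains(directory, pythonpath=None):
    """Return True if `directory` appears as an entry in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")

    def normalize(path):
        # Treat D:\python\megatts3 and d:/python/megatts3/ as the same entry.
        return path.strip().strip('"').replace("\\", "/").rstrip("/").lower()

    target = normalize(directory)
    entries = [entry for entry in pythonpath.split(";") if entry.strip()]
    return any(normalize(entry) == target for entry in entries)


print(pythonpath_contains("D:/python/megatts3"))
```

If this prints False, the variable was probably set in a different environment, or the console was not reopened after setting it.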
Step 5: Download Pre-trained Models
Hint: The model files are hosted on Hugging Face Hub, which is not directly accessible from mainland China; if you are there, you must use a VPN.
- Again, ensure that your CMD console is in the activated (megatts3env) state.
- Execute the following command to download the model files to the checkpoints folder in the working directory:
huggingface-cli download ByteDance/MegaTTS3 --local-dir ./checkpoints --local-dir-use-symlinks False
- Wait patiently for the download to complete.
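The same download can be started from Python, and the result can be inspected afterwards. This sketch assumes the huggingface_hub package is installed in the environment (the huggingface-cli command belongs to it) and uses its snapshot_download function. The helper names download_models and summarize_folder are made up for illustration.

```python
from pathlib import Path


def download_models(repo_id="ByteDance/MegaTTS3", local_dir="./checkpoints"):
    """Download the model repository into local_dir (requires network access)."""
    # Imported lazily so that summarize_folder works without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def summarize_folder(folder):
    """Return (file_count, total_bytes) for all files below `folder`."""
    files = [path for path in Path(folder).rglob("*") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)
```

After the download finishes, summarize_folder("./checkpoints") should report a non-zero file count; an empty folder suggests that the download did not complete.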
Step 6: (Optional) Add GPU Acceleration Support
If your computer is equipped with an NVIDIA graphics card and has installed CUDA 12.x, you can accelerate speech synthesis by installing the GPU version.
- Make sure the CMD console has activated (megatts3env).
- Execute the following command:
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
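To check whether the GPU build is actually in use after the reinstall, you can ask PyTorch directly. This is a minimal sketch: cuda_status is a hypothetical helper name, and it relies only on torch.cuda.is_available() and torch.cuda.get_device_name().

```python
def cuda_status():
    """Describe whether PyTorch can see a CUDA-capable GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if torch.cuda.is_available():
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but CUDA is not available."


print(cuda_status())
```

If CUDA is reported as unavailable on a machine with an NVIDIA card, the installed CUDA version and the cu126 wheel may not match.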
At this point, all installation and configuration work is complete!
Start MegaTTS3 Web Service
Every time you want to use MegaTTS3, you need to start it by following these steps.
Open CMD Console:
- Go to your MegaTTS3 working directory (e.g., D:/python/megatts3).
- Enter cmd in the address bar and press Enter.
Activate the Virtual Environment:
- Execute the command:
conda activate megatts3env
(Recommended) Modify Gradio Listening Address:
- Strongly recommended to perform this step before the first startup: Open the file D:\python\megatts3\tts\gradio_api.py with a code editor or Notepad.
- Scroll to the end of the file, find server_name="0.0.0.0", and change it to server_name="127.0.0.1".
- Reason: Using 0.0.0.0 on Windows may cause a large number of irrelevant error messages or even startup failure. Changing to 127.0.0.1 is usually more stable.
- Save the file after modification.
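The manual edit above can also be done with a short script. This is a sketch, not the project's own tooling: set_server_name is a hypothetical helper, and the sample launch line in the comment only illustrates the kind of text it rewrites; the real line in gradio_api.py may look different.

```python
import re
from pathlib import Path


def set_server_name(source, host="127.0.0.1"):
    """Replace every server_name="..." argument in the given source text."""
    pattern = r'server_name\s*=\s*(["\'])[^"\']*\1'
    return re.sub(pattern, lambda match: f'server_name="{host}"', source)


def patch_file(path, host="127.0.0.1"):
    """Rewrite the file in place with the new listening address."""
    file_path = Path(path)
    original = file_path.read_text(encoding="utf-8")
    file_path.write_text(set_server_name(original, host), encoding="utf-8")


# Hypothetical example of a line the function rewrites:
#   demo.launch(server_name="0.0.0.0", server_port=7929)
```

Calling patch_file("tts/gradio_api.py") from the working directory applies the change; keeping a backup copy of the file first is a sensible precaution.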
Start the Program:
- Execute in the activated CMD console:
python tts/gradio_api.py
- If the startup is successful, the CMD console prints startup output indicating that the service is running.
Access the Web Interface:
- Open this address in your browser: http://127.0.0.1:7929.
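If the page does not load, it helps to check whether anything is listening on the port at all. The following sketch uses only the Python standard library; is_port_open is a hypothetical helper, and the port 7929 is taken from the address above.

```python
import socket


def is_port_open(host="127.0.0.1", port=7929, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


print("Service reachable:", is_port_open())
```

Run it in a second CMD window while the service is running. A result of False means the service has not started or is listening on a different address or port.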
Use MegaTTS3 for Voice Cloning
Understanding the Source of Voices
MegaTTS3 is currently a "semi-open source" project. This means that you cannot clone arbitrary voice samples that you provide yourself. You can only use the voices (latents) that ByteDance has officially pre-processed and published on a specific page.
- Official Explanation: This is done for security and legal compliance reasons.
- If You Want to Clone Your Own Voice: You need to submit your audio through the officially designated method, wait for it to be reviewed and approved, and then download it from the latents page for use. (See below for specific methods)
Download Available Voice Files
Visit the Google Drive Folder:
- You need to use a VPN to access Google services and have a Google account (you can register for free if you don't have one).
- Open the website (i.e., the latents page): https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr
- It contains three subfolders (librispeech_testclean_40, official_test_case, user_batch_1-3) with all currently available voices.
Select and Download Files:
- Enter any folder, browse the .wav audio files inside, listen to them, and select the voice you want to clone (right-click on the wav file - Open with - Preview, to listen).
- Important: When you decide to download a .wav file (e.g., speaker_xxx.wav), you must also download the .npy file with the same name (i.e., speaker_xxx.npy). These two files are used in pairs and are indispensable.
- Save the downloaded .wav and .npy files on your computer.
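Because every .wav file needs an .npy file with the same name, a quick check of the download folder can catch missing halves before you upload anything. This is a minimal sketch; find_unpaired is a hypothetical helper, and the folder path is whatever location you saved the files to.

```python
from pathlib import Path


def find_unpaired(folder):
    """Return sorted file stems that have a .wav or an .npy, but not both."""
    entries = list(Path(folder).iterdir())
    wav_stems = {path.stem for path in entries if path.suffix.lower() == ".wav"}
    npy_stems = {path.stem for path in entries if path.suffix.lower() == ".npy"}
    # The symmetric difference keeps names that appear in only one of the two sets.
    return sorted(wav_stems ^ npy_stems)
```

An empty list means every voice in the folder has both files; any name returned is missing its partner and should be downloaded again.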
Synthesize Speech in the Web Interface
Open the Web Interface:
- Make sure the MegaTTS3 service is running and open http://127.0.0.1:7929 in your browser.
Upload Voice Files:
- Find the upload area on the page.
- Click the "Upload.wav" area and select the .wav file you just downloaded.
- Click the "Upload.npy" area and select the .npy file with the same name as the .wav file.
Enter Text and Synthesize:
- In the "Input Text" input box, enter the Chinese or English text you want the voice to read.
- Click the "Submit" button to execute.
Get Results:
- Wait a moment while the synthesis runs in the background.
- Once completed, you can directly play the generated voice in the upper right corner, or find the download button to save it as an audio file.
Now you have successfully installed and used MegaTTS3 for voice cloning on Windows!
Upload the Voice You Want to Clone
If the voice you want to clone is not among the available voices, you can submit it yourself.
- First, convert the audio file of the voice you want to clone to wav format. The duration must not exceed 24 seconds; 5 to 24 seconds is recommended.
- Make sure that the audio content is legal, does not infringe copyright, has no background noise, has clear pronunciation, and contains only one speaker.
- Open this website https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl, drag the wav file you have prepared into it, and then wait for the review to pass before you can use it.
After ByteDance approves the submission, it creates an npy file with the same name and puts both the wav and npy files into the user_batch_1-3 folder of the latents page mentioned above. You can then download the wav file and the npy file with the same name and use them for cloning.
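Before submitting a recording, the duration requirement can be checked locally. This sketch uses Python's standard wave module, which reads uncompressed PCM wav files only. The helper names are hypothetical, and the limits of 5 and 24 seconds are taken from the requirements listed above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_upload_candidate(path, min_seconds=5.0, max_seconds=24.0):
    """Classify a recording against the recommended duration range."""
    duration = wav_duration_seconds(path)
    if duration > max_seconds:
        return "too long"
    if duration < min_seconds:
        return "shorter than recommended"
    return "ok"
```

For example, check_upload_candidate("my_voice.wav") returns "ok" for a 10-second recording. The file name here is only a placeholder.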