MegaTTS3 is a Chinese and English voice cloning project open-sourced by ByteDance, and its results are quite impressive. However, the official installation documentation is somewhat brief, especially on Windows systems, where many users report installation difficulties. This tutorial aims to help everyone solve these problems and successfully install and use MegaTTS3 on Windows.

Before we begin, let's understand a few basic concepts that will be used in the tutorial:

  • CMD Console (Command Prompt):
    • How to open: In File Explorer, open the folder you want to work in (e.g., D:/python/megatts3), clear the path in the address bar, type cmd, and press Enter.
    • Purpose: A black window will pop up, which is the CMD console. All commands mentioned in this tutorial are typed here and executed by pressing Enter.
  • Executing Commands:
    • Enter a specified line of text (i.e., "command") in the CMD console, and then press Enter.

First-Time Installation and Configuration

Strongly Recommended: Use Miniconda to deploy MegaTTS3 on Windows systems to avoid unnecessary trouble. The rest of this tutorial is based on Miniconda.

Example Path: This tutorial assumes that your working directory (where you install MegaTTS3) is D:/python/megatts3. If your path is different, please modify the paths in the commands accordingly.

Step 1: Install Miniconda

  1. Download Miniconda:

    • Visit in your browser: https://www.anaconda.com/download/success#miniconda
    • Find the Miniconda Installers section on the page and click the download link.
  2. Install Miniconda:

    • Double-click the downloaded .exe installation file.
    • Click Next all the way, and click I Agree on the license agreement page.
    • Crucial Step: When selecting installation options, be sure to check the second checkbox, "Add Miniconda3 to my PATH environment variable" (the first checkbox stays selected as well). The installer shows red warning text next to this option; ignore the warning and check the box anyway.
    • Continue clicking Next or Install until the installation is complete.

Step 2: Download MegaTTS3 Source Code

  1. Visit the Official Repository:

    • Open the website https://github.com/bytedance/MegaTTS3
  2. Download the Code:

    • Click the green <> Code button and then select Download ZIP.
  3. Unzip and Place the Files:

    • Unzip the downloaded MegaTTS3-main.zip file.
    • Copy all files and subfolders inside the unzipped MegaTTS3-main folder to your prepared working directory, such as D:/python/megatts3.
    • After copying, the D:/python/megatts3 folder should contain folders like assets, checkpoints, and tts.
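
If you prefer to script the unzip-and-copy step, the following is a minimal Python sketch, not part of the official instructions. The function name extract_without_top_folder is made up for this example. It assumes the ZIP contains a single top-level folder (MegaTTS3-main, as described above) and writes that folder's contents directly into the working directory.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_without_top_folder(zip_path, dest_dir):
    """Extract zip_path into dest_dir, dropping the first path component.

    Hypothetical helper: assumes every entry lives under one top-level
    folder such as "MegaTTS3-main/".
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Drop the top-level folder name from the entry path.
            parts = PurePosixPath(info.filename).parts[1:]
            if not parts or info.is_dir():
                continue  # skip the top-level folder itself and directory entries
            if ".." in parts:
                continue  # ignore entries that try to escape dest_dir
            target = dest.joinpath(*parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.read(info))
            written.append(target)
    return written


# Example call (paths follow this tutorial's example layout):
# extract_without_top_folder("MegaTTS3-main.zip", "D:/python/megatts3")
```

After running it, the working directory should directly contain the tts folder and requirements.txt rather than a nested MegaTTS3-main folder.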

Step 3: Create and Activate a Virtual Environment

  1. Open CMD Console:

    • Go to your working directory D:/python/megatts3.
    • Enter cmd in the address bar and press Enter.
  2. Create a Virtual Environment:

    • Enter the following command in the CMD console to create an environment named megatts3env that uses Python 3.10:
bash
conda create -n megatts3env python=3.10

While the environment is being created, if prompted with Proceed ([y]/n)?, enter y and press Enter.

  3. Activate the Virtual Environment:
    • After creation, enter the following command to activate the environment (you need to execute this step to activate the virtual environment every time you run MegaTTS3 in the future):
bash
conda activate megatts3env

After successful activation, (megatts3env) will be displayed before the command prompt.

Note: All the following installation and running commands must be executed in the CMD console with the (megatts3env) environment activated!
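
To double-check from Python that the right environment is active, a small sketch like the one below may help. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; the function name environment_report is hypothetical and not part of MegaTTS3.

```python
import os
import sys


def environment_report(expected_env="megatts3env", expected_python=(3, 10)):
    """Report whether the active conda environment and Python version match."""
    env_name = os.environ.get("CONDA_DEFAULT_ENV", "")
    version = tuple(sys.version_info[:2])
    return {
        "env_name": env_name,                      # name of the activated conda env
        "env_ok": env_name == expected_env,
        "python": version,                         # (major, minor) of this interpreter
        "python_ok": version == tuple(expected_python),
    }


# Example: run inside the activated CMD console
# print(environment_report())
```

If env_ok or python_ok is False, run conda activate megatts3env again before continuing.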

Step 4: Install Dependencies

Special Note: Installing directly according to the official repository documentation on Windows usually fails. Be sure to strictly follow the order below to execute the commands.

  1. Install pynini:

    • Enter and execute in the activated CMD console:
      bash
      conda install -y -c conda-forge pynini==2.1.5
    • Wait for the command to complete execution.
  2. Install WeTextProcessing 1.0.3:

    • Continue to enter and execute in the CMD console:
      bash
      pip install WeTextProcessing==1.0.3
    • Wait for the command to complete execution.
  3. Modify requirements.txt and Install Remaining Dependencies:

    • Open the requirements.txt file in your working directory (D:/python/megatts3) using Notepad or another text editor.
    • Find and delete the line containing WeTextProcessing==1.0.4.1.
    • Save and close the file.
    • Return to the CMD console and execute the following command to install the remaining dependencies:
      bash
      pip install -r requirements.txt

  4. Set Environment Variables:
    • Copy and paste the following command completely into the CMD console, and then press Enter to execute. Note: If your installation directory is not D:/python/megatts3, please modify the path in the command to your actual path.
      bash
      conda env config vars set PYTHONPATH="D:/python/megatts3;%PYTHONPATH%"
    • After the variable is set, close the current CMD window, open a new one, and reactivate the environment with conda activate megatts3env so that the environment variable takes effect.
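
The manual edit of requirements.txt in step 3 can also be done with a few lines of Python, run before pip install -r requirements.txt. This is only a sketch: the helper name drop_requirement is invented for this example, and it simply removes every line whose package name matches, whatever version is pinned.

```python
import re
from pathlib import Path


def drop_requirement(text, package):
    """Return requirements text without the lines that name `package`."""
    kept = []
    for line in text.splitlines():
        # The package name is everything before a version specifier, extra, or marker.
        name = re.split(r"[=<>!~\[;\s]", line.strip(), maxsplit=1)[0]
        if name.lower() == package.lower():
            continue  # skip the line we want to remove
        kept.append(line)
    return "\n".join(kept) + "\n"


# Example (path follows this tutorial's example layout):
# path = Path("D:/python/megatts3/requirements.txt")
# cleaned = drop_requirement(path.read_text(encoding="utf-8"), "WeTextProcessing")
# path.write_text(cleaned, encoding="utf-8")
```

Matching on the package name rather than the full line means the script still works if the pinned version in requirements.txt differs from 1.0.4.1.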

Check: If all the above steps complete without errors (yellow WARN messages can be ignored), the dependency environment is installed successfully. If you encounter red error messages, carefully check that you executed the commands strictly in the order given, and especially that you modified the requirements.txt file correctly.
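
As an extra check, you could ask Python which packages it can find and whether the working directory is on PYTHONPATH. The sketch below is an assumption-laden helper, not an official test: the function names are made up, and the module names in the example comment are guesses at the import names of the main dependencies.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def on_pythonpath(directory):
    """Return True if `directory` is one of the entries in PYTHONPATH."""
    target = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PYTHONPATH", "").split(os.pathsep)
    return any(os.path.normcase(os.path.normpath(e)) == target for e in entries if e)


# Example (module names are assumptions; adjust them to what you installed):
# print(missing_modules(["pynini", "torch", "gradio"]))
# print(on_pythonpath("D:/python/megatts3"))
```

An empty list from missing_modules and True from on_pythonpath suggest the steps above took effect in the new CMD window.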

Step 5: Download Pre-trained Models

Hint: The model files are hosted on Hugging Face Hub, which is not directly accessible from mainland China; if you are located there, you must use a VPN.

  • Again, ensure that your CMD console is in the activated (megatts3env) state.
  • Execute the following command to download the model files to the checkpoints folder in the working directory:
    bash
    huggingface-cli download ByteDance/MegaTTS3 --local-dir ./checkpoints --local-dir-use-symlinks False
  • Wait patiently for the download to complete.
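
Once the download finishes, a quick way to confirm that files actually landed in the checkpoints folder is to count them and add up their sizes. The sketch below does only that; it does not know which files the model needs, and the function name summarize_dir is hypothetical.

```python
from pathlib import Path


def summarize_dir(root):
    """Return (number_of_files, total_size_in_bytes) for everything under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Example: run from the working directory
# count, size = summarize_dir("./checkpoints")
# print(f"{count} files, {size / 1e6:.1f} MB")
```

A count of zero, or a total size of only a few kilobytes, usually means the download did not complete and the command should be run again.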

Step 6: (Optional) Add GPU Acceleration Support

If your computer is equipped with an NVIDIA graphics card and has installed CUDA 12.x, you can accelerate speech synthesis by installing the GPU version.

  • Make sure the CMD console has activated (megatts3env).
  • Execute the following command:
    bash
    pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
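
To see whether the reinstalled PyTorch actually detects your GPU, you can run a short check like the one below in the activated environment. It is a sketch that uses only standard PyTorch calls (torch.cuda.is_available, torch.version.cuda, torch.cuda.get_device_name); the function name cuda_status is made up for this example.

```python
def cuda_status():
    """Return a one-line description of the PyTorch and CUDA setup."""
    try:
        import torch  # imported here so the check also works when torch is missing
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available, running on CPU"
    device = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}: CUDA {torch.version.cuda} on {device}"


# Example:
# print(cuda_status())
```

If the message says CUDA is not available even though an NVIDIA card is present, the CPU build of PyTorch is probably still installed or the driver is too old for the cu126 wheels.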

At this point, all installation and configuration work is complete!


Start MegaTTS3 Web Service

Every time you want to use MegaTTS3, you need to start it by following these steps.

  1. Open CMD Console:

    • Go to your MegaTTS3 working directory (e.g., D:/python/megatts3).
    • Enter cmd in the address bar and press Enter.
  2. Activate the Virtual Environment:

    • Execute the command: conda activate megatts3env
  3. (Recommended) Modify Gradio Listening Address:

    • Strongly recommended to perform this step before the first startup: Open the file D:\python\megatts3\tts\gradio_api.py with a code editor or Notepad.
    • Scroll to the end of the file and find server_name="0.0.0.0" and change it to server_name="127.0.0.1".
    • Reason: Using 0.0.0.0 on Windows may cause a large number of irrelevant error messages or even startup failure. Changing to 127.0.0.1 is usually more stable.
    • Save the file after modification.
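
If you would rather not edit the file by hand, the same replacement can be scripted. The following is a minimal sketch under the assumption that the address appears in the file as server_name="0.0.0.0" (with either single or double quotes); the helper name use_loopback is invented for this example.

```python
import re
from pathlib import Path


def use_loopback(source):
    """Replace server_name="0.0.0.0" with server_name="127.0.0.1" in source text."""
    pattern = r"""server_name\s*=\s*(["'])0\.0\.0\.0\1"""
    # \g<1> reuses whichever quote character the original line used.
    return re.sub(pattern, r"server_name=\g<1>127.0.0.1\g<1>", source)


# Example (path follows this tutorial's example layout):
# path = Path("D:/python/megatts3/tts/gradio_api.py")
# path.write_text(use_loopback(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Running it a second time changes nothing, because the pattern only matches the 0.0.0.0 address.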

  4. Start the Program:
    • Execute in the activated CMD console:
      bash
      python tts/gradio_api.py
    • If the startup is successful, the CMD console shows startup output indicating that the service is running.
  5. Access the Web Interface:

    • Open this address in your browser: http://127.0.0.1:7929.
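
If the page does not load, it can help to check whether anything is listening on the port at all. The sketch below tries a plain TCP connection to 127.0.0.1:7929 using only the standard library; the function name is_listening is hypothetical, and a True result only shows that some program accepts connections on that port.

```python
import socket


def is_listening(host="127.0.0.1", port=7929, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: run in a second CMD window while the service is starting
# print(is_listening())
```

A False result while the CMD console still shows the program running usually means startup has not finished yet or the listening address was changed.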

Use MegaTTS3 for Voice Cloning

Understanding the Source of Voices

MegaTTS3 is currently a "semi-open source" project. This means that you cannot directly clone an arbitrary voice sample that you provide yourself. You can only use the voices (latents) that ByteDance has officially pre-processed and published on a specific page.

  • Official Explanation: This is done for security and legal compliance reasons.
  • If You Want to Clone Your Own Voice: You need to submit your audio according to the official designated method, wait for their review and approval, and then download it from the Latents page for use. (See below for specific methods)

Download Available Voice Files

  1. Visit the Google Drive Folder:

    • You need to be able to access Google services (use a VPN if they are blocked in your region) and have a Google account (you can register for free if you don't have one).
    • Open the website (i.e., latents page): https://drive.google.com/drive/folders/1QhcHWcy20JfqWjgqZX1YM3I6i9u4oNlr
    • The folder contains three subfolders (librispeech_testclean_40, official_test_case, user_batch_1-3) that hold all currently available voices.
  2. Select and Download Files:

    • Enter any folder, browse the .wav audio files inside, and listen to them to select the voice you want to clone (to listen, right-click a .wav file and choose Open with, then Preview).
    • Important: When you download a .wav file (e.g., speaker_xxx.wav), you must also download the .npy file with the same name (i.e., speaker_xxx.npy). The two files are used as a pair, and both are required.
    • Save the downloaded .wav and .npy files on your computer.
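
Because every .wav file needs its matching .npy file, it can be useful to scan the download folder for incomplete pairs before opening the web interface. The following is a small sketch; the function name wavs_missing_npy and the folder in the example comment are placeholders, not part of MegaTTS3.

```python
from pathlib import Path


def wavs_missing_npy(folder):
    """Return the names of .wav files in folder that have no same-named .npy file."""
    folder = Path(folder)
    return sorted(
        wav.name
        for wav in folder.glob("*.wav")
        if not wav.with_suffix(".npy").exists()
    )


# Example (the folder name is a placeholder for wherever you saved the files):
# print(wavs_missing_npy("D:/python/megatts3/voices"))
```

An empty list means every downloaded voice has both files of the pair.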

Synthesize Speech in the Web Interface

  1. Open the Web Interface:

    • Make sure the MegaTTS3 service is running and open http://127.0.0.1:7929 in your browser.
  2. Upload Voice Files:

    • Find the upload area on the page.
    • Click the "Upload.wav" area and select the .wav file you just downloaded.
    • Click the "Upload.npy" area and select the .npy file with the same name as the .wav file.
  3. Enter Text and Synthesize:

    • In the "Input Text" input box, enter the Chinese or English text you want the voice to read.
    • Click the "Submit" button to execute.
  4. Get Results:

    • Wait a moment while the synthesis runs in the background.
    • Once completed, you can directly play the generated voice in the upper right corner, or find the download button to save it as an audio file.

Now you have successfully installed and used MegaTTS3 for voice cloning on Windows!

Upload the Voice You Want to Clone

If the voice you want to clone is not available on the Latents page, you can submit it yourself:

  1. First, convert the audio file of the voice you want to clone to WAV format. The duration must not exceed 24 seconds; 5 to 24 seconds is recommended.
  2. Make sure the audio content is legal and does not infringe copyright, and that the recording has a single speaker, clear pronunciation, and no background noise.
  3. Open https://drive.google.com/drive/folders/1gCWL1y_2xu9nIFhUX_OW5MbcFuB7J5Cl, drag the prepared WAV file into it, and wait for it to pass review before you can use it.
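
The duration requirement from step 1 can be checked locally before uploading. The sketch below uses Python's standard wave module, which reads uncompressed PCM WAV files only, so treat a read error as a hint that the file needs to be re-exported. The function names are made up for this example, and the 5 and 24 second limits are taken from the recommendation above.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_ok(path, min_seconds=5.0, max_seconds=24.0):
    """Return True if the WAV duration lies within the recommended range."""
    return min_seconds <= wav_duration_seconds(path) <= max_seconds


# Example (the file name is a placeholder):
# print(wav_duration_seconds("my_voice.wav"), duration_ok("my_voice.wav"))
```

This only checks the length; whether the recording has a single speaker and no background noise still has to be judged by listening.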

After ByteDance approves the submission, it creates an .npy file with the same name and places both the .wav and .npy files in the user_batch_1-3 folder of the Latents page mentioned above. You can then download the .wav file and its matching .npy file and use them for cloning.