
CosyVoice2-TTS Windows One-Click All-in-One Package: Easy AI Voice Synthesis for Beginners

Are you amazed by Alibaba's open-source CosyVoice2 AI voice synthesis technology but put off by the complex and error-prone installation process?

Don't worry, this one-click all-in-one package is made just for you!

With it, you don't need to install Python or struggle with various complicated errors. Just follow a few simple steps on Windows 10 or Windows 11 to easily experience top-tier AI voice synthesis technology.

A Quick Look at the Power of CosyVoice2

CosyVoice2 is a very powerful multilingual speech synthesis model that can generate highly accurate, stable, and natural-sounding speech.

  • Supports Multiple Languages: Includes Chinese, English, Japanese, Korean, and even various Chinese dialects like Cantonese, Sichuanese, and Shanghainese.
  • Cross-Lingual Voice Cloning: You can use a Chinese voice sample to generate authentic English speech, and vice versa.
  • Ultra-Low Latency: In streaming mode, the first audio can be ready in as little as 150 milliseconds.
  • More Accurate Pronunciation: Pronunciation errors are reduced by 30% to 50% compared with CosyVoice 1.0.
  • Super Stable Timbre: The cloned voice stays consistent, including in zero-shot and cross-lingual synthesis.
  • Emotion and Accent Control: Supports finer-grained control over emotion and accent adjustments, making the voice more expressive.

🚀 Start Your AI Voice Journey in Just Three Steps

Step 1: Download the Package

First, you need to download the package file named cosyvoice2-win.7z. We provide two download channels; choose the faster one for you:

Important Note: This is a .7z archive. If your computer cannot open it, or extraction fails with an error, install a free extraction tool such as 7-Zip, Bandizip, or 360 Compression and try again.

Step 2: Extract the Files

After downloading, find the compressed file. Right-click on it and select "Extract Here" or "Extract to cosyvoice2". After extraction, you will get a new folder with the same name.

Step 3: Double-Click to Launch!

Open the newly extracted folder and find the file named 双击启动.bat (the name means "double-click to start").

Simply double-click it with your mouse, and the program will start running!

What Happens After Double-Clicking?

A black window (called a "Command Prompt") will pop up. Please do not close this window; the program is handling everything for you in the background:

  1. Automatically Downloads Model Files: The program first checks if the necessary AI model files (several gigabytes in size) are complete. If files are missing, it automatically starts downloading them. You will see the download progress in the window. This process takes some time depending on your internet speed, so please be patient.

Network Tip: If the download fails midway and you want to restart it, close the black window, go into the pretrained_models folder, delete the incomplete model folders inside, and then run "双击启动.bat" again.

  2. Starts Core Services: Once the models are ready, the program automatically starts the WebUI service. This is the interface you use for voice synthesis.

  3. See the Success Message: Continue waiting until you see a message similar to the following in the black window, which means you've succeeded!

    Running on local URL:  http://127.0.0.1:8000
    
    To create a public link, set `share=True` in `launch()`.

This means CosyVoice2 is now successfully running on your computer!
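A model download can stall before the window ever shows that message. In that case it helps to know which model folders are missing or empty before you delete anything. The Python sketch below is a hypothetical helper, not part of the package. It assumes the folder names used in the launch commands under "Advanced Usage" and treats a folder with no files as incomplete. It cannot spot a folder that holds only some of its files, and it does not know which models the package actually requires, so read its output as a hint. You could save it under any name (for example check_models.py) and run it from the package folder with the bundled interpreter: pybin\python check_models.py.

```python
from pathlib import Path

# Folder names taken from the launch commands in this article; edit the tuple
# if your copy of the package ships a different set of models.
EXPECTED_MODELS = (
    "CosyVoice2-0.5B",
    "CosyVoice-300M",
    "CosyVoice-300M-SFT",
    "CosyVoice-300M-Instruct",
)


def check_model_folders(root: Path, expected=EXPECTED_MODELS) -> dict:
    """Label each expected folder under `root` as "ok", "empty", or "missing".

    "empty" means the folder exists but contains no files at any depth. This is
    only a rough sign of an interrupted download: a folder that holds some but
    not all of its files is still reported as "ok".
    """
    status = {}
    for name in expected:
        folder = root / name
        if not folder.is_dir():
            status[name] = "missing"
        elif not any(path.is_file() for path in folder.rglob("*")):
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    # Run from the package folder, next to pretrained_models.
    for name, state in check_model_folders(Path("pretrained_models")).items():
        print(f"{name}: {state}")
```

A folder reported as "empty" is a candidate for the delete-and-rerun step in the Network Tip above.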


💻 Start Your AI Voice Creation

Keep that black window open, then open your browser (Chrome or Edge is recommended) and type the following into the address bar at the top:

http://127.0.0.1:8000

Press Enter, and the web interface will appear. Now you can enter text, upload voice samples, and generate your own AI voices!
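The page only loads once the WebUI service is listening. That can take a while after the models finish loading. If you would rather have a script wait for that moment than refresh the browser by hand, the sketch below polls the address until it answers. It assumes the default address http://127.0.0.1:8000 shown above, and the function names are hypothetical. The empty proxy handler is deliberate: it stops proxy software from intercepting the local check.

```python
import http.client
import time
import urllib.error
import urllib.request

WEBUI_URL = "http://127.0.0.1:8000"  # address printed in the black window


def webui_is_up(url: str = WEBUI_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url`."""
    # An empty ProxyHandler makes this request bypass any configured proxy,
    # so a VPN or accelerator cannot intercept the local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server replied with an error status, so it is running.
        err.close()
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timed out, or the service is not ready yet.
        return False


def wait_for_webui(url: str = WEBUI_URL, total: float = 300.0, interval: float = 2.0) -> bool:
    """Poll `url` every `interval` seconds for at most `total` seconds."""
    deadline = time.monotonic() + total
    while time.monotonic() < deadline:
        if webui_is_up(url):
            return True
        time.sleep(interval)
    return False
```

For example, `if wait_for_webui(): webbrowser.open(WEBUI_URL)` opens the interface as soon as it is ready. This needs `import webbrowser` from Python's standard library.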

How do you close the program? When you're done, just close the black window.


🔧 Advanced Usage: Switching Different Voice Models

This package comes with several built-in models, each suited to different tasks. By default it starts CosyVoice2-0.5B, the most capable of them. If you have specific needs, you can switch models manually.

  • CosyVoice-300M-SFT: Required if you want to use the built-in preset voices.
  • CosyVoice-300M-Instruct: Required if you want to control the voice with text instructions (e.g., "speak in a gentle tone").
  • CosyVoice2-0.5B: The latest model, with the best overall performance (default option).
  • CosyVoice-300M: The basic first-generation model.

Switching Method:

  1. Find 双击启动.bat in the folder, right-click it, and select "Edit" (if you don't see "Edit", choose "Open with" > "Notepad").
  2. You will see the following lines of code:
    call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice2-0.5B
    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M
    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M-Instruct
    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M-SFT
  3. rem stands for "remark": a line that begins with rem is a comment, so Windows skips it when running the file.
    • To disable the current model: add rem followed by a space at the very beginning of its line.
    • To enable the target model: delete the rem and the space after it from the beginning of that model's line.
    • Leave exactly one launch line without rem, so that a single model starts.
  4. Save the file and close Notepad. Close any black window that is still running, then double-click "双击启动.bat" again.

For example, to switch to the CosyVoice-300M-SFT model, you would modify it to look like this:

    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice2-0.5B
    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M
    rem call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M-Instruct
    call %cd%/pybin/python webui.py --model_dir pretrained_models/CosyVoice-300M-SFT
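The edit always follows the same pattern: exactly one launch line is active, and every other one starts with rem. That makes it easy to script. The Python sketch below is a hypothetical helper, not part of the package. It assumes the launch lines look like the ones above (call ... --model_dir pretrained_models/<folder>) and leaves every other line of the file untouched.

```python
import re

# One launch line, optionally commented out with "rem". Group 1 is the
# indentation, group 2 the command itself, group 3 the model folder name.
LAUNCH_LINE = re.compile(
    r"^(\s*)(?:rem\s+)?(call\s.*--model_dir\s+pretrained_models/(\S+))\s*$",
    re.IGNORECASE,
)


def select_model(bat_text: str, model: str) -> str:
    """Enable the launch line for `model` and comment out every other one."""
    # Keep the file's own line endings (.bat files usually use CRLF).
    newline = "\r\n" if "\r\n" in bat_text else "\n"
    out = []
    found = False
    for line in bat_text.split(newline):
        match = LAUNCH_LINE.match(line)
        if match is None:
            out.append(line)  # unrelated lines stay exactly as they were
            continue
        indent, command, name = match.groups()
        if name == model:
            out.append(indent + command)  # active: no "rem" prefix
            found = True
        else:
            out.append(indent + "rem " + command)  # inactive: commented out
    if not found:
        raise ValueError(f"no launch line for model {model!r}")
    return newline.join(out)
```

For example, with `from pathlib import Path`: `p = Path("双击启动.bat"); p.write_bytes(select_model(p.read_bytes().decode("latin-1"), "CosyVoice-300M-SFT").encode("latin-1"))`. Decoding as Latin-1 is deliberate. It maps each byte to exactly one character and back, so any Chinese text elsewhere in the file survives unchanged, whatever encoding the file really uses. As with a manual edit, close the running black window first.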

❓ Frequently Asked Questions

  1. The program crashes on startup, or the black window shows an error like ValueError: When localhost is not accessible... What should I do?

Solution: This is usually caused by a network proxy or VPN program running on your computer (some game accelerators work this way too). Such software can send the program's request to its own local address (localhost) through the proxy, where it fails, so the program cannot confirm that its web interface is reachable. Close your VPN or proxy software, then double-click "双击启动.bat" to start the program again.
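To check whether proxy settings are present before starting the program, the Python sketch below lists the proxy-related environment variables. It also reports whether localhost or 127.0.0.1 is excluded through NO_PROXY. The helper names are hypothetical. Many VPN and accelerator tools set the Windows system proxy rather than environment variables, so the script also prints urllib.request.getproxies(). On Windows, that function falls back to the Internet Options (registry) proxy when no proxy environment variables are set.

```python
from __future__ import annotations

import os
import urllib.request
from typing import Mapping

# Environment variables that HTTP clients commonly honour for proxying.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def proxy_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Return non-empty proxy variables from `env`, keyed by upper-case name."""
    found: dict[str, str] = {}
    for name, value in env.items():
        if name.upper() in PROXY_VARS and value:
            found[name.upper()] = value
    return found


def localhost_bypassed(env: Mapping[str, str]) -> bool:
    """True if NO_PROXY lists localhost, 127.0.0.1, or the wildcard '*'."""
    no_proxy = proxy_settings(env).get("NO_PROXY", "")
    entries = {item.strip().lower() for item in no_proxy.split(",") if item.strip()}
    return "*" in entries or bool(entries & LOCAL_HOSTS)


if __name__ == "__main__":
    print("proxy variables:", proxy_settings(os.environ))
    print("localhost excluded via NO_PROXY:", localhost_bypassed(os.environ))
    # On Windows this falls back to the Internet Options (registry) proxy.
    print("proxies seen by urllib:", urllib.request.getproxies())
```

If any of these print a proxy address, that matches the situation described in the answer above.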

  2. When double-clicking run-api.bat to run the API, I get the error CosyVoice.__init__() got an unexpected keyword argument 'load_onnx'?

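Solution: Open api.py in a text editor such as Notepad. These keyword arguments appear in two places. First search for "load_jit=True, load_onnx=False, load_trt=False" and delete it, then search for "load_jit=True, load_onnx=False" and delete it. Delete the longer string first: the shorter one also matches the start of the longer one, so deleting it first would leave ", load_trt=False" behind.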
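You can script the same deletion if you prefer not to edit by hand. The Python sketch below is a hypothetical helper. It removes the two strings from the answer above, the longer one first, and reports how many it removed so you can confirm it found both places. It does plain text replacement only, and assumes the strings appear in api.py exactly as written.

```python
from __future__ import annotations

# Strings to delete, longest first: removing the short one first would leave
# ", load_trt=False" behind inside the longer one.
LEGACY_ARGS = (
    "load_jit=True, load_onnx=False, load_trt=False",
    "load_jit=True, load_onnx=False",
)


def strip_legacy_args(source: str) -> tuple[str, int]:
    """Delete the legacy keyword arguments; return the new text and the removal count."""
    removed = 0
    for text in LEGACY_ARGS:
        removed += source.count(text)
        source = source.replace(text, "")
    return source, removed
```

For example, with `from pathlib import Path`: `p = Path("api.py"); text, n = strip_legacy_args(p.read_text(encoding="utf-8"))`, then save with `p.write_text(text, encoding="utf-8")` if n is 2. A call like `CosyVoice2(model_dir, )` is left behind, which is still valid Python. Keep a backup of api.py first; the UTF-8 encoding is an assumption about how the file is saved.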

For Advanced Users: API Integration

The package also includes a run-api.bat file. If you are a developer and want to integrate CosyVoice2's voice synthesis capabilities into other programs (like pyVideoTrans), you can double-click this file to start the API service.
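How other programs talk to that service depends on api.py, which this article does not document. As an illustration only, the sketch below builds a form-encoded POST request with Python's standard library. Everything specific in it is hypothetical: the address http://127.0.0.1:9233, the route /tts, and the field names text and role are placeholders. Check the window opened by run-api.bat and the code in api.py for the real values before using it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL placeholders: replace with the real address, route, and field
# names used by api.py / shown in the run-api.bat window.
API_BASE = "http://127.0.0.1:9233"
TTS_ROUTE = "/tts"


def build_tts_request(text: str, role: str, base: str = API_BASE, route: str = TTS_ROUTE) -> Request:
    """Build, but do not send, a form-encoded POST request for one synthesis call."""
    body = urlencode({"text": text, "role": role}).encode("utf-8")
    return Request(
        base.rstrip("/") + route,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it would look like `urllib.request.urlopen(build_tts_request("Hello", "my-voice"), timeout=120)`. What the response contains (audio bytes, JSON, or something else) is again something to confirm in api.py.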