
Creating a Ready-to-Use Windows Package for Index-TTS: A Guide to Solving Environment Isolation and Dependency Challenges

[Image: the Index-TTS WebUI interface]

Index-TTS is an excellent open-source zero-shot text-to-speech (TTS) project. It particularly excels in Chinese language processing, effectively correcting the pronunciation of polyphonic characters, and delivering outstanding audio quality and voice similarity. For users who wish to integrate high-quality speech capabilities into their applications or simply want to experience cutting-edge AI technology, Index-TTS is undoubtedly a treasure.

However, for many Windows users unfamiliar with Python and complex compilation environments, running such a project is no easy feat. From configuring the environment and installing numerous dependencies to handling special libraries that are difficult to install directly on Windows, each step can be a daunting barrier. To allow more people to easily experience the magic of Index-TTS, I decided to create a "one-click-to-run" integrated package. This article documents the challenges encountered during the creation process, the problem-solving approaches, and some noteworthy details, hoping to provide a reference for others with similar needs.

The Foundation: Choosing a Portable Python Environment

The primary goal of an integrated package is to be truly out-of-the-box: users should not have to pre-install a specific version of Python or configure environment variables. The package must be fully self-contained and portable.

The key to achieving this is using the official Windows embeddable package provided by Python.

Unlike the standard Python installer, the embeddable package is a streamlined ZIP file. Once unzipped, it contains a minimal Python runtime environment without the complex package management tools and documentation. Its advantages are clear:

  • Environment Isolation: It won't conflict with other Python versions that may exist on the user's system.
  • Installation-Free: It doesn't require administrator privileges. Just unzip and use. It can be placed in any directory, even on a USB drive.
  • Easy Distribution: The entire application and its Python environment can be bundled together for convenient distribution.

Download the Python 3.10 embeddable package from https://www.python.org/downloads/release/python-31011/ and choose the Windows embeddable package (64-bit) version.

I chose the 64-bit embeddable package of Python 3.10.11 for this project to ensure compatibility with the dependency versions required by Index-TTS. I unzipped it into a runtime folder within the project's root directory, for example, D:/index-tts/runtime.

Getting this embeddable package is just the first step; by default, it doesn't even include pip. We need to manually enable its package management capabilities. First, download the get-pip.py script. Then, navigate to the runtime folder, paste get-pip.py there, open a command prompt, and execute python.exe get-pip.py. This will install the pip module into the runtime\Lib\site-packages directory.

Download get-pip.py from this address: https://bootstrap.pypa.io/get-pip.py

The next step is crucial: modifying the runtime\python310._pth file. This file is the path configuration for the embeddable environment, telling the Python interpreter where to find modules. By default, its contents are very limited, preventing it from recognizing newly installed libraries. To ensure the site-packages directory is loaded correctly and that project source code is recognized, we need to add its path. Open the file with a text editor, delete the default content, and replace it with the following:

python310.zip
.
./Lib/site-packages
./Scripts
../
../indextts

With these steps, a self-contained, portable, and fully functional Python environment located at D:/index-tts/runtime is ready.
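To confirm the ._pth edits actually took effect, a quick sanity check can be run with runtime\python.exe. This is only a sketch: the helper name missing_pth_entries is mine, not part of the project, and it simply looks for the two directories we added to the path file.

```python
# check_env.py - run with runtime\python.exe to verify the ._pth edits worked.
import sys


def missing_pth_entries(paths):
    """Return the expected ._pth directory suffixes absent from sys.path."""
    expected = ("Lib\\site-packages", "Scripts")
    missing = []
    for suffix in expected:
        # Accept either slash style, since ._pth entries use forward slashes.
        alt = suffix.replace("\\", "/")
        if not any(p.endswith(suffix) or p.endswith(alt) for p in paths):
            missing.append(suffix)
    return missing


if __name__ == "__main__":
    gaps = missing_pth_entries(sys.path)
    print("OK" if not gaps else f"Missing from sys.path: {gaps}")
```

If the script reports missing entries, re-check the contents of runtime\python310._pth against the listing above.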

The Core Challenge: Conquering the Installation of pynini and WeTextProcessing

While preparing to install the dependencies for Index-TTS, we encountered the biggest obstacle in the entire packaging process: the pynini and WeTextProcessing libraries.

pynini is a powerful tool for compiling, optimizing, and applying grammar rules, built on top of OpenFst. In speech and language processing, it's often used for low-level tasks like text normalization and grammar parsing. WeTextProcessing is a toolkit focused on Chinese text normalization and inverse normalization, and it also relies heavily on pynini.

The official pynini documentation clearly states that it is not designed for or tested on Windows. A direct attempt to pip install pynini typically triggers a lengthy compilation process that ultimately ends in failure.

The reason for failure is typical: these libraries contain a large amount of C++ source code that needs to be compiled locally into dynamic-link libraries (.pyd files) that Python can call. This process depends on a specific C++ compiler environment and a series of complex library files (like OpenFst). For an average user's computer, these prerequisites are usually not met, leading to compilation errors like the one below:

error: subprocess-exited-with-error

× Building wheel for pynini (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [104 lines of output]
    ...
    error: command 'C:\\Program Files\\Microsoft Visual Studio\\...\\cl.exe' failed with exit code 2
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pynini

This error can still occur even after installing Visual Studio and its build tools, including cl.exe.

The workaround is to pre-compile them on a system that already has the proper build environment, and then copy the resulting compiled files directly into our integrated package.

Miniconda became the perfect tool to achieve this goal. It can easily create isolated environments on Windows and install pynini through its powerful package manager, which downloads pre-compiled binaries from the conda-forge channel instead of compiling from source locally.

The specific steps are as follows:

  1. Create a Conda Environment: Create an independent conda environment, specifying Python version 3.10 to match our embeddable package.
    bash
    conda create -n index-tts python=3.10
    conda activate index-tts
  2. Install within Conda: Use the conda-forge channel to install pynini, and then use pip to install WeTextProcessing.
    bash
    conda install -c conda-forge pynini==2.1.6
    pip install WeTextProcessing --no-deps
  3. Transplant the Files: This is the core of the entire solution. We need to find all the relevant files for these two libraries in the conda environment and copy them to the corresponding location in our portable Python environment. Pay close attention to the paths to avoid confusion.
    • Library Files: Navigate to the envs\index-tts\Lib\site-packages directory in your conda environment. Copy the pynini, WeTextProcessing-1.0.4.1.dist-info, tn, and pywrapfst folders, as well as the crucial compiled artifacts _pynini.cp310-win_amd64.pyd and _pywrapfst.cp310-win_amd64.pyd, into our package's D:/index-tts/runtime/Lib/site-packages directory.
    • Dynamic-Link Libraries (DLLs): The underlying OpenFst libraries that pynini depends on exist as DLL files. These are located in the conda environment's \envs\index-tts\Library\bin directory. You need to copy all DLL files starting with fst to the root of our package's D:/index-tts/runtime directory, so the Python interpreter can find them at startup.
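The copy steps above can also be scripted. The sketch below assumes the layout already described (the conda env's Lib\site-packages and Library\bin as sources, the portable runtime as the destination, WeTextProcessing at version 1.0.4.1); the transplant() helper is my own invention, not part of either project.

```python
import shutil
from pathlib import Path

# Package folders and compiled extension modules to lift out of the
# conda environment's site-packages directory.
SITE_PACKAGES_ITEMS = [
    "pynini", "pywrapfst", "tn", "WeTextProcessing-1.0.4.1.dist-info",
    "_pynini.cp310-win_amd64.pyd", "_pywrapfst.cp310-win_amd64.pyd",
]


def transplant(conda_env: Path, runtime: Path) -> list[str]:
    """Copy pre-built pynini/WeTextProcessing artifacts from a conda env
    into the portable runtime. Returns the names of everything copied."""
    copied = []
    src_sp = conda_env / "Lib" / "site-packages"
    dst_sp = runtime / "Lib" / "site-packages"
    dst_sp.mkdir(parents=True, exist_ok=True)
    for name in SITE_PACKAGES_ITEMS:
        src = src_sp / name
        if src.is_dir():
            shutil.copytree(src, dst_sp / name, dirs_exist_ok=True)
            copied.append(name)
        elif src.is_file():
            shutil.copy2(src, dst_sp / name)
            copied.append(name)
    # The OpenFst DLLs go next to python.exe so they load at startup.
    for dll in sorted((conda_env / "Library" / "bin").glob("fst*.dll")):
        shutil.copy2(dll, runtime / dll.name)
        copied.append(dll.name)
    return copied
```

For example, transplant(Path(r"C:\miniconda3\envs\index-tts"), Path(r"D:\index-tts\runtime")) would perform the whole copy in one call (adjust both paths to your own installation).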

After completing this "transplanting" work, the compilation problem never reaches the end user's machine. At the same time, to prevent pip from trying to build these packages from source again when installing the remaining dependencies, it is essential to open the requirements.txt file located in D:/index-tts and delete the two lines for pynini and WeTextProcessing.
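To make that requirements.txt edit repeatable, the lines can be filtered programmatically. A minimal sketch, assuming the packages appear under their plain names with optional version pins; strip_transplanted is a name I made up for illustration.

```python
def strip_transplanted(lines, banned=("pynini", "WeTextProcessing")):
    """Drop requirements lines for packages that were copied in by hand,
    so pip never tries to compile them from source again."""
    banned_lower = {b.lower() for b in banned}
    kept = []
    for line in lines:
        # Take the package name before any comment or version specifier.
        name = line.split("#")[0].strip()
        for sep in ("==", ">=", "<=", "~=", ">", "<"):
            name = name.split(sep)[0]
        if name.strip().lower() not in banned_lower:
            kept.append(line)
    return kept
```

Reading requirements.txt, passing its lines through this filter, and writing the result back achieves the same effect as the manual deletion.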

Automation and User Experience: The Magic of Scripts

With the environment and core dependency issues resolved, the remaining task was to figure out how to let users start the program in the simplest way possible and how to handle the download of model files.

Model Download Script: downmodel.py

The model files for Index-TTS are quite large. Bundling all of them into the package would make the entire software huge and difficult to distribute. A better approach is to have the program automatically download the models on its first run. For this, I wrote the downmodel.py script, which offers several clear benefits:

  • Reduces Package Size: Users download a lightweight launcher, and models are downloaded on demand.
  • Solves Network Access Issues: In some regions, direct access to the Hugging Face Hub can be slow or unreliable. The script sets the environment variable HF_ENDPOINT=https://hf-mirror.com, pointing the download to a mirror site, which greatly improves download stability and speed.
  • Intelligent Checking: The script checks if model files already exist in the local checkpoints directory. If they do, it skips the download to avoid redundant operations. After downloading, it also verifies that the files exist and are not zero-sized to ensure model integrity.

A Hidden Pitfall: The Outdated config.yaml

During debugging, I discovered a subtle but critical issue. The checkpoints folder in the Index-TTS GitHub repository might contain a default config.yaml file. This configuration file could be incompatible with the latest model version (e.g., version 1.5). If this old file is kept, and the download script skips downloading the new config.yaml because it detects an existing file, the program will crash on startup due to a model layer dimension mismatch.

To circumvent this issue, the download logic in the downmodel.py script needed to be more refined. I added a check: even if config.yaml exists, if a core model file like bigvgan_discriminator.pth does not exist (implying a first-time download), config.yaml must still be forcibly re-downloaded and overwritten to ensure the configuration matches the model version.

Here is the complete code for downmodel.py, which implements the logic described above:

python
import os
import sys
import time
from pathlib import Path

# These must be set before huggingface_hub is imported: the endpoint is read
# into the library's constants at import time, so setting it later has no effect.
os.environ['HF_HUB_DISABLE_SYMLINKS_WARNING'] = 'true'
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from huggingface_hub import hf_hub_download

def download_model_files():
    """
    Downloads the necessary files for the IndexTTS-1.5 model from Hugging Face Hub.
    """
    repo_id = "IndexTeam/IndexTTS-1.5"
    local_dir = "checkpoints"
    
    # Ensure the local directory exists
    if not os.path.exists(local_dir):
        print(f"Creating directory: {local_dir}")
        os.makedirs(local_dir)

    # List of files to download
    files_to_download = [
        "config.yaml",
        "bigvgan_discriminator.pth",
        "bigvgan_generator.pth",
        "bpe.model",
        "dvae.pth",
        "gpt.pth",
        "unigram_12000.vocab"
    ]
    
    is_bigvgan_discriminator = Path(f'./{local_dir}/bigvgan_discriminator.pth').exists()
    for filename in files_to_download:
        # Check if the file already exists; if so, skip the download
        is_exists = Path(f'{local_dir}/{filename}').exists()
        if is_exists:
            # If config.yaml exists but bigvgan_discriminator.pth does not, config.yaml needs to be re-downloaded
            # Otherwise, skip
            if filename != 'config.yaml' or is_bigvgan_discriminator:
                print(f"File {filename} already exists, skipping download.")
                continue
            
        print(f"Downloading {filename} to {local_dir}...")
        try:
            # Use hf_hub_download to download the file
            hf_hub_download(
                repo_id=repo_id,
                filename=filename,
                local_dir=local_dir,
                # resume_download=True  # If needed, you can enable resumable downloads
            )
            print(f"Download of {filename} complete.")
        except Exception as e:
            print(f"Failed to download {filename}: {e}")
            # You can decide here whether to continue downloading other files or stop the program
            # return False # If you want to stop on download failure, you can uncomment this line
    
    for filename in files_to_download:
        # Check if the file exists and is not empty
        local_file_path = Path(f'./{local_dir}/{filename}')
        if not local_file_path.exists() or local_file_path.stat().st_size == 0:
            print(f"File {filename} does not exist or has a size of 0. Please ensure your network connection is stable, then delete the file and restart to download again.")
            return False
    
    print("All model file download checks complete!\n")
    return True



print("\n----Checking if IndexTTS-1.5 models are downloaded...")
download_success = download_model_files()

if not download_success:
    print("\n\n############Model file download failed. Please check your network connection or download manually. The program will exit shortly.\n")
    time.sleep(5)
    sys.exit() 

# After downloads are complete, proceed to start the WebUI
print("\nModel files are ready, starting WebUI...")
print("\n\n********Please wait for startup to complete. When you see \"Running on local URL: http://127.0.0.1:7860\", open this address in your browser.********\n\n")

One-Click Start Script: Double-Click to Start.bat

Finally, to achieve a true "double-click-to-run" experience, a simple batch script is essential. The content of 双击启动.bat (Double-Click to Start.bat) is brief but completes all the necessary preparations:

batch
@echo off
rem Set the current code page to UTF-8 to display non-ASCII characters correctly.
chcp 65001 > nul

TITLE Index-TTS Windows Package - Created by pvt9.com

set HF_HUB_DISABLE_SYMLINKS_WARNING=true
set HF_ENDPOINT=https://hf-mirror.com
set ROOT_DIR=%~dp0
set ROOT_DIR=%ROOT_DIR:~0,-1%
set PATH=%ROOT_DIR%;%ROOT_DIR%\ffmpeg;%PATH%

call "%ROOT_DIR%\runtime\python.exe" downmodel.py

call "%ROOT_DIR%\runtime\python.exe" webui.py

pause

It first switches the console to the UTF-8 code page and sets the window title so that non-ASCII characters display correctly. Then, it sets the necessary environment variables (like the mirror address) and temporarily prepends the ffmpeg tool's path to the system PATH. After that, it sequentially calls downmodel.py to check for and download models, and finally executes webui.py to launch the Gradio interface. When everything is ready, the user will see the familiar Running on local URL: http://127.0.0.1:7860 prompt in the command window. At that point, they can open the address in their browser to begin.

Through this series of operations, a Python project that originally required tedious configuration was ultimately packaged into an extremely user-friendly integrated package for the average Windows user.

Just a double-click is all it takes to enjoy the convenience brought by AI technology.