Skip to content

F5-TTS is an open-source voice cloning tool developed by Shanghai Jiao Tong University, delivering outstanding results. The initial version only supported Chinese and English cloning, but the latest v1 release has expanded support to include French, Italian, Hindi, Japanese, Russian, Spanish, Finnish, and many other languages.

This article mainly explains how to install and launch F5-TTS using the official source code and how to integrate it with the pyVideotrans project. Additionally, it covers how to enable calls within a local area network (LAN) by modifying the source code.

Due to limited time and resources, I will no longer maintain my previous personal integrated packages and API interfaces. Instead, I will uniformly use the official interface for integration with the pyVideotrans project. The limitation of the official interface is that it can only be called locally and cannot be accessed over a LAN. For the solution, please refer to the LAN usage section in this article.

Prerequisites

Your system must have Python 3.10 installed. While versions 3.11/3.12 might theoretically work, they have not been tested, so it is recommended to use version 3.10.

If Python is not yet installed:

Check if Python is installed:

  • Windows: Press Win+R, type cmd in the pop-up window, and press Enter. In the opened black window, type python --version. If it shows 3.10.xx, it is installed; if it prompts "python is not recognized as an internal or external command," it is either not installed or Python is not added to the Path environment variable, requiring reinstallation.
  • Mac: In the terminal, directly execute python3 --version. If it outputs 3.10.x, it is installed; otherwise, installation is needed.

Download F5-TTS Source Code

First, create an empty folder in a suitable location. It is recommended to choose a non-system drive or a location that does not require special permissions, such as the D: drive. Avoid placing it in directories like C:/Program Files (it is suggested that the location and all folder names consist of pure numbers or letters) to prevent potential issues. For example, D:/f5/v1 is a good location, while D:/Open Source f5/f5 v1 with spaces and Chinese characters is not recommended.

This article uses installing F5-TTS in the D:/python/f5ttsnew folder on a Windows 10 system as an example.

Open the URL: https://github.com/SWivid/F5-TTS

As shown in the figure below, click to download the source code:

Download source code zip package

After downloading, extract the zip file and copy all files from the F5-TTS-main folder into the D:/python/f5ttsnew folder, as shown below:

Inside the F5-TTS-main folder in the zip package

Copy to f5ttsnew

Create a Virtual Environment

It is strongly recommended to create a virtual environment unless your computer has no other Python or AI projects. A virtual environment can effectively avoid many potential errors.

In the address bar of the newly created folder D:/python/f5ttsnew, type cmd and press Enter (Mac users should use the terminal to navigate to this folder).

Execute the following command to create a virtual environment: python -m venv venv. After execution, a new folder named venv will appear inside the directory.

Next, activate the virtual environment (note the spaces and dot):

  • Windows: .\venv\scripts\activate
  • Mac: source ./venv/bin/activate

After the virtual environment is activated, (venv) will appear before the command prompt. Ensure all subsequent operations are performed within this virtual environment, and always check that (venv) is present before the command prompt each time.

Command prompt with (venv) indicates activation

Install Dependencies

In the terminal with the virtual environment activated, continue by entering the following command (note the spaces and dot):

pip install -e .

Wait for the installation to complete. If CUDA acceleration is needed, continue with the following command (this is a single command, do not break the line):

# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124

Configure Proxy for Internet Access

Important Note: F5-TTS needs to download models online from the huggingface.co website. Since this website is blocked in some regions and cannot be accessed directly, you must configure a proxy and enable global or system proxy before launching.

If your VPN tool provides an HTTP port (as shown below):

Check if the proxy software provides a port

Set the proxy in the terminal by entering the following command:

  • Windows: set https_proxy=http://127.0.0.1:10808 (Replace the port number with the one you actually use)
  • Mac: export https_proxy=http://127.0.0.1:10808 (Replace the port number with the one you actually use)

You can also directly modify the code to set the proxy, avoiding the need to manually enter it in the terminal each time. Open the file F5-TTS root directory/src/f5_tts/infer/infer_gradio.py and add the following code at the top of the file:

python
import os
os.environ['https_proxy']='http://127.0.0.1:10808' # Fill in according to your actual proxy address

Launch the WebUI Interface

After configuring the proxy, enter the following command in the terminal to launch the WebUI:

f5-tts_infer-gradio

On the first launch, the program will automatically download the model, which may be slow; please wait patiently. On subsequent launches, the program may still connect to huggingface.co for checks; it is recommended to keep the proxy enabled to avoid errors.

After a successful launch, the terminal will display the IP address and port number, as shown below:

IP and port displayed indicates successful launch; first time is very slow

Open the displayed address in your browser, default is http://127.0.0.1:7860.

WebUI interface

Integrate with pyVideoTrans API

To use F5-TTS in the video translation software, you first need to start F5-TTS and keep the terminal window open.

Then, open the video translation software, navigate to the menu: "TTS Settings" -> "F5-TTS API", and fill in the F5-TTS launch address, default is http://127.0.0.1:7860. If your launch address is not the default, fill in the actual address.

In the "Reference Audio" field, enter the following:

Name of the audio file you want to use#The corresponding text in that audio file

Note: Place the reference audio file inside the f5-tts folder in the root directory of the pyVideotrans project. If the folder does not exist, create it manually. For example, you can name the reference audio file nverguo.wav.

Place reference audio in the f5-tts folder inside pyVideotrans software; do not mistake the location

Example of how to fill it in:

Reference audio and text inside reference audio

Re-recognize?: By default, the reference audio (the subtitle recognized during cloning) will be sent to F5-TTS together to avoid F5-TTS starting whisper for speech recognition, saving time and improving efficiency. However, sometimes you might want F5-TTS to re-recognize, which can improve cloning quality to some extent. In this case, check this checkbox. But note, if this is the first time doing so, F5-TTS will download the openai-whisper-v3 model online from huggingface.co; please ensure you have proxy configured.

Solve LAN Access Issue

If your F5-TTS is deployed on another computer within the local area network, you need to modify the F5-TTS code to support LAN access.

Open the file F5-TTS project directory/src/f5_tts/infer/infer_gradio.py, and add the following code below line 16:

python
# Added for LAN start
import os
from pathlib import Path

ROOT=Path(os.getcwd()).as_posix()
TMP=f'{ROOT}/tmp'
Path(TMP).mkdir(exist_ok=True)
os.environ['GRADIO_TEMP_DIR']=TMP
gr.set_static_paths(paths=[TMP,tempfile.gettempdir()])
print(TMP)

## Added for LAN end

Diagram showing where to add the code:Note the location to add the code

After saving the changes, restart F5-TTS. Then, in pyVideotrans, fill in the IP address and port number where F5-TTS is running, for example http://192.168.0.12:7860.

Add Other Languages

If you need to use models for other languages, you also need to modify the file F5-TTS project directory/src/f5_tts/infer/infer_gradio.py.

Find the code around line 59:

python
DEFAULT_TTS_MODEL_CFG = [
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]

Diagram showing the code location:

By default, this configures the official Chinese and English model. If you need to use models for other languages, modify it according to the instructions below. After modification, you need to restart F5-TTS and ensure the proxy is configured so the program can download the new language model online. After successful download, first test cloning a voice through the WebUI, then use it via pyVideoTrans.

Important: Before use, ensure that the dubbing text language in pyVideoTrans matches the model language selected in F5-TTS.

Here are the configuration details for each language model:

  1. French:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
    ]
  2. Hindi:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
        "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
        json.dumps({"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  3. Italian:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://alien79/F5-TTS-italian/model_159600.safetensors",
        "hf://alien79/F5-TTS-italian/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  4. Japanese:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://Jmica/F5TTS/JA_25498980/model_25498980.pt",
        "hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  5. Russian:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://hotstone228/F5-TTS-Russian/model_last.safetensors",
        "hf://hotstone228/F5-TTS-Russian/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  6. Spanish:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
        "hf://jpgallegoar/F5-Spanish/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4})
    ]
  7. Finnish:

    python
       DEFAULT_TTS_MODEL_CFG = [
        "hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors",
        "hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})]

You can follow official updates; other languages can be added similarly. Address: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md

Common Errors and Notes

  1. During API usage, you can close the WebUI interface in the browser, but do not close the terminal window where F5-TTS was started.

    This interface must not be closed, otherwise the API cannot be called

  2. Can I dynamically switch models in F5-TTS? No. You need to manually modify the code as described above and then restart the WebUI.

  3. Frequent occurrence of errors like this:

    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/vocab.txt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002174796DF60>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0458b571-90ab-4edd-ae59-b93bd603cdd0)')

Proxy issue. Please use a proxy and ensure it is stable. Refer to the section above on configuring proxy for internet access.

  1. How to disable connecting to huggingface.co every time?

Ensure you have successfully cloned at least once and the model is downloaded. Open F5-TTS root directory/src/f5_tts/infer/utils_infer.py

Search for snapshot_download, find the line of code as shown in the image.

Modify it to:

local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir,local_files_only=True)

Then search for hf_hub_download, find the 2 lines of code as shown in the image.

Modify them to:

config_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="config.yaml",local_files_only=True)
            model_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="pytorch_model.bin",local_files_only=True)

Essentially, a new parameter ,local_files_only=True is added to these 3 function calls. Ensure the model is already downloaded locally, otherwise, a model not found error will occur.

  1. F5-TTS is deployed normally, but testing in pyVideotrans returns {detail:"Not found"}
  • Check if other AI projects are occupying the port. Generally, AI projects with interfaces often use gradio and default to port 7860. Close others and restart F5-TTS.
  • If pyVideotrans is deployed from source, execute pip install --upgrade gradio_client and try again.
  • Restart F5-TTS using the command f5-tts_infer-gradio --api to launch.