
F5-TTS is an open-source voice cloning tool from Shanghai Jiao Tong University that produces high-quality results. The initial version supported cloning only in Chinese and English. With the latest v1 release, it expands to French, Italian, Hindi, Japanese, Russian, Spanish, Finnish, and more.

This article explains how to install and launch F5-TTS from the official source code and how to use it with the pyVideoTrans project. It also covers how to modify the source code to enable local network (LAN) access.

Due to limited resources, I no longer maintain my earlier integration package and API interface. Instead, all integration with pyVideoTrans now goes through the official interface. Its limitation is that it can only be called from the same machine, not from other computers on the LAN. See the LAN section below for a workaround.

Prerequisites

Python 3.10 is required. Versions 3.11 and 3.12 may work in theory, but they have not been tested, so 3.10 is recommended.

To check whether Python is installed:

  • Windows: Press Win+R, type cmd, and press Enter. In the Command Prompt window that opens, type python --version. If it shows 3.10.xx, Python is installed. If it reports "'python' is not recognized as an internal or external command", Python is either not installed or not on the PATH environment variable, and you need to (re)install it.
  • Mac: Run python3 --version in Terminal. If it prints 3.10.x, Python is installed; otherwise, you need to install it.
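You can also run the same check from within Python, for example to confirm which interpreter a project will pick up. The script below is a minimal sketch of that check. python_version_ok is a name chosen for this example and is not part of F5-TTS. Save it as any .py file and run it with the interpreter you plan to use.

```python
import sys


def python_version_ok(version_info=sys.version_info) -> bool:
    """True if the major.minor version is 3.10, the version this guide recommends."""
    return tuple(version_info[:2]) == (3, 10)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_version_ok():
        print(f"OK: Python {found} at {sys.executable}")
    else:
        print(f"Python {found} found at {sys.executable}; this guide recommends 3.10")
```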

Download F5-TTS Source Code

First, create an empty folder in a suitable location. Choose a non-system drive and a location that does not require special permissions, such as drive D. Avoid directories like C:/Program Files. To prevent problems, name every folder in the path using only letters and numbers. For example, D:/f5/v1 is a good location. D:/开源 f5/f5 v1 is not recommended, because it contains spaces and Chinese characters.
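The folder-naming advice above can be expressed as a small check. This is a hypothetical helper; is_safe_install_path is not part of F5-TTS. It assumes that only ASCII letters and digits are safe, which is stricter than strictly necessary but matches the recommendation. It uses PureWindowsPath, so it runs on any operating system.

```python
import re
from pathlib import PureWindowsPath

# Assumption: only ASCII letters and digits count as "safe" folder names.
_SAFE_PART = re.compile(r"[A-Za-z0-9]+")


def is_safe_install_path(path: str) -> bool:
    """True if every folder below the drive letter uses only ASCII letters and digits."""
    p = PureWindowsPath(path)
    # Skip the drive anchor (e.g. "D:\\") and check each folder name.
    parts = p.parts[1:] if p.anchor else p.parts
    return bool(parts) and all(_SAFE_PART.fullmatch(part) for part in parts)
```

For example, is_safe_install_path("D:/python/f5ttsnew") returns True, while is_safe_install_path("D:/开源 f5/f5 v1") returns False.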

This article uses D:/python/f5ttsnew on Windows 10 as the example installation folder.

Open the website: https://github.com/SWivid/F5-TTS

On the repository page, click the green Code button and choose Download ZIP to download the source code.

[Figure: Download source code zip package]

After downloading, extract the archive and copy everything inside its F5-TTS-main folder into D:/python/f5ttsnew.

[Figure: Contents of the F5-TTS-main folder in the archive]

[Figure: Files copied into f5ttsnew]
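If you would rather script the download and copy steps, the sketch below does both with the Python standard library. It assumes GitHub's usual archive URL for the main branch (https://github.com/SWivid/F5-TTS/archive/refs/heads/main.zip). It also assumes the archive's top-level folder is F5-TTS-main, as described above. The function names are illustrative.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Assumption: GitHub's standard archive URL for the repository's main branch.
ARCHIVE_URL = "https://github.com/SWivid/F5-TTS/archive/refs/heads/main.zip"


def download_archive(url: str, zip_path: Path) -> Path:
    """Download the source archive to zip_path (requires network access)."""
    with urllib.request.urlopen(url) as resp, open(zip_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return zip_path


def extract_source(zip_path: Path, dest: Path, top_folder: str = "F5-TTS-main") -> None:
    """Extract the archive and copy everything inside top_folder into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        # Copy the contents of F5-TTS-main, not the folder itself.
        shutil.copytree(Path(tmp) / top_folder, dest, dirs_exist_ok=True)
```

For example: extract_source(download_archive(ARCHIVE_URL, Path("F5-TTS-main.zip")), Path("D:/python/f5ttsnew")).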

Create a Virtual Environment

A virtual environment is highly recommended unless there are no other Python or AI projects on your computer. It keeps this project's dependencies separate from those of other projects, which avoids many version-conflict errors.

Open D:/python/f5ttsnew in File Explorer, type cmd in the address bar, and press Enter to open a command prompt in that folder. On Mac, open Terminal and cd into the folder.

Run the following command to create a virtual environment: python -m venv venv (on Mac, use python3 -m venv venv). This creates a folder named venv inside the project folder.
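The python -m venv venv command uses Python's built-in venv module, which can also be called from code. The sketch below does the same thing, assuming you want the environment in a folder named venv inside the project folder. create_venv is a hypothetical helper name.

```python
import venv
from pathlib import Path


def create_venv(project_dir: str, with_pip: bool = True) -> Path:
    """Create <project_dir>/venv, like running `python -m venv venv` inside project_dir."""
    env_dir = Path(project_dir) / "venv"
    venv.create(env_dir, with_pip=with_pip)  # with_pip=True also installs pip
    return env_dir
```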

Next, activate the virtual environment (note the spaces and dot symbols):

  • Windows: .\venv\scripts\activate
  • Mac: . ./venv/bin/activate

Once the virtual environment is activated, (venv) appears at the start of the command prompt. Run all subsequent commands in this virtual environment, and check that the prompt shows (venv) before each step.

[Figure: (venv) in the command line indicates the environment is activated]
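Besides looking for (venv) in the prompt, you can ask Python itself. Inside a virtual environment, sys.prefix points at the environment folder, while sys.base_prefix still points at the base Python installation, so the two differ. A minimal sketch (in_virtualenv is a hypothetical helper name):

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix is the environment folder; sys.base_prefix is the base install.
    return sys.prefix != sys.base_prefix
```

From the command line, python -c "import sys; print(sys.prefix != sys.base_prefix)" prints True when the environment is active.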

Install Dependencies

In the terminal with the virtual environment activated, run the following command to install F5-TTS and its dependencies from the current folder in editable mode (note the space and the final dot):

pip install -e .

Wait for the installation to finish. If you want CUDA (NVIDIA GPU) acceleration, also run the following command. Enter it on a single line without line breaks, and adjust the CUDA version to match your system:

# Install pytorch with your CUDA version, e.g.
pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
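After installing the CUDA build, you can confirm that PyTorch actually sees the GPU. This sketch uses standard PyTorch attributes: torch.__version__, torch.version.cuda, and torch.cuda.is_available(). The helper name and the layout of the returned dictionary are my own.

```python
def torch_cuda_report() -> dict:
    """Summarize the installed PyTorch build; {'installed': False} if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,          # e.g. "2.4.0+cu124" for the CUDA build
        "cuda_build": torch.version.cuda,             # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),  # False if no usable NVIDIA GPU/driver
    }
```

Run print(torch_cuda_report()) inside the activated virtual environment. If cuda_build is None or cuda_available is False, either the CPU-only build is installed or the GPU driver is not usable.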

Configure a Proxy

Important: F5-TTS downloads its models from huggingface.co. This site is blocked in some regions and cannot be reached directly. You must therefore configure a proxy (VPN) and enable global or system proxy mode before starting.

If your VPN tool provides an HTTP proxy port:

[Figure: Check whether the VPN software provides a port]

Please enter the following command in the terminal to set the proxy:

  • Windows: set https_proxy=http://127.0.0.1:10808 (replace the port number with the one your proxy actually uses)
  • Mac: export https_proxy=http://127.0.0.1:10808 (replace the port number with the one your proxy actually uses)

These settings apply only to the current terminal window. Run them in the same window you use to start F5-TTS.

You can also set the proxy in the code, so you don't have to enter it in the terminal each time. Open F5-TTS root directory/src/f5_tts/infer/infer_gradio.py and add the following code at the top of the file:

python
import os
os.environ['https_proxy']='http://127.0.0.1:10808' # Fill in according to your actual proxy address
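To confirm that huggingface.co is reachable through your proxy before starting F5-TTS, you can send a single request through it. This is a minimal standard-library sketch. The proxy address http://127.0.0.1:10808 is the example port from above and must be replaced with your own. Any HTTP response, even an error status, is treated as reachable.

```python
import urllib.error
import urllib.request


def reachable_via_proxy(url: str = "https://huggingface.co",
                        proxy: str = "http://127.0.0.1:10808",
                        timeout: float = 10.0) -> bool:
    """Try one request to url through the given HTTP proxy."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the site answered, just with an error status
    except (urllib.error.URLError, OSError):
        return False  # proxy down, wrong port, or site unreachable
```

If reachable_via_proxy() returns False, the proxy is usually not running, the port is wrong, or the proxy itself cannot reach huggingface.co.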

Start the WebUI

With the proxy configured, run the following command in the terminal to start the WebUI:

f5-tts_infer-gradio

On first launch, the program automatically downloads the model, which can be slow, so please be patient. Later launches may still contact huggingface.co to check the model files, so keep the proxy enabled to avoid errors.

Startup has succeeded once the terminal displays the IP address and port number.

[Figure: Terminal output after a successful startup showing the IP and port; the first launch is very slow]

Open the displayed address in your browser, which defaults to http://127.0.0.1:7860.

[Figure: The F5-TTS WebUI]

Use F5-TTS in pyVideoTrans

To use F5-TTS in pyVideoTrans, first start F5-TTS and keep its terminal window open.

Then open pyVideoTrans, choose "TTS Settings" -> "F5-TTS API" from the menu, and enter the address F5-TTS started on. The default is http://127.0.0.1:7860; if F5-TTS reported a different address, enter that one instead.
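pyVideoTrans talks to F5-TTS through Gradio's API. To check from Python that the address you entered answers as a Gradio app, you can use the gradio_client package (pip install gradio_client). This is a sketch under that assumption. f5_api_reachable is a hypothetical helper, and it does not call any specific F5-TTS endpoint.

```python
def f5_api_reachable(url: str = "http://127.0.0.1:7860") -> bool:
    """Return True if a Gradio app answers at url; needs `pip install gradio_client`."""
    try:
        from gradio_client import Client
    except ImportError:
        return False  # gradio_client is not installed
    try:
        client = Client(url, verbose=False)
    except Exception:
        return False  # nothing listening, wrong address, or not a Gradio app
    client.view_api()  # prints the named endpoints the app exposes
    return True
```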

In the "Reference Audio" column, fill in the following:

Audio filename to be used#Corresponding text in the audio file

Note: Put the reference audio file in the f5-tts folder under the pyVideoTrans root directory. If the folder does not exist, create it manually. For example, the reference audio file could be named nverguo.wav.

[Figure: The reference audio goes in the f5-tts folder of pyVideoTrans; make sure it is the right folder]

[Figure: Example of a reference audio filename and the text spoken in it]
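The Reference Audio value is simply the filename and the transcript joined by #. The helpers below are a hypothetical sketch for building that string and for locating the file in the f5-tts folder of the pyVideoTrans root. They split at the first #. How pyVideoTrans itself treats additional # characters is not covered here.

```python
from pathlib import Path
from typing import Tuple


def build_reference(audio_name: str, transcript: str) -> str:
    """Join the audio filename and its transcript as '<filename>#<text>'."""
    if "#" in audio_name:
        raise ValueError("the audio filename must not contain '#'")
    return f"{audio_name}#{transcript.strip()}"


def parse_reference(field: str) -> Tuple[str, str]:
    """Split a Reference Audio value at the first '#' into (filename, transcript)."""
    audio_name, _, transcript = field.partition("#")
    return audio_name.strip(), transcript.strip()


def reference_audio_path(pyvideotrans_root: str, field: str) -> Path:
    """Where the reference audio is expected: <pyVideoTrans root>/f5-tts/<filename>."""
    audio_name, _ = parse_reference(field)
    return Path(pyvideotrans_root) / "f5-tts" / audio_name
```

For example, build_reference("nverguo.wav", "text spoken in the clip") gives "nverguo.wav#text spoken in the clip". reference_audio_path(root, field).is_file() tells you whether the file is in place.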

Re-recognize?: By default, pyVideoTrans sends the text of the reference audio (the subtitles recognized during cloning) to F5-TTS. F5-TTS then does not need to start Whisper for speech recognition, which saves time. Letting F5-TTS re-recognize the audio can, however, improve cloning quality somewhat; to do that, select this check box. The first time you do so, F5-TTS will download the openai-whisper-v3 model from huggingface.co, so make sure your proxy is working.

Enable LAN Access

If your F5-TTS is deployed on another computer on the LAN, you need to modify the F5-TTS code to support LAN access.

Open F5-TTS project directory/src/f5_tts/infer/infer_gradio.py and add the following code below line 16 (the line number may differ in newer versions):

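python
# Add LAN start
import os
import tempfile
from pathlib import Path

import gradio as gr

# Store Gradio's temporary files in a "tmp" folder under the directory F5-TTS is started from
ROOT = Path(os.getcwd()).as_posix()
TMP = f'{ROOT}/tmp'
Path(TMP).mkdir(parents=True, exist_ok=True)
os.environ['GRADIO_TEMP_DIR'] = TMP
# Let Gradio serve files from these folders directly
gr.set_static_paths(paths=[TMP, tempfile.gettempdir()])
print(TMP)  # show where temporary files will be written

## Add LAN end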

[Figure: Where to add the LAN code in infer_gradio.py]

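After saving the changes, restart F5-TTS. If it should be reachable from other computers, start it with f5-tts_infer-gradio --host 0.0.0.0 so that it listens on all network interfaces. Then, in pyVideoTrans, enter the IP address and port of the computer running F5-TTS, for example http://192.168.0.12:7860.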
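To fill in pyVideoTrans, you need the LAN IP of the computer running F5-TTS; ipconfig on Windows or ifconfig on Mac also shows it. The sketch below is a common standard-library technique, not part of F5-TTS. Connecting a UDP socket sends no packets, but it makes the operating system choose the outgoing interface, whose address is then read back. If there is no route, it falls back to 127.0.0.1.

```python
import socket


def guess_lan_ip() -> str:
    """Best-effort guess of this computer's LAN IPv4 address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            # connect() on a UDP socket sends nothing; it only makes the OS pick
            # the outgoing interface. 192.0.2.1 is a reserved documentation address.
            s.connect(("192.0.2.1", 80))
            return s.getsockname()[0]
        except OSError:
            return "127.0.0.1"  # no network route available
```

For example, f"http://{guess_lan_ip()}:7860" gives the address to enter, assuming the default port. On machines with several network adapters, the guess may pick the wrong one.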

Add Other Languages

If you need to use models in other languages, you also need to modify the F5-TTS project directory/src/f5_tts/infer/infer_gradio.py file.

Find the following code, around line 59 (the exact line may differ between versions):

python
DEFAULT_TTS_MODEL_CFG = [
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]

[Figure: Location of DEFAULT_TTS_MODEL_CFG in infer_gradio.py]

By default, this is set to the official Chinese and English model. To use a model for another language, replace it with one of the configurations below. After the change, restart F5-TTS with the proxy enabled so that the program can download the new model. Once the download succeeds, first test by cloning a voice in the WebUI, and only then use it from pyVideoTrans.

Important: Before use, make sure the language of the dubbing text in pyVideoTrans matches the language of the model configured in F5-TTS.

The following is the configuration information for each language model:

  1. French:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
        "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
    ]
  2. Hindi:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
        "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
        json.dumps({"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  3. Italian:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://alien79/F5-TTS-italian/model_159600.safetensors",
        "hf://alien79/F5-TTS-italian/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  4. Japanese:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://Jmica/F5TTS/JA_25498980/model_25498980.pt",
        "hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  5. Russian:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://hotstone228/F5-TTS-Russian/model_last.safetensors",
        "hf://hotstone228/F5-TTS-Russian/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
  6. Spanish:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
        "hf://jpgallegoar/F5-Spanish/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4})
    ]
  7. Finnish:

    python
    DEFAULT_TTS_MODEL_CFG = [
        "hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors",
        "hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
    ]
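If you switch languages often, one option is to keep several configurations in infer_gradio.py and choose one at startup. The sketch below is hypothetical. The F5_TTS_LANG environment variable and select_model_cfg are invented for this example. Only two entries are filled in, both copied from the configurations above, and a restart is still required after changing the variable.

```python
import json
import os
from typing import List, Optional

# Hypothetical: several configurations kept side by side, keyed by language code.
MODEL_CFGS = {
    "en": [  # official Chinese/English model (the default shown above)
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
        json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
    ],
    "es": [  # Spanish, copied from the list above
        "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
        "hf://jpgallegoar/F5-Spanish/vocab.txt",
        json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4}),
    ],
    # Add the other languages from the list above in the same way.
}


def select_model_cfg(lang: Optional[str] = None) -> List[str]:
    """Pick a configuration by language code (F5_TTS_LANG is an invented variable name)."""
    code = (lang or os.environ.get("F5_TTS_LANG", "en")).lower()
    if code not in MODEL_CFGS:
        raise ValueError(f"no model configured for {code!r}; available: {sorted(MODEL_CFGS)}")
    return MODEL_CFGS[code]


# In infer_gradio.py, this line would replace the hard-coded assignment:
DEFAULT_TTS_MODEL_CFG = select_model_cfg()
```

On Windows, for example, you would run set F5_TTS_LANG=es before f5-tts_infer-gradio.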

Models for other languages can be added in the same way. Watch the official list of shared models for updates: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md

Common Errors and Precautions

  1. While the API is in use, you can close the WebUI page in your browser, but do not close the terminal window in which F5-TTS is running.

    [Figure: This terminal window must stay open, otherwise the API cannot be called]

  2. Can I dynamically switch models in F5-TTS? No. You need to manually modify the code as described above and then restart the WebUI.

  3. The following error occurs frequently:

    raise ConnectTimeout(e, request=request)
    requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/vocab.txt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002174796DF60>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0458b571-90ab-4edd-ae59-b93bd603cdd0)')

This is a proxy problem: huggingface.co cannot be reached. Make sure your proxy (VPN) is enabled and the connection is stable; see the proxy configuration section above.

  4. How can I stop F5-TTS from connecting to huggingface.co on every launch?

First, make sure you have cloned successfully at least once, so that the models have been downloaded. Then open F5-TTS root directory/src/f5_tts/infer/utils_infer.py.

Search for snapshot_download and find this line:

local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir)

Change it to:

local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir, local_files_only=True)

Then search for hf_hub_download and find these 2 lines:

config_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="config.yaml")
model_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="pytorch_model.bin")

Change them to:

config_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="config.yaml", local_files_only=True)
model_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="pytorch_model.bin", local_files_only=True)

In other words, the only change is adding the parameter local_files_only=True to these 3 calls. The models must already be downloaded locally; otherwise, an error saying the model cannot be found will be raised.
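Before adding local_files_only=True, you can check that a model is already in the Hugging Face cache. This sketch assumes huggingface_hub's default cache layout, <cache>/models--<org>--<name>/snapshots/..., and locates the cache via HF_HUB_CACHE, HF_HOME, or the default ~/.cache/huggingface/hub. It only applies if F5-TTS was not given a custom hf_cache_dir. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def hf_hub_cache_dir() -> Path:
    """Default hub cache: HF_HUB_CACHE, else HF_HOME/hub, else <XDG cache or ~/.cache>/huggingface/hub."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    xdg_cache = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg_cache) / "huggingface" / "hub"


def is_repo_cached(repo_id: str, cache_dir: Optional[str] = None) -> bool:
    """True if at least one snapshot of the model repo exists in the cache."""
    root = Path(cache_dir) if cache_dir else hf_hub_cache_dir()
    snapshots = root / f"models--{repo_id.replace('/', '--')}" / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

For example, is_repo_cached("nvidia/bigvgan_v2_24khz_100band_256x") should return True before you switch that call to local_files_only=True.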

  5. F5-TTS runs normally, but the pyVideoTrans test returns {detail:"Not found"}
    • Check whether another AI project is using the same port. Most AI projects with a web interface use Gradio, which also defaults to port 7860. Close the other projects and restart F5-TTS.
    • If you run pyVideoTrans from source, execute pip install --upgrade gradio_client and try again.
    • Restart F5-TTS with the command f5-tts_infer-gradio --api
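To check the first point, whether something is already listening on port 7860, you can try a TCP connection to it. A minimal standard-library sketch, where port_in_use is a hypothetical helper: run it while F5-TTS is stopped, and if it still returns True, another program holds the port.

```python
import socket


def port_in_use(port: int = 7860, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0
```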