F5-TTS-api
Project source code: https://github.com/jianchang512/f5-tts-api
This project provides an API and a WebUI for F5-TTS.
F5-TTS is an advanced text-to-speech system that uses deep learning technology to generate realistic, high-quality human voices. With just a 10-second audio sample, it can clone your voice. F5-TTS can accurately reproduce speech and give it rich emotional color.
Demo: original audio ("Queen of the Daughter Kingdom") vs. the cloned audio.
Windows Integrated Package (Includes F5-TTS model and runtime environment)
123 Cloud Drive Download https://www.123684.com/s/03Sxjv-okTJ3
Hugging Face Download Address: https://huggingface.co/spaces/mortimerme/s4/resolve/main/f5-tts-api-v0.3.7z?download=true
Applicable Systems: Windows 10/11 (Download and extract to use)
How to use:
Start the API service: double-click run-api.bat. The API address is http://127.0.0.1:5010/api.
The API service must be running before the translation software can use it.
The integrated package defaults to using CUDA 11.8. If you have an NVIDIA graphics card and have configured the CUDA/cuDNN environment, the system will automatically use GPU acceleration. If you want to use a higher version of CUDA, such as 12.4, please do the following:
Navigate to the folder containing api.py, type cmd in the folder's address bar and press Enter, then run the following commands in the terminal that opens:
.\runtime\python -m pip uninstall -y torch torchaudio
.\runtime\python -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124
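After swapping the torch build, it is worth confirming which CUDA version is actually in use. The snippet below is a sketch (not part of the project); save it and run it with the bundled interpreter, e.g. .\runtime\python check_cuda.py:

```python
# Quick sanity check after swapping the torch build: report which CUDA
# version torch was built against and whether a GPU is actually visible.
def cuda_summary() -> str:
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    return (f"torch {torch.__version__}, "
            f"CUDA build {torch.version.cuda}, "
            f"GPU available: {torch.cuda.is_available()}")

print(cuda_summary())
```

If the summary reports GPU available: False despite an NVIDIA card, the driver or CUDA/cuDNN setup is usually the culprit.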
The advantage of F5-TTS lies in its efficiency and high-quality voice output. Whereas similar systems require longer audio samples, F5-TTS can generate high-fidelity speech from only a short clip, and it conveys emotion well, improving the listening experience in a way many existing systems cannot match.
Currently, F5-TTS supports English and Chinese.
Usage Tips: Proxy/VPN
The models are downloaded from huggingface.co. Since this site is not accessible from mainland China, set up a system or global proxy in advance, otherwise the model download will fail.
The integrated package bundles most of the required models, but it may still check for updates or download small dependent models, so if the terminal shows an HTTPSConnect error, you still need to set up a system proxy.
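If configuring a system-wide proxy is inconvenient, the standard proxy environment variables can be set before api.py starts, since the download libraries (requests and huggingface_hub) honor them. This is a sketch; the address 127.0.0.1:7890 is a placeholder for your proxy's real host and port:

```python
import os

# Placeholder proxy address -- replace with your proxy's actual host:port.
PROXY = "http://127.0.0.1:7890"

# requests and huggingface_hub both honor these standard variables,
# so model downloads from huggingface.co are routed through the proxy.
os.environ["HTTP_PROXY"] = PROXY
os.environ["HTTPS_PROXY"] = PROXY
```

These lines must run before the first download attempt (e.g. at the top of api.py, or exported in the shell that launches it).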
Using in Video Translation Software
Start the API service; it must be running before the translation software can use it.
Open the video translation software, find the TTS settings, select F5-TTS, and enter the API address (default http://127.0.0.1:5010).
Enter the reference audio and its transcript.
Selecting the f5-tts model is recommended for better generation quality.
Using api.py in a third-party integrated package
- Copy api.py and the configs folder to the root directory of the third-party integrated package.
- Find the path of the python.exe bundled with the third-party package. For example, if it is in the py311 folder, type cmd in the address bar of the package's root directory and press Enter, then run .\py311\python api.py. If it reports module flask not found, first run .\py311\python -m pip install waitress flask.
Using api.py after deploying the official F5-TTS project from source
- Copy api.py and the configs folder to the project folder.
- Install the required modules: pip install flask waitress
- Run: python api.py
API Usage Example
import requests

res = requests.post('http://127.0.0.1:5010/api', data={
    "ref_text": 'Fill in the text content corresponding to 1.wav here',
    "gen_text": '''Fill in the text to be generated here.''',
    "model": 'f5-tts'
}, files={"audio": open('./1.wav', 'rb')})

if res.status_code != 200:
    print(res.text)
    exit()

with open("ceshi.wav", 'wb') as f:
    f.write(res.content)
Compatible with OpenAI TTS interface
The voice parameter must contain the reference audio and its transcript, separated by three # signs, for example:
1.wav###You say that the four elements are empty, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty.
This means the reference audio is 1.wav, located in the same directory as api.py, and its transcript is "You say that the four elements are empty, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty."
The returned data is always wav audio data.
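On the client side, the voice convention can be built and checked with a small helper. This is an illustrative sketch (parse_voice is not part of the API); it splits on the first occurrence of the three-# separator:

```python
def parse_voice(voice: str) -> tuple[str, str]:
    """Split an OpenAI-compatible `voice` value into
    (reference audio path, reference transcript)."""
    ref_audio, sep, ref_text = voice.partition("###")
    if not sep or not ref_audio or not ref_text:
        raise ValueError("voice must look like '1.wav###reference transcript'")
    return ref_audio, ref_text

# Example: yields ('1.wav', 'Hello there')
audio_path, transcript = parse_voice("1.wav###Hello there")
```

Validating the string before sending the request gives a clearer error than a failed synthesis call.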
from openai import OpenAI

client = OpenAI(api_key='12314', base_url='http://127.0.0.1:5010/v1')

with client.audio.speech.with_streaming_response.create(
    model='f5-tts',
    voice="1.wav###You say that the four elements are empty, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty.",
    input='Hello, dear friends',
    speed=1.0
) as response:
    with open('./test.wav', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)