F5-TTS-api

The source code for this project is available at https://github.com/jianchang512/f5-tts-api

This is the API and web UI for the F5-TTS project.

F5-TTS is an advanced text-to-speech system that uses deep learning technology to generate realistic, high-quality human voices. It can clone your voice with just a 10-second audio sample. F5-TTS accurately reproduces speech and imbues it with rich emotional coloring.

Original voice - Queen of the Daughter Kingdom

Cloned audio

Windows Integrated Package (includes F5-TTS model and runtime environment)

123 Pan Download: https://www.123684.com/s/03Sxjv-okTJ3

Hugging Face Download: https://huggingface.co/spaces/mortimerme/s4/resolve/main/f5-tts-api-v0.3.7z?download=true

Supported Systems: Windows 10/11 (extract after downloading to use)

Usage:

Start the API service: double-click the run-api.bat file. The API address is http://127.0.0.1:5010/api

You must start the API service to use it in the translation software.

The integrated package uses CUDA 11.8 by default. If you have an NVIDIA graphics card and have set up the CUDA/cuDNN environment, GPU acceleration is used automatically. To use a newer CUDA version, such as 12.4, do the following:

Navigate to the folder containing api.py, type cmd in the folder's address bar, and press Enter. Then run the following commands in the terminal that opens:

.\runtime\python -m pip uninstall -y torch torchaudio

.\runtime\python -m pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124
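After reinstalling, you can confirm that PyTorch actually sees the GPU with a short check. This is a minimal sketch, not part of the package; run it with .\runtime\python:

```python
# Report whether the bundled PyTorch can use the GPU.
# A minimal check; run with .\runtime\python after the reinstall above.
def cuda_status() -> str:
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return f"CUDA {torch.version.cuda} on {torch.cuda.get_device_name(0)}"
    return "CPU only (CUDA not available)"

print(cuda_status())
```

If it reports "CPU only", the CUDA/cuDNN environment is not visible to the bundled Python and generation will fall back to the CPU.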

The advantage of F5-TTS lies in its efficiency and voice quality. Where comparable systems need longer reference samples, F5-TTS generates high-fidelity speech from a very short clip and expresses emotion well, which remains difficult for many existing systems.

Currently, F5-TTS supports both English and Chinese.

Usage Tips: Proxy/VPN

The models are downloaded from huggingface.co. Because that site is not accessible from mainland China, set up a system or global proxy in advance; otherwise the model download will fail.

The integrated package already bundles most of the required models, but it may still check for updates or download small auxiliary models. If the terminal shows an HTTPSConnection error, you still need to configure a system proxy.
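If you prefer to route only the API process through a proxy rather than setting one system-wide, environment variables also work. A small sketch; the address 127.0.0.1:7890 is a placeholder, substitute the port your proxy actually listens on:

```python
import os

# Route Hugging Face downloads through a local proxy for this process only.
# 127.0.0.1:7890 is a placeholder address; use your own proxy's host and port.
PROXY = "http://127.0.0.1:7890"
os.environ["HTTP_PROXY"] = PROXY
os.environ["HTTPS_PROXY"] = PROXY
```

Set these before api.py starts (or at the top of api.py) so any model download picks them up.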

Using in Video Translation Software

  1. Start the API service (required before the translation software can use it).

  2. Open the video translation software, find the TTS settings, select F5-TTS, and enter the API address (default is http://127.0.0.1:5010).

  3. Set the reference audio and its transcript.

  4. It is recommended to select f5-tts for better generation quality.

Using api.py within a Third-Party Integrated Package

  1. Copy the api.py file and the configs folder to the root directory of the third-party integrated package.
  2. Find the python.exe bundled with the third-party package. For example, if it is in the py311 folder, type cmd in the address bar of the package's root folder, press Enter, and run .\py311\python api.py. If it reports that the flask module is not found, first run .\py311\python -m pip install waitress flask.

Using api.py After Deploying the Official F5-TTS Project from Source Code

  1. Copy the api.py file and the configs folder into the project folder.
  2. Install the modules: pip install flask waitress
  3. Execute python api.py

API Usage Example

import requests

# Send the reference audio (1.wav), its transcript, and the text to generate.
res = requests.post('http://127.0.0.1:5010/api', data={
    "ref_text": 'Enter the text content corresponding to 1.wav here',
    "gen_text": 'Enter the text to be generated here.',
    "model": 'f5-tts'
}, files={"audio": open('./1.wav', 'rb')})

if res.status_code != 200:
    print(res.text)
    exit()

# On success, the response body is the generated WAV audio.
with open("ceshi.wav", 'wb') as f:
    f.write(res.content)
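Since a successful response is raw WAV bytes, a quick header check before writing the file can catch cases where the server returned text instead of audio. A small sketch, not part of the official API:

```python
def looks_like_wav(data: bytes) -> bool:
    # WAV files start with a RIFF header: "RIFF" at offset 0, "WAVE" at offset 8.
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

Call looks_like_wav(res.content) before saving and print res.text instead if it returns False.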

Compatible with OpenAI TTS Interface

The voice parameter must use three # symbols (###) to separate the reference audio from the transcript of that audio, for example:

1.wav###You say all is emptiness, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty.

This means the reference audio 1.wav is in the same folder as api.py, and the spoken content of 1.wav is "You say all is emptiness, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty."

The returned data is fixed to wav audio data.
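Building the voice string by hand is easy to get wrong; a tiny helper pair (hypothetical, not part of api.py) keeps the ### convention in one place:

```python
def make_voice(ref_audio: str, ref_text: str) -> str:
    """Join the reference audio path and its transcript with ###."""
    return f"{ref_audio}###{ref_text}"

def parse_voice(voice: str) -> tuple:
    """Split a voice parameter back into (audio_path, transcript)."""
    audio, _, text = voice.partition("###")
    return audio, text
```

For example, make_voice("1.wav", "hello") produces "1.wav###hello", and parse_voice reverses it.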

from openai import OpenAI

# The api_key can be any non-empty string; only the base_url matters here.
client = OpenAI(api_key='12314', base_url='http://127.0.0.1:5010/v1')

# voice = reference audio path + ### + the transcript of that audio
with client.audio.speech.with_streaming_response.create(
    model='f5-tts',
    voice="1.wav###You say all is emptiness, but why do you close your eyes? If you open your eyes and look at me, I don't believe you, your eyes are empty.",
    input='Hello, dear friends',
    speed=1.0
) as response:
    with open('./test.wav', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)