Using F5-TTS for Voiceovers | pyVideoTrans: Open Source Video Translation Tool (pyvideotrans.com, github.com/jianchang512/pyvideotrans)

The F5-TTS integration with pyVideoTrans described on this page only applies to pyVideoTrans versions V3.66 and later. For previous integration methods, please refer to Using F5-TTS with pyVideoTrans (versions < 3.66).

Starting from v3.68, this one interface also serves F5-TTS, Spark-TTS, index-TTS, and Dia-TTS. You only need to fill in the correct URL (usually http://127.0.0.1:7860 when the service runs on this machine) and select the corresponding service in the drop-down list.

Integrate with pyVideoTrans's API

To use F5-TTS in the video translation software, you need to start F5-TTS first and keep the terminal window open.

Then, open the video translation software, select "TTS Settings" -> "F5-TTS API" in the menu, and fill in the F5-TTS startup address, which defaults to http://127.0.0.1:7860. If your startup address is not the default address, please fill it in according to the actual address.
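Before filling in the address, it can help to confirm that something is actually listening on it. The following is a minimal sketch using only the Python standard library; the function name is_listening is made up for this example, and a successful connection only shows that the port is open, not that the service behind it is F5-TTS.

```python
import socket
from urllib.parse import urlsplit


def is_listening(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    host = parts.hostname or "127.0.0.1"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections and timeouts both end up here.
        return False


if __name__ == "__main__":
    address = "http://127.0.0.1:7860"  # default address mentioned on this page
    state = "is reachable" if is_listening(address) else "is not reachable"
    print(address, state)
```

If the address is reported as not reachable, check that the F5-TTS terminal window is still open and that the port matches the one it printed at startup.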

In the "Reference Audio" field, fill in the following:

Audio file name to use#The corresponding text in the audio file

Note: Please place the reference audio file in the f5-tts folder in the root directory of the pyVideotrans project. If the folder does not exist, please create it manually. For example, you can name the reference audio file nverguo.wav.
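The "file name#text" format and the folder rule above can be sketched as a small check. This is only an illustration: make_reference_entry and parse_reference_entry are hypothetical helpers written for this page, not functions from pyVideoTrans, and project_root stands for wherever your pyVideotrans folder is.

```python
from pathlib import Path


def make_reference_entry(project_root: str, filename: str, text: str) -> str:
    """Build a 'filename#text' line after checking the audio is in f5-tts/."""
    audio = Path(project_root) / "f5-tts" / filename
    if not audio.is_file():
        raise FileNotFoundError(f"{audio} not found; copy the reference audio there first")
    return f"{filename}#{text.strip()}"


def parse_reference_entry(entry: str) -> tuple[str, str]:
    """Split 'filename#text' at the first '#' into (filename, text)."""
    filename, sep, text = entry.partition("#")
    if not sep or not filename.strip() or not text.strip():
        raise ValueError("expected the form: filename#text spoken in the audio")
    return filename.strip(), text.strip()
```

For example, parse_reference_entry("nverguo.wav#some spoken text") gives back the file name and the text as two separate values, which makes a missing "#" easy to spot.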

Put the reference audio in the f5-tts folder inside the pyVideotrans software directory, not anywhere else.

Here is an example:

Reference audio and the text within the reference audio

Re-recognize?: By default, the text of the reference audio (the subtitles recognized during cloning) is sent to F5-TTS together with the audio, so F5-TTS does not need to start whisper for speech recognition, which saves time. Sometimes you may want F5-TTS to re-recognize the audio itself, which can improve the cloning quality to some extent; in that case, check this checkbox. Note that the first time you run with the checkbox checked, F5-TTS will download the openai-whisper-v3 model from huggingface.co, so please ensure that you have scientific internet access (a working proxy or VPN).

Other install guides: Spark-TTS install, index-TTS install, Dia-1.6b install

F5-TTS install

F5-TTS is an open-source voice cloning tool developed by Shanghai Jiao Tong University, known for its excellent results. The initial version only supported Chinese and English cloning, but the latest version v1 has expanded to support multiple languages, including French, Italian, Hindi, Japanese, Russian, Spanish, and Finnish.

This article primarily explains how to install and start F5-TTS using the official source code and how to integrate it with the pyVideotrans project. Additionally, it will cover how to modify the source code to enable local network calls.

Due to limited time and resources, I will no longer maintain the previous personal integration package and API interface. Instead, the integration with the pyVideotrans project now uses the official interface only. The limitation of the official interface is that it can only be called from the same machine, not from other machines on the local network. Please refer to the Solving Local Network Issues section for a solution.

Prerequisites

Your system must have Python 3.10 installed. While versions 3.11/3.12 might theoretically work, they have not been tested, so version 3.10 is recommended.

If Python is not installed:

Windows Installation Tutorial
Mac Installation: If not installed, please download the pkg installation package from the Python official website https://www.python.org/downloads/macos, and select version 3.10.11.

Check if Python is installed:

Windows: Press Win+R, enter cmd in the popup window, and press Enter. In the opened black window, enter python --version. If 3.10.xx is displayed, it is installed; if it says "python is not recognized as an internal or external command," it means Python is not installed or has not been added to the Path environment variable, and you need to reinstall.
Mac: Execute python3 --version directly in the terminal. If 3.10.x is output, it is installed; otherwise, it needs to be installed.
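The same check can be done from Python itself, which is handy when several interpreters are installed. A minimal sketch; the function name is_recommended_python is made up for this example, and it treats only 3.10 as recommended, as this page does.

```python
import sys

RECOMMENDED = (3, 10)  # the version this page recommends


def is_recommended_python(version=None) -> bool:
    """Return True if the (major, minor) part of the version is 3.10."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor == RECOMMENDED


if __name__ == "__main__":
    running = sys.version.split()[0]
    verdict = "recommended" if is_recommended_python() else "untested for F5-TTS"
    print(f"Python {running}: {verdict}")
```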

Download the F5-TTS Source Code

First, create an empty folder in a suitable location. A non-system drive without special permission requirements, such as drive D, is recommended; avoid directories like C:/Program Files. To avoid potential problems, use only numbers and letters in the folder name and in every parent folder name. For example, D:/f5/v1 is a good location, while D:/开源 f5/f5 v1, which contains spaces and Chinese characters, is not recommended.

This article uses an installation of F5-TTS in the D:/python/f5ttsnew folder on a Windows 10 system as an example.
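The naming recommendation can be checked mechanically. The sketch below applies the strictest reading of it (ASCII letters and digits only, apart from a leading drive letter such as D:), so it also flags harmless characters like hyphens; unsafe_parts is a name invented for this example.

```python
import re


def unsafe_parts(path: str) -> list[str]:
    """Return the path components that contain anything but ASCII letters and digits."""
    parts = [p for p in re.split(r"[\\/]+", path) if p]
    # A leading Windows drive such as 'D:' is allowed.
    if parts and re.fullmatch(r"[A-Za-z]:", parts[0]):
        parts = parts[1:]
    return [p for p in parts if not re.fullmatch(r"[A-Za-z0-9]+", p)]


if __name__ == "__main__":
    for candidate in ("D:/f5/v1", "D:/开源 f5/f5 v1", "C:/Program Files"):
        bad = unsafe_parts(candidate)
        print(candidate, "->", "ok" if not bad else f"rename: {bad}")
```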

Open the website: https://github.com/SWivid/F5-TTS

Click to download the source code as shown in the figure below:

Download source code zip package

After downloading, unzip the package, and copy all files from the F5-TTS-main folder into the D:/python/f5ttsnew folder, as shown below:

F5-TTS-main folder inside the zip package

Copy to f5ttsnew

Create a Virtual Environment

Creating a virtual environment is highly recommended unless you have no other Python or AI projects on your computer. A virtual environment can effectively avoid many potential errors.

In the address bar of the newly created folder D:/python/f5ttsnew, enter cmd and press Enter (for Mac, please use the terminal to enter the folder).

Execute the following command to create a virtual environment: python -m venv venv. After execution, a folder named venv will be added to the folder.

Next, activate the virtual environment (note the spaces and dot symbols):

Windows: .\venv\scripts\activate
Mac: . ./venv/bin/activate

After the virtual environment is activated, (venv) will be added to the command prompt. Please ensure that all subsequent operations are performed in this virtual environment, and check that the command prompt has (venv) before each operation.

The (venv) at the beginning of the command line indicates activation
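If you are unsure whether the prompt marker can be trusted, Python can report it directly. A minimal sketch using the standard sys module; it assumes the environment was created with python -m venv as above, and the function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

Run it with the same python command you use for the install steps; the printed prefix should end with the venv folder you created.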

Install Dependencies

In the terminal with the virtual environment activated, continue to enter the following command (note the spaces and dot symbols):

pip install -e .

Wait for the installation to complete. If you need CUDA acceleration, continue with the following command (it is a single command; enter it on one line):

pip install torch==2.4.0+cu124 torchaudio==2.4.0+cu124 --extra-index-url https://download.pytorch.org/whl/cu124
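After the install, you may want to confirm that the CUDA build of torch is the one actually in use. The sketch below relies only on torch.cuda.is_available() and torch.cuda.get_device_name(); the function name cuda_report is made up for this example, and it has to be run inside the activated virtual environment.

```python
def cuda_report() -> str:
    """Describe whether the installed torch build can see a CUDA GPU."""
    try:
        import torch  # imported lazily so the script also runs where torch is missing
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}: CUDA available ({name})"
    return f"torch {torch.__version__}: CUDA not available"


if __name__ == "__main__":
    print(cuda_report())
```

A version string ending in +cu124 together with "CUDA available" suggests the command above took effect; a +cpu suffix suggests the CPU-only build is still installed.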

Configure Scientific Internet Access

Important Note: F5-TTS needs to download models online from huggingface.co. This website is blocked in some regions and cannot be reached directly from them, so before starting you must set up scientific internet access (a proxy or VPN) and enable its global or system proxy mode.

If your VPN tool provides an HTTP port (as shown below):

Check if the scientific tool provides a port

Enter the following command in the terminal to set the proxy:

Windows: set https_proxy=http://127.0.0.1:10808 (replace the port number with the actual port you are using)
Mac: export https_proxy=http://127.0.0.1:10808 (replace the port number with the actual port you are using; export is needed so that programs started from this terminal can see the variable)

You can also directly modify the code to set the proxy, avoiding manual input in the terminal each time. Open the F5-TTS root directory/src/f5_tts/infer/infer_gradio.py file and add the following code at the top of the file:

python

import os
os.environ['https_proxy']='http://127.0.0.1:10808' # Fill in according to your actual proxy address
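Whichever way the proxy is set, it only helps if the Python process can see it. The following sketch uses the standard library's urllib.request.getproxies_environment() to show which HTTPS proxy is picked up from the environment; the address and port are the example values from this page, and https_proxy_in_effect is a name made up for this example.

```python
import os
import urllib.request
from typing import Optional


def https_proxy_in_effect() -> Optional[str]:
    """Return the HTTPS proxy taken from the environment, or None if unset."""
    return urllib.request.getproxies_environment().get("https")


if __name__ == "__main__":
    # Example value from this page; replace the port with your own.
    os.environ.setdefault("https_proxy", "http://127.0.0.1:10808")
    print("HTTPS proxy in effect:", https_proxy_in_effect())
```

If this prints None in a terminal where you believed the proxy was set, the variable was most likely set in a different terminal window or was not exported.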

How to use v2ray (a scientific tool)

Start the WebUI Interface

After configuring scientific internet access, enter the following command in the terminal to start the WebUI:

f5-tts_infer-gradio

The first time you start it, the program will automatically download the model, which may be slow, so please be patient. When starting up later, the program may still connect to huggingface.co for detection, so it is recommended to keep the proxy turned on to avoid errors.

After the startup is successful, the terminal will display the IP address and port number, as shown below:

Successful startup when IP and port are displayed, the first time is slow

Open the displayed address in your browser, which defaults to http://127.0.0.1:7860.

webui interface

Solving Local Network Issues

If your F5-TTS is deployed on another computer within the local network, you need to modify the F5-TTS code to support local network access.

Open the F5-TTS project directory/src/f5_tts/infer/infer_gradio.py file and add the following code below line 16:

python

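# Added Local Network start
import os
import tempfile  # harmless if the file already imports it
from pathlib import Path

# Keep Gradio's temporary files in a tmp folder under the current working directory
ROOT=Path(os.getcwd()).as_posix()
TMP=f'{ROOT}/tmp'
Path(TMP).mkdir(exist_ok=True)
os.environ['GRADIO_TEMP_DIR']=TMP
# Allow Gradio to serve files from both temp folders
# (gr is assumed to be imported as "import gradio as gr" above this point in the file)
gr.set_static_paths(paths=[TMP,tempfile.gettempdir()])
print(TMP)

# Added Local Network end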

Diagram of where the code should be added:

Note the location where the code should be added

After saving the changes, restart F5-TTS. Then fill in the IP address and port number of F5-TTS after startup in pyVideotrans, such as http://192.168.0.12:7860.
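To find the address to enter, you can ask the machine that runs F5-TTS for its local network IP. This is a rough sketch using only the standard library: lan_ip and lan_url are names made up for this example, port 7860 is the default mentioned on this page, and on machines with several network adapters the reported address may not be the one you want.

```python
import ipaddress
import socket


def lan_ip() -> str:
    """Best-effort guess of this machine's local network IPv4 address."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only makes the system
        # choose the outgoing interface. 192.0.2.1 is a documentation-only address.
        probe.connect(("192.0.2.1", 80))
        return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route; fall back to loopback
    finally:
        probe.close()


def lan_url(port: int = 7860) -> str:
    """Build the address to type into pyVideotrans on the other computer."""
    return f"http://{lan_ip()}:{port}"


if __name__ == "__main__":
    print(lan_url())
```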

Adding Other Languages

If you need to use models for other languages, you also need to modify the F5-TTS project directory/src/f5_tts/infer/infer_gradio.py file.

Find the code around line 59:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
    "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
    json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
]

Diagram of the code location:

By default, this block configures the official Chinese and English model. To use a model for another language, replace the block with the matching configuration below. After the modification, restart F5-TTS and make sure scientific internet access (a proxy or VPN) is configured so that the program can download the new language model. Once the download succeeds, first clone a voice in the WebUI as a test, and then use it through pyVideoTrans.

Important Note: Before use, please ensure that the language of the voiceover text in pyVideoTrans matches the language of the model selected in F5-TTS.
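Because the configuration is edited by hand, a quick sanity check before restarting can catch copy-and-paste mistakes. The sketch below is an assumption-laden illustration, not part of F5-TTS: check_model_cfg is a made-up helper, and REQUIRED_KEYS is simply the set of keys that every configuration on this page has in common, not an official requirement.

```python
import json

# Keys shared by every configuration shown on this page (illustrative only).
REQUIRED_KEYS = {"dim", "depth", "heads", "ff_mult", "text_dim", "conv_layers"}


def check_model_cfg(cfg: list) -> dict:
    """Validate a [checkpoint, vocab, json_settings] list and return the settings."""
    if len(cfg) != 3:
        raise ValueError("expected three entries: checkpoint, vocab, JSON settings")
    checkpoint, vocab, settings_json = cfg
    for label, value in (("checkpoint", checkpoint), ("vocab", vocab)):
        if not value.startswith("hf://"):
            raise ValueError(f"{label} should start with hf://, got {value!r}")
    settings = json.loads(settings_json)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - settings.keys()
    if missing:
        raise ValueError(f"settings are missing keys: {sorted(missing)}")
    return settings


if __name__ == "__main__":
    default_cfg = [
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/model_1250000.safetensors",
        "hf://SWivid/F5-TTS/F5TTS_v1_Base/vocab.txt",
        json.dumps(dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)),
    ]
    print(check_model_cfg(default_cfg))
```

Note that json.dumps(dict(dim=1024, ...)) and json.dumps({"dim": 1024, ...}) produce the same JSON text, so both spellings used on this page are equivalent.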

Here is the configuration information for each language model:

French:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/model_last_reduced.pt",
    "hf://RASPIAUDIO/F5-French-MixedSpeakers-reduced/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}),
]

Hindi:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://SPRINGLab/F5-Hindi-24KHz/model_2500000.safetensors",
    "hf://SPRINGLab/F5-Hindi-24KHz/vocab.txt",
    json.dumps({"dim": 768, "depth": 18, "heads": 12, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
]

Italian:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://alien79/F5-TTS-italian/model_159600.safetensors",
    "hf://alien79/F5-TTS-italian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
]

Japanese:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://Jmica/F5TTS/JA_25498980/model_25498980.pt",
    "hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
]

Russian:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://hotstone228/F5-TTS-Russian/model_last.safetensors",
    "hf://hotstone228/F5-TTS-Russian/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
]

Spanish:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://jpgallegoar/F5-Spanish/model_last.safetensors",
    "hf://jpgallegoar/F5-Spanish/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "conv_layers": 4})
]

Finnish:

python

DEFAULT_TTS_MODEL_CFG = [
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/model_common_voice_fi_vox_populi_fi_20241206.safetensors",
    "hf://AsmoKoskinen/F5-TTS_Finnish_Model/vocab.txt",
    json.dumps({"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1})
]

You can follow the official updates, and other languages can be added in a similar way: https://github.com/SWivid/F5-TTS/blob/main/src/f5_tts/infer/SHARED.md

Common Errors and Precautions

During API usage, you can close the WebUI interface in the browser, but you cannot close the terminal window that started F5-TTS.
Can I dynamically switch models in F5-TTS? No. You need to manually modify the code as described above and then restart the WebUI.
An error like the following occurs frequently:

    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /SWivid/F5-TTS/resolve/main/F5TTS_v1_Base/vocab.txt (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002174796DF60>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 0458b571-90ab-4edd-ae59-b93bd603cdd0)')

This is a proxy problem. Use scientific internet access (a proxy or VPN) with a stable connection; see the Configure Scientific Internet Access section above.
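To tell a proxy problem apart from other failures, you can test the connection outside F5-TTS. A minimal sketch using the standard library, which honors the https_proxy environment variable; can_reach is a name made up for this example, and the 10-second timeout simply mirrors the one in the error message above.

```python
import urllib.error
import urllib.request


def can_reach(url: str = "https://huggingface.co", timeout: float = 10.0) -> bool:
    """Return True if the URL answers within the timeout."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # The server answered; treat 5xx (often a proxy failure) as unreachable.
        return err.code < 500
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("huggingface.co reachable:", can_reach())
```

Run it in the same terminal, with the same proxy settings, that you use to start F5-TTS; if it prints False there, F5-TTS will not be able to download the models either.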

How do I stop F5-TTS from connecting to huggingface.co on every start?

First make sure you have successfully cloned a voice at least once, so that the model has already been downloaded. Then open F5-TTS root directory/src/f5_tts/infer/utils_infer.py.

Search for snapshot_download and find the line of code as shown in the figure

Modify to

local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir,local_files_only=True)

Then search for hf_hub_download, find the 2 lines of code as shown in the figure

Modify to

config_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="config.yaml", local_files_only=True)
model_path = hf_hub_download(repo_id=repo_id, cache_dir=hf_cache_dir, filename="pytorch_model.bin", local_files_only=True)

In short, the new parameter local_files_only=True is added to each of these 3 calls. Make sure the model has already been downloaded locally; otherwise, a model-not-found error will be reported.
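Before switching to local_files_only=True, it may be worth checking that the files really are in the cache. The sketch below looks for the folder layout that the Hugging Face hub cache uses (models--org--name/snapshots/...). Which cache folder F5-TTS uses on your machine is an assumption here: ~/.cache/huggingface/hub is the usual default, but pass a different folder if yours differs. repo_is_cached is a name made up for this example.

```python
from pathlib import Path


def repo_is_cached(cache_dir: str, repo_id: str) -> bool:
    """Return True if cache_dir holds at least one downloaded file for repo_id."""
    # Hub cache layout: <cache_dir>/models--<org>--<name>/snapshots/<revision>/...
    repo_dir = Path(cache_dir) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    return snapshots.is_dir() and any(p.is_file() for p in snapshots.rglob("*"))


if __name__ == "__main__":
    cache = Path.home() / ".cache" / "huggingface" / "hub"  # assumed default location
    repo = "nvidia/bigvgan_v2_24khz_100band_256x"  # repo id from the line above
    print(repo, "cached:", repo_is_cached(str(cache), repo))
```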

F5-TTS is deployed normally, but pyVideotrans returns {detail:"Not found"} during testing
- Check if other AI projects are occupying the port. Generally, AI projects with interfaces mostly use the gradio interface, which also defaults to 7860. Close the others and restart F5-TTS
- If pyVideotrans is deployed from source code, execute pip install --upgrade gradio_client and try again
- Restart F5-TTS, start with the command f5-tts_infer-gradio --api
