Chatterbox TTS API Service

This is a high-performance Text-to-Speech (TTS) service based on Chatterbox-TTS. It offers an OpenAI TTS-compatible API endpoint, an enhanced endpoint supporting voice cloning, and a clean web user interface.

This project aims to provide developers and content creators with a self-hostable, powerful, and easy-to-integrate TTS solution.

GitHub: https://github.com/jianchang512/chatterbox-api


Usage with pyVideoTrans

This project can serve as a powerful TTS backend to provide high-quality English voiceovers for pyVideoTrans.

  1. Start this project: Ensure the Chatterbox TTS API service is running locally (http://127.0.0.1:5093).

  2. Update pyVideoTrans: Make sure your pyVideoTrans version is v3.73 or higher.

  3. Configure pyVideoTrans:

    • In the pyVideoTrans menu, navigate to TTS Settings -> Chatterbox TTS.
    • API Address: Fill in the address of this service, which defaults to http://127.0.0.1:5093.
    • Reference Audio (Optional): If you want to use voice cloning, enter the filename of the reference audio here (e.g., my_voice.wav). Ensure this audio file is placed in the chatterbox folder within the pyVideoTrans root directory.
    • Adjust Parameters: Adjust cfg_weight and exaggeration as needed to achieve the best results.

    Parameter Adjustment Suggestions:

    • General Scenarios (TTS, Voice Assistant): The default settings (cfg_weight=0.5, exaggeration=0.5) are suitable for most cases.
    • Fast-Paced Reference Audio: If the reference audio has a fast pace, try lowering cfg_weight to around 0.3 to improve the rhythm of the generated speech.
    • Expressive/Dramatic Speech: Try a lower cfg_weight (e.g., 0.3) and a higher exaggeration (e.g., 0.7 or higher). Increasing exaggeration usually speeds up the speech; lowering cfg_weight helps to balance this, making the rhythm more relaxed and clearer.
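The tuning suggestions above can be captured as a small lookup of presets. The sketch below is illustrative: the preset names and the helper are hypothetical, and only cfg_weight and exaggeration are real service parameters.

```python
# Illustrative presets based on the tuning suggestions above. The preset
# names are hypothetical; cfg_weight and exaggeration are the real knobs.
PRESETS = {
    "general":    {"cfg_weight": 0.5, "exaggeration": 0.5},  # default settings
    "fast_ref":   {"cfg_weight": 0.3, "exaggeration": 0.5},  # fast-paced reference audio
    "expressive": {"cfg_weight": 0.3, "exaggeration": 0.7},  # dramatic, expressive speech
}

def tts_params(style: str = "general") -> dict:
    """Return cfg_weight/exaggeration for a named style, falling back to the defaults."""
    return dict(PRESETS.get(style, PRESETS["general"]))

print(tts_params("expressive"))
```

You would then pass the chosen pair through to pyVideoTrans (or directly to the API, as shown in the API examples later in this document).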

✨ Features

  • Two API Endpoints:
    1. OpenAI-Compatible Endpoint: /v1/audio/speech, for seamless integration into any existing workflow that supports the OpenAI SDK.
    2. Voice Cloning Endpoint: /v2/audio/speech_with_prompt, to generate speech with the same timbre by uploading a short reference audio clip.
  • Web User Interface: Provides an intuitive front-end for quick testing and use of TTS functions without writing any code.
  • Flexible Output Formats: Supports generating audio in .mp3 and .wav formats.
  • Cross-Platform Support: Provides detailed installation guides for Windows, macOS, and Linux.
  • One-Click Windows Deployment: Offers a compressed package for Windows users, including all dependencies and startup scripts for an out-of-the-box experience.
  • GPU Acceleration: Supports NVIDIA GPUs (CUDA) and includes a one-click upgrade script for Windows users.
  • Seamless Integration: Can be easily integrated as a backend service with tools like pyVideoTrans.

🚀 Quick Start

Method 1: One-Click Package for Windows Users

We have prepared a portable package win.7z for Windows users that includes all dependencies, greatly simplifying the installation process.

  1. Download and Unzip: Get the package from https://github.com/jianchang512/chatterbox-api/releases and extract it to any location (it is recommended to avoid non-ASCII characters in the path).

  2. Install C++ Build Tools (Strongly Recommended):

    • Navigate to the unzipped tools folder and double-click vs_BuildTools.exe.
    • In the installer window, select the "Desktop development with C++" workload and click Install.
    • This step pre-installs many dependencies required for compiling Python packages, preventing numerous installation errors.
  3. Start the Service:

    • Double-click the 启动服务.bat (Start Service.bat) script in the root directory.
    • On the first run, the script will automatically create a Python virtual environment and install all necessary dependencies. This process may take a few minutes and will also download the TTS model, so please be patient.
    • After the installation is complete, the service will start automatically.

    When you see a message similar to the following in your command line window, the service has started successfully:

```
✅ Model loaded successfully.
Service started successfully. The HTTP address is: http://127.0.0.1:5093
```

Method 2: For macOS, Linux, and Manual Installation Users

For macOS, Linux users, or Windows users who prefer to set up the environment manually, please follow these steps.

1. Prerequisites

  • Python: Ensure you have Python 3.9 or higher installed.
  • ffmpeg: This is a required tool for audio and video processing.
    • macOS (using Homebrew): brew install ffmpeg
    • Debian/Ubuntu: sudo apt-get update && sudo apt-get install ffmpeg
    • Windows (Manual): Download ffmpeg and add it to your system's PATH environment variable.
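Both prerequisites can be confirmed from Python before installing anything else. This is a minimal sketch (not part of the project) that only checks the interpreter version and whether `ffmpeg` is on the PATH:

```python
import shutil
import sys

def check_prereqs() -> dict:
    """Report whether the Python and ffmpeg prerequisites are met."""
    return {
        "python_ok": sys.version_info >= (3, 9),          # Python 3.9 or higher
        "ffmpeg_ok": shutil.which("ffmpeg") is not None,  # ffmpeg found on PATH
    }

print(check_prereqs())
```

If either value is False, install the missing prerequisite before continuing with the steps below.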

2. Installation Steps

```bash
# 1. Clone the project repository
git clone https://github.com/jianchang512/chatterbox-api.git
cd chatterbox-api

# 2. Create and activate a Python virtual environment (recommended)
python3 -m venv venv
# On Windows:
# venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the service
python app.py
```

Once the service has started successfully, you will see the service address http://127.0.0.1:5093 in your terminal.
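Once the address appears, a quick smoke test against the OpenAI-compatible endpoint confirms the service is reachable. This is a stdlib-only sketch: it builds a minimal request for /v1/audio/speech, and the try/except keeps it harmless if the service is not running.

```python
import json
import urllib.request

def build_speech_request(text: str, base_url: str = "http://127.0.0.1:5093"):
    """Build a POST request for the OpenAI-compatible /v1/audio/speech endpoint."""
    payload = {
        "model": "chatterbox-tts",   # ignored by the service
        "voice": "en",
        "input": text,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_speech_request("Service smoke test.")
try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        audio = resp.read()
    print(f"OK, received {len(audio)} bytes of audio")
except OSError as exc:  # service not running, wrong port, etc.
    print("Service not reachable:", exc)
```

A successful run prints the size of the returned audio; a connection error simply means the service is not up yet.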


⚡ Upgrade to GPU Version (Optional)

If your computer is equipped with a CUDA-supported NVIDIA graphics card and you have correctly installed the NVIDIA driver and CUDA Toolkit, you can upgrade to the GPU version for a significant performance boost.

Windows Users (One-Click Upgrade)

  1. First, ensure you have successfully run 启动服务.bat at least once to complete the basic environment setup.
  2. Double-click the 安装N卡GPU支持.bat (Install NVIDIA GPU Support.bat) script.
  3. The script will automatically uninstall the CPU version of PyTorch and install the GPU version compatible with CUDA 12.6.

Linux Manual Upgrade

After activating the virtual environment, execute the following commands:

```bash
# 1. Uninstall the existing CPU version of PyTorch
pip uninstall -y torch torchaudio

# 2. Install the PyTorch version that matches your CUDA version
# The command below targets CUDA 12.6; get the correct command for your
# CUDA version from the official PyTorch website.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
```

You can visit the official PyTorch website to get the appropriate installation command for your system.

After upgrading, restart the service. You will see Using device: cuda in the startup logs.
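That log line reflects a device check made when the model loads. The snippet below is a hedged sketch of the usual PyTorch idiom for such a check; it has not been verified against app.py, which may do this differently.

```python
def pick_device() -> str:
    """Return 'cuda' when a CUDA-capable PyTorch build sees a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed yet
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Using device: {pick_device()}")
```

Running this inside the project's virtual environment is a quick way to check whether the GPU upgrade took effect.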


📖 Usage Guide

1. Web Interface

After the service starts, open http://127.0.0.1:5093 in your browser to access the Web UI.

  • Enter Text: Type the text you want to convert in the text box.
  • Adjust Parameters:
    • cfg_weight: (Range 0.0 - 1.0) Controls the rhythm of the speech. Lower values result in a slower, more relaxed pace. For fast-paced reference audio, consider lowering this value (e.g., to 0.3).
    • exaggeration: (Range 0.25 - 2.0) Controls the emotional and intonational exaggeration of the speech. Higher values produce more expressive speech, which may also increase the pace.
  • Voice Cloning: Click "Choose File" to upload a reference audio clip (e.g., .mp3, .wav). If a reference audio is provided, the service will use the cloning endpoint.
  • Generate Speech: Click the "Generate Speech" button. After a short wait, you can listen to and download the generated MP3 file online.

2. API Usage

Endpoint 1: OpenAI-Compatible Endpoint (/v1/audio/speech)

This endpoint does not require a reference audio and can be called directly using the OpenAI SDK.

Python Example (openai SDK):

```python
from openai import OpenAI

# Point the client at our local service
client = OpenAI(
    base_url="http://127.0.0.1:5093/v1",
    api_key="not-needed"  # No API key is required, but the SDK expects a value
)

response = client.audio.speech.create(
    model="chatterbox-tts",  # This parameter is ignored
    voice="en",              # Passes the language code; currently only 'en' is supported
    speed=0.5,               # Maps to the cfg_weight parameter
    input="Hello, this is a test from the OpenAI compatible API.",
    instructions="0.5",      # (Optional) Maps to the exaggeration parameter; must be a string
    response_format="mp3"    # Optional: 'mp3' or 'wav'
)

# Stream the audio to a file
response.stream_to_file("output_api1.mp3")
print("Audio saved to output_api1.mp3")
```

Endpoint 2: Voice Cloning Endpoint (/v2/audio/speech_with_prompt)

This endpoint requires uploading both the text and a reference audio file via multipart/form-data.

Python Example (requests library):

```python
import requests

API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
REFERENCE_AUDIO = "path/to/your/reference.mp3"  # Replace with the path to your reference audio

form_data = {
    'input': 'This voice should sound like the reference audio.',
    'cfg_weight': '0.5',
    'exaggeration': '0.5',
    'response_format': 'mp3'  # Optional: 'mp3' or 'wav'
}

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    files = {'audio_prompt': audio_file}
    response = requests.post(API_URL, data=form_data, files=files)

if response.ok:
    with open("output_api2.mp3", "wb") as f:
        f.write(response.content)
    print("Cloned audio saved to output_api2.mp3")
else:
    print("Request failed:", response.text)
```