# Chatterbox TTS API Service
This is a high-performance Text-to-Speech (TTS) service based on Chatterbox-TTS. It offers an OpenAI TTS-compatible API endpoint, an enhanced endpoint supporting voice cloning, and a clean web user interface.
This project aims to provide developers and content creators with a self-hostable, powerful, and easy-to-integrate TTS solution.
## Usage with pyVideoTrans
This project can serve as a powerful TTS backend to provide high-quality English voiceovers for pyVideoTrans.
1. Start this project: Ensure the Chatterbox TTS API service is running locally (`http://127.0.0.1:5093`).
2. Update pyVideoTrans: Make sure your pyVideoTrans version is `v3.73` or higher.
3. Configure pyVideoTrans:
   - In the pyVideoTrans menu, navigate to `TTS Settings` -> `Chatterbox TTS`.
   - API Address: Fill in the address of this service, which defaults to `http://127.0.0.1:5093`.
   - Reference Audio (Optional): To use voice cloning, enter the filename of the reference audio here (e.g., `my_voice.wav`). Ensure this audio file is placed in the `chatterbox` folder within the pyVideoTrans root directory.
   - Adjust Parameters: Adjust `cfg_weight` and `exaggeration` as needed to achieve the best results.
Parameter Adjustment Suggestions:
- General Scenarios (TTS, Voice Assistant): The default settings (`cfg_weight=0.5`, `exaggeration=0.5`) are suitable for most cases.
- Fast-Paced Reference Audio: If the reference audio has a fast pace, try lowering `cfg_weight` to around `0.3` to improve the rhythm of the generated speech.
- Expressive/Dramatic Speech: Try a lower `cfg_weight` (e.g., `0.3`) and a higher `exaggeration` (e.g., `0.7` or higher). Increasing `exaggeration` usually speeds up the speech; lowering `cfg_weight` helps to balance this, making the rhythm more relaxed and clearer. A direct API sketch of these presets follows below.
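To hear the difference between these presets without going through pyVideoTrans, you can call the service's OpenAI-compatible endpoint directly. The sketch below is illustrative and assumes the parameter mapping described in the API section later in this document (the OpenAI `speed` field carries `cfg_weight`, and `instructions` carries `exaggeration` as a string), and that the endpoint accepts the same JSON fields the OpenAI SDK would send:

```python
import requests

# Illustrative preset comparison; assumes the service is running locally on the
# default port and that /v1/audio/speech accepts OpenAI-style JSON fields.
PRESETS = {
    "general": ("0.5", "0.5"),     # cfg_weight, exaggeration (defaults)
    "expressive": ("0.3", "0.7"),  # lower cfg_weight, higher exaggeration
}

for name, (cfg_weight, exaggeration) in PRESETS.items():
    resp = requests.post(
        "http://127.0.0.1:5093/v1/audio/speech",
        json={
            "model": "chatterbox-tts",
            "voice": "en",
            "input": "A quick comparison of rhythm and expressiveness.",
            "speed": float(cfg_weight),    # carries cfg_weight (see API section below)
            "instructions": exaggeration,  # carries exaggeration, passed as a string
            "response_format": "mp3",
        },
    )
    resp.raise_for_status()
    with open(f"preset_{name}.mp3", "wb") as f:
        f.write(resp.content)
    print(f"Saved preset_{name}.mp3")
```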
## ✨ Features
- Two API Endpoints:
  - OpenAI-Compatible Endpoint: `/v1/audio/speech`, for seamless integration into any existing workflow that supports the OpenAI SDK.
  - Voice Cloning Endpoint: `/v2/audio/speech_with_prompt`, to generate speech with the same timbre by uploading a short reference audio clip.
- Web User Interface: Provides an intuitive front-end for quick testing and use of TTS functions without writing any code.
- Flexible Output Formats: Supports generating audio in `.mp3` and `.wav` formats.
- Cross-Platform Support: Provides detailed installation guides for Windows, macOS, and Linux.
- One-Click Windows Deployment: Offers a compressed package for Windows users, including all dependencies and startup scripts for an out-of-the-box experience.
- GPU Acceleration: Supports NVIDIA GPUs (CUDA) and includes a one-click upgrade script for Windows users.
- Seamless Integration: Can be easily integrated as a backend service with tools like pyVideoTrans.
## 🚀 Quick Start
### Method 1: For Windows Users (Recommended, One-Click Start)
We have prepared a portable package `win.7z` for Windows users that includes all dependencies, greatly simplifying the installation process.
1. Download and Unzip: Get the package from https://github.com/jianchang512/chatterbox-api/releases and extract it to any location (it is recommended to avoid non-ASCII characters in the path).
2. Install C++ Build Tools (Strongly Recommended):
   - Navigate to the unzipped `tools` folder and double-click `vs_BuildTools.exe`.
   - In the installer window, check the "Desktop development with C++" option and click Install.
   - This step pre-installs many dependencies required for compiling Python packages, preventing numerous installation errors.
3. Start the Service:
   - Double-click the `启动服务.bat` (Start Service.bat) script in the root directory.
   - On the first run, the script will automatically create a Python virtual environment and install all necessary dependencies. This process may take a few minutes and will also download the TTS model, so please be patient.
   - After the installation is complete, the service will start automatically.

When you see a message similar to the following in your command line window, the service has started successfully:
```
✅ Model loaded successfully.
Service started successfully. The HTTP address is: http://127.0.0.1:5093
```
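To confirm the service is reachable without opening a browser, here is a minimal sketch, assuming the Web UI is served at the root URL:

```python
import requests

# Quick reachability check; assumes the Web UI is served at the root path.
resp = requests.get("http://127.0.0.1:5093", timeout=5)
print("Service reachable:", resp.status_code == 200)
```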
### Method 2: For macOS, Linux, and Manual Installation Users
For macOS or Linux users, or for Windows users who prefer to set up the environment manually, please follow these steps.
#### 1. Prerequisites
- Python: Ensure you have Python 3.9 or higher installed.
- ffmpeg: This is a required tool for audio and video processing.
  - macOS (using Homebrew): `brew install ffmpeg`
  - Debian/Ubuntu: `sudo apt-get update && sudo apt-get install ffmpeg`
  - Windows (Manual): Download ffmpeg and add it to your system's `PATH` environment variable.
#### 2. Installation Steps
```bash
# 1. Clone the project repository
git clone https://github.com/jianchang512/chatterbox-api.git
cd chatterbox-api

# 2. Create and activate a Python virtual environment (recommended)
python3 -m venv venv
# on Windows:
# venv\Scripts\activate
# on macOS/Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the service
python app.py
```
Once the service has started successfully, you will see the service address `http://127.0.0.1:5093` in your terminal.
## ⚡ Upgrade to GPU Version (Optional)
If your computer has a CUDA-capable NVIDIA graphics card and you have correctly installed the NVIDIA driver and CUDA Toolkit, you can upgrade to the GPU version for a significant performance boost.
### Windows Users (One-Click Upgrade)
1. First, ensure you have successfully run `启动服务.bat` at least once to complete the basic environment setup.
2. Double-click the `安装N卡GPU支持.bat` (Install NVIDIA GPU Support.bat) script.
3. The script will automatically uninstall the CPU version of PyTorch and install the GPU version compatible with CUDA 12.6.
### Linux Manual Upgrade
After activating the virtual environment, execute the following commands:
```bash
# 1. Uninstall the existing CPU version of PyTorch
pip uninstall -y torch torchaudio

# 2. Install the PyTorch version that matches your CUDA version
# The following command is for CUDA 12.6; get the correct command for your CUDA version from the PyTorch website.
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu126
```
You can visit the official PyTorch website to get the appropriate installation command for your system.
After upgrading, restart the service. You will see `Using device: cuda` in the startup logs.
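You can also confirm the upgrade from inside the activated virtual environment using PyTorch's standard CUDA query; this check is independent of this project:

```python
import torch

# True only if PyTorch was built with CUDA support and can see a GPU.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```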
## 📖 Usage Guide
### 1. Web Interface
After the service starts, open `http://127.0.0.1:5093` in your browser to access the Web UI.
- Enter Text: Type the text you want to convert in the text box.
- Adjust Parameters:
  - `cfg_weight`: (Range 0.0 - 1.0) Controls the rhythm of the speech. Lower values result in a slower, more relaxed pace. For fast-paced reference audio, consider lowering this value (e.g., to 0.3).
  - `exaggeration`: (Range 0.25 - 2.0) Controls the emotional and intonational exaggeration of the speech. Higher values produce more expressive speech, which may also increase the pace.
- Voice Cloning: Click "Choose File" to upload a reference audio clip (e.g., .mp3, .wav). If a reference audio is provided, the service will use the cloning endpoint.
- Generate Speech: Click the "Generate Speech" button. After a short wait, you can listen to and download the generated MP3 file online.
### 2. API Usage
#### Endpoint 1: OpenAI-Compatible Endpoint (`/v1/audio/speech`)
This endpoint does not require reference audio and can be called directly with the OpenAI SDK.
Python Example (`openai` SDK):
```python
from openai import OpenAI

# Point the client to our local service
client = OpenAI(
    base_url="http://127.0.0.1:5093/v1",
    api_key="not-needed"  # The API key is not required, but the SDK needs a value
)

response = client.audio.speech.create(
    model="chatterbox-tts",  # This parameter is ignored
    voice="en",              # Used to pass the language code; currently only 'en' is supported
    speed=0.5,               # Corresponds to the cfg_weight parameter
    input="Hello, this is a test from the OpenAI compatible API.",
    instructions="0.5",      # (Optional) Corresponds to the exaggeration parameter; must be a string
    response_format="mp3"    # Optional: 'mp3' or 'wav'
)

# Stream the audio to a file
response.stream_to_file("output_api1.mp3")
print("Audio saved to output_api1.mp3")
```
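Recent versions of the `openai` SDK deprecate calling `stream_to_file` on the plain response in favor of the streaming helper. If you see a deprecation warning, this variant (reusing the client and parameters from above) should behave identically:

```python
# Streaming variant for newer openai SDK versions; reuses the client defined above.
with client.audio.speech.with_streaming_response.create(
    model="chatterbox-tts",
    voice="en",
    input="Hello, this is a test from the OpenAI compatible API.",
    response_format="mp3",
) as streamed:
    streamed.stream_to_file("output_api1.mp3")
```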
#### Endpoint 2: Voice Cloning Endpoint (`/v2/audio/speech_with_prompt`)
This endpoint requires uploading both the text and a reference audio file via `multipart/form-data`.
Python Example (`requests` library):
```python
import requests

API_URL = "http://127.0.0.1:5093/v2/audio/speech_with_prompt"
REFERENCE_AUDIO = "path/to/your/reference.mp3"  # Replace with the path to your reference audio

form_data = {
    'input': 'This voice should sound like the reference audio.',
    'cfg_weight': '0.5',
    'exaggeration': '0.5',
    'response_format': 'mp3'  # Optional: 'mp3' or 'wav'
}

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    files = {'audio_prompt': audio_file}
    response = requests.post(API_URL, data=form_data, files=files)

if response.ok:
    with open("output_api2.mp3", "wb") as f:
        f.write(response.content)
    print("Cloned audio saved to output_api2.mp3")
else:
    print("Request failed:", response.text)
```
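To request WAV instead, only `response_format` needs to change. Here is a small variation of the request above (reusing `API_URL`, `REFERENCE_AUDIO`, and `form_data`), assuming the server reports the returned format in its `Content-Type` header:

```python
# Same request as above, but asking for WAV output.
form_data['response_format'] = 'wav'

with open(REFERENCE_AUDIO, 'rb') as audio_file:
    response = requests.post(API_URL, data=form_data, files={'audio_prompt': audio_file})

if response.ok:
    with open("output_api2.wav", "wb") as f:
        f.write(response.content)
    print("Saved output_api2.wav, Content-Type:", response.headers.get("Content-Type"))
else:
    print("Request failed:", response.text)
```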