Parakeet-API: High-Performance Local Speech-to-Text Service

The parakeet-api project is a local speech-to-text service based on the NVIDIA Parakeet-tdt-0.6b model. It provides an OpenAI API-compatible interface and a clean web UI, allowing you to easily and quickly convert any audio/video file into high-precision SRT subtitles. It is also compatible with pyVideoTrans v3.72+.

Open-source project repository: https://github.com/jianchang512/parakeet-api

✨ Core Advantages of Parakeet-API

  • 🚀 Ultimate Speed and Performance: The Parakeet model is highly optimized. Especially with an NVIDIA GPU, its transcription speed is extremely fast, making it ideal for processing large volumes or long-duration audio/video files.
  • 🎯 Precise Timestamps: Built on NVIDIA's Token-and-Duration Transducer (TDT) architecture, it generates highly accurate SRT timestamps that align closely with the audio stream, making it well suited to video subtitling.
  • 💰 Completely Free, Unlimited Use: Runs on your own hardware, so there are no API fees or usage time limits.
  • 🌐 Flexible Access Methods: Provides an intuitive web interface and a standardized API, allowing for easy integration into existing workflows like pyVideoTrans.

🛠️ Installation and Configuration Guide

This project supports Windows, macOS, and Linux. Please follow the steps below for installation and configuration.

Step 0: Set up Python 3.10 Environment

If you don't have Python 3 on your machine, please follow this tutorial to install it: https://pvt9.com/_posts/pythoninstall

Step 1: Prepare FFmpeg

This project uses ffmpeg for audio/video format preprocessing.

  • Windows (Recommended):

    1. Download from the FFmpeg GitHub repository and extract ffmpeg.exe.
    2. Place the downloaded ffmpeg.exe file directly in the project's root directory (in the same folder as app.py). The program will automatically detect and use it, so there's no need to configure environment variables.
  • macOS (via Homebrew):

    ```bash
    brew install ffmpeg
    ```
  • Linux (Debian/Ubuntu):

    ```bash
    sudo apt update && sudo apt install ffmpeg
    ```

Step 2: Create a Python Virtual Environment and Install Dependencies

  1. Download or clone this project's code to your local computer (it's recommended to place it on a non-system drive, in a folder whose path contains only English letters and digits).

  2. Open a terminal or command prompt and navigate to the project's root directory (on Windows, you can simply type cmd in the address bar and press Enter).

  3. Create a virtual environment: python -m venv venv

  4. Activate the virtual environment:

    • Windows (CMD/PowerShell): .\venv\Scripts\activate
    • macOS / Linux (Bash/Zsh): source venv/bin/activate
  5. Install dependencies:

    • If you don't have an NVIDIA GPU (CPU only):

      ```bash
      pip install -r requirements.txt
      ```
    • If you have an NVIDIA GPU (for GPU acceleration):

      a. Ensure you have the latest NVIDIA drivers and a matching CUDA Toolkit installed.

      b. Uninstall any existing PyTorch build: pip uninstall -y torch

      c. Install the PyTorch build that matches your CUDA version (e.g., for CUDA 12.6):

      ```bash
      pip install torch --index-url https://download.pytorch.org/whl/cu126
      ```

      d. Finally, install the rest of the project's dependencies: pip install -r requirements.txt
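After installing, you can check which device the transcription will actually run on. A minimal sketch that assumes only PyTorch's standard `torch.cuda` API and degrades gracefully if PyTorch isn't installed yet (the `detect_device` helper is illustrative, not part of the project):

```python
def detect_device() -> str:
    """Report which compute device PyTorch would use, without
    assuming PyTorch is installed at all."""
    try:
        import torch
    except ImportError:
        return "pytorch-not-installed"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"Transcription will run on: {detect_device()}")
```

If this prints `cpu` despite an NVIDIA GPU being present, the installed PyTorch wheel most likely doesn't match your CUDA version; repeat steps b and c above.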

Step 3: Start the Service

In the terminal with the activated virtual environment, run the following command:

```bash
python app.py
```

You will see log output indicating the service is starting. The first run will download the model (approx. 1.2 GB), so please be patient.

Don't be alarmed if you see a large volume of log messages; this is normal.
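When scripting against the service, it helps to wait until the HTTP endpoint is actually reachable before sending requests, since model loading takes a while. A minimal standard-library sketch; the `wait_for_service` helper is illustrative, and the port matches the default address shown below:

```python
import time
import urllib.error
import urllib.request

def wait_for_service(url: str, timeout: float = 60.0) -> bool:
    """Poll url until it answers any HTTP response, or give up
    after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(1)
    return False

# e.g. wait_for_service("http://127.0.0.1:5092") before transcribing
```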

Successful Launch Screen

🚀 Usage

Method 1: Using the Web Interface

  1. Open in your browser: http://127.0.0.1:5092
  2. Drag and drop or click to upload your audio/video file.
  3. Click "Start Transcription" and wait for the process to complete. You can then view and download the SRT subtitles below.

Method 2: API Call (Python Example)

You can easily call this service using the openai library.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5092/v1",
    api_key="any-key",
)

with open("your_audio.mp3", "rb") as audio_file:
    srt_result = client.audio.transcriptions.create(
        model="parakeet",
        file=audio_file,
        response_format="srt"
    )
print(srt_result)
```
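The SRT text returned above is plain text, so it is easy to post-process in your own scripts. A minimal sketch of a cue parser (the `parse_srt` helper is illustrative, not part of this project's API):

```python
import re

SRT_CUE = re.compile(
    r"(\d+)\s*\n"                             # cue index
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*"  # start timestamp
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*\n"      # end timestamp
    r"(.*?)(?:\n\s*\n|\Z)",                   # text up to the next blank line
    re.S,
)

def parse_srt(srt_text: str) -> list[tuple[int, str, str, str]]:
    """Split SRT text into (index, start, end, text) tuples."""
    return [
        (int(idx), start, end, text.strip())
        for idx, start, end, text in SRT_CUE.findall(srt_text)
    ]
```

For example, `parse_srt(srt_result)` yields one tuple per subtitle cue, which you can then filter, re-time, or feed into another tool.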

Method 3: Integration with pyVideoTrans

Parakeet-API can be seamlessly integrated with the video translation tool pyVideoTrans (v3.72 and above).

  1. Ensure your parakeet-api service is running locally.
  2. Open the pyVideoTrans software.
  3. In the menu bar, select Speech Recognition(R) -> Nvidia parakeet-tdt.
  4. In the configuration window that appears, set the "http address" to: http://127.0.0.1:5092/v1
  5. Click "Save" to start using it.