
This is a free web tool, based on the OpenAI Whisper model, that transcribes speech to text. It runs in your browser and requires no registration or login.

The model is downloaded to your browser and runs locally, so your files are not uploaded to any external server.

Web Address

https://stt.pyvideotrans.com


Available Models

The tool offers several model options, including:

  • tiny
  • base
  • small
  • medium
  • large-v1
  • large-v3

Model Characteristics:

  • Smaller models (like tiny and base) run faster but have lower transcription accuracy.
  • Larger models (like large-v1 and large-v3) are more accurate but run slower, and they may crash the browser on less powerful devices.

How to Use

  1. Upload File: Click to select the audio or video file you want to transcribe.
  2. Select Model: Choose the appropriate model based on your device's performance.
    • Weaker devices should use tiny or base.
    • More powerful devices can choose small or medium.
    • Avoid the large models unless your device is very powerful; otherwise the browser may crash.
  3. Select Language: Specify the language of the speech in the audio or video.
  4. Model Download: The first time you use a model, the tool downloads the model file from Hugging Face. Hugging Face may not be directly accessible in some regions, so a VPN is recommended there to ensure a smooth download.
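
The device-based advice in step 2 can be sketched as a small helper. This is only an illustration: the function name suggestModel, the DeviceInfo shape, and every threshold are assumptions made for this example, not values used by the tool. In Chromium browsers, navigator.deviceMemory and navigator.hardwareConcurrency could supply rough inputs, although deviceMemory reports coarse, capped values.

```typescript
// Model names as listed on this page.
type WhisperModel = "tiny" | "base" | "small" | "medium" | "large-v1" | "large-v3";

// Hypothetical description of the device; not part of the tool.
interface DeviceInfo {
  memoryGB: number; // approximate RAM in gigabytes
  cpuCores: number; // number of logical CPU cores
}

// Suggest a model following the page's guidance: weaker devices get
// tiny or base, stronger devices get small or medium. The thresholds
// are illustrative assumptions. The large models are never suggested
// automatically because they may crash the browser.
function suggestModel(device: DeviceInfo): WhisperModel {
  if (device.memoryGB < 4 || device.cpuCores < 2) return "tiny";
  if (device.memoryGB < 8) return "base";
  if (device.cpuCores < 8) return "small";
  return "medium";
}
```

For example, under these assumed thresholds a device with 4 GB of memory and 4 cores would be pointed at base. A user with a very powerful machine can still pick a large model manually.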

Precautions

  • Privacy and Security: Once the model is downloaded, it runs entirely locally, and your files are not uploaded to any server.
  • Performance: Which models you can run, and how fast they run, depend on your device's hardware.
  • System Recommendations: Chrome on Windows or Linux is recommended. Support for Macs with M-series chips may not be fully optimized.

Technical Principles

  • Implementation: The tool is built on Transformers.js, a library that runs machine learning models, including large ones, directly in the browser.
  • Model Source: It uses the OpenAI Whisper model, optimized and converted by Xenova/whisper-web.
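
To make the two points above more concrete, here is a minimal sketch of how a Whisper transcription request might be assembled for Transformers.js. It is not the tool's actual code. The function buildTranscriptionRequest and the TranscriptionRequest type are hypothetical, and the "Xenova/whisper-" repository prefix is an assumed naming convention that may not match every model size listed on this page. The option names (language, task, chunk_length_s, stride_length_s) follow the Transformers.js speech recognition pipeline.

```typescript
// Model sizes as listed on this page.
type WhisperSize = "tiny" | "base" | "small" | "medium" | "large-v1" | "large-v3";

// Hypothetical container for everything a pipeline call would need.
interface TranscriptionRequest {
  task: "automatic-speech-recognition";
  modelId: string;
  options: {
    language: string;        // spoken language, e.g. "english"
    task: "transcribe";      // transcribe rather than translate
    chunk_length_s: number;  // split long audio into chunks of this length
    stride_length_s: number; // overlap between neighbouring chunks
  };
}

// Build the request. The default repoPrefix is an assumption for
// illustration; pass a different prefix if the models live elsewhere.
function buildTranscriptionRequest(
  size: WhisperSize,
  language: string,
  repoPrefix: string = "Xenova/whisper-",
): TranscriptionRequest {
  return {
    task: "automatic-speech-recognition",
    modelId: `${repoPrefix}${size}`,
    options: { language, task: "transcribe", chunk_length_s: 30, stride_length_s: 5 },
  };
}

// With Transformers.js loaded, the request would be used roughly like this:
//   const transcriber = await pipeline(request.task, request.modelId);
//   const output = await transcriber(audio, request.options);
```

The first pipeline call for a given model is the point at which the model files would be fetched from Hugging Face; later calls can reuse the browser's cached copy.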