A free web service based on the OpenAI Whisper model that transcribes speech to text. You can use it simply by opening your browser, without registration or login.
The model is downloaded and run locally, ensuring that your files are not uploaded to any external servers.
Usage Address
Available Models
The tool offers several model options, including:
- tiny
- base
- small
- medium
- large-v1
- large-v3
Model Characteristics:
- Smaller models (like tiny and base) run faster but have lower transcription accuracy.
- Larger models (like large-v1 and large-v3) have higher accuracy but run slower and may cause browser crashes on less powerful devices.
How to Use
- Upload File: Click to select the audio or video file you want to transcribe.
- Select Model: Choose the appropriate model based on your device's performance.
  - Weaker devices should use tiny or base.
  - More powerful devices can choose small or medium.
  - To prevent browser crashes, avoid the largest models unless your device has excellent performance.
- Select Language: Specify the language of the speech in the audio or video.
- Model Download: The first time you use a model, the tool downloads the model file from Hugging Face. Hugging Face may not be directly accessible in some regions, so a VPN is recommended there to ensure a smooth download.
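The model-selection advice above can be sketched as a small helper. This is only an illustration: the function name `suggestModel`, the `DeviceHints` shape, and every threshold are assumptions made for this example, not part of the tool. It never suggests the large models, in line with the advice to choose them only deliberately.

```typescript
// Model names as listed by the tool.
type WhisperModel = "tiny" | "base" | "small" | "medium" | "large-v1" | "large-v3";

// Rough device hints. In Chromium-based browsers these could be read from
// navigator.deviceMemory (reported in GB, capped at 8) and
// navigator.hardwareConcurrency (logical CPU cores).
interface DeviceHints {
  memoryGb?: number;
  cores?: number;
}

// Hypothetical helper: map device hints to a conservative model choice.
// The thresholds are illustrative assumptions, not measured values.
function suggestModel(hints: DeviceHints): WhisperModel {
  const memory = hints.memoryGb ?? 0; // unknown memory is treated as a weak device
  const cores = hints.cores ?? 1;     // unknown core count is treated as a weak device

  if (memory >= 8 && cores >= 8) return "medium";
  if (memory >= 8 && cores >= 4) return "small";
  if (memory >= 4) return "base";
  return "tiny";
}
```

For example, `suggestModel({ memoryGb: 4, cores: 4 })` returns `"base"`, and a device that reports nothing falls back to `"tiny"`.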
Precautions
- Privacy and Security: Once the model is downloaded, it runs entirely locally, and your files are not uploaded to any server.
- Performance Dependence: Which models you can run, and how fast they run, depend on your device's performance.
- System Recommendations: The Chrome browser on Windows or Linux is recommended. Support for M-series chips on Mac devices may not be fully optimized.
Technical Principles
- Implementation: The tool is built on Transformers.js, which supports running large models directly in the browser.
- Model Source: It uses the OpenAI Whisper model, optimized and converted by Xenova/whisper-web.
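To make the pieces above concrete, here is a minimal sketch of how a page might call a Transformers.js speech-recognition pipeline. It is not the tool's actual source. The pipeline factory is passed in as a parameter so the sketch stays self-contained; in a real page it would be the `pipeline` function exported by Transformers.js. The names `transcribe` and `modelId`, the locally declared types, and the `Xenova/whisper-<name>` naming pattern are assumptions for illustration, and the pattern may not hold for every model listed above.

```typescript
// Locally declared shapes that mirror how a Transformers.js ASR pipeline is called.
// They are simplified assumptions, not the library's own type definitions.
type Transcriber = (
  audio: string | Float32Array,
  options: Record<string, unknown>,
) => Promise<{ text: string }>;
type PipelineFactory = (task: string, model: string) => Promise<Transcriber>;

// Assumption: converted checkpoints are published on Hugging Face as "Xenova/whisper-<name>".
function modelId(name: string): string {
  return `Xenova/whisper-${name}`;
}

// Hypothetical wrapper: load the chosen model and transcribe one audio input.
async function transcribe(
  createPipeline: PipelineFactory,
  audio: string | Float32Array, // a URL, or decoded mono audio samples
  model: string,                // e.g. "tiny" or "base"
  language: string,             // e.g. "english"
): Promise<string> {
  // Creating the pipeline fetches the model files the first time it is used.
  const transcriber = await createPipeline("automatic-speech-recognition", modelId(model));

  // Long recordings are processed in overlapping chunks.
  const result = await transcriber(audio, {
    language,
    task: "transcribe",
    chunk_length_s: 30,
    stride_length_s: 5,
  });
  return result.text.trim();
}
```

In a page that loads Transformers.js, a call might look like `transcribe(pipeline, audioUrl, "tiny", "english")`, possibly with a type cast, since the library's own typings are richer than the simplified ones declared here.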