This is a free web tool, based on the OpenAI Whisper model, that transcribes speech to text. You can use it directly in your browser, with no registration or login required.
The model is downloaded and run locally, ensuring that your files are not uploaded to any external server.
Access Here
Available Models
The tool offers several model options, including:
tiny
base
small
medium
large-v1
large-v3
Model Characteristics:
- Smaller models (like `tiny` and `base`) run faster but have lower transcription accuracy.
- Larger models (like `large-v1` and `large-v3`) have higher accuracy but run slower and may cause browser crashes on lower-performance devices.
How to Use
- Upload File: Click to select the audio or video file you want to transcribe.
- Select Model: Choose a suitable model based on your device's performance.
  - Weaker devices should use `tiny` or `base`.
  - Stronger devices can choose `small` or `medium`.
  - Avoid the largest models unless your device has excellent performance, to prevent browser crashes.
- Select Language: Specify the language of the audio or video.
- Model Download: The first time you use a model, the tool downloads the model files from Hugging Face. Because Hugging Face may not be directly accessible in some regions, using a VPN is recommended to ensure a smooth download.
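
The model-selection advice above can be sketched as a small helper. This is only an illustration: the `recommendModel` function, the `DeviceInfo` shape, and every threshold in it are hypothetical and are not taken from the tool itself. In a Chromium browser the inputs could come from `navigator.deviceMemory` and `navigator.hardwareConcurrency`.

```typescript
// Model names offered by the tool.
type WhisperModel = "tiny" | "base" | "small" | "medium" | "large-v1" | "large-v3";

// Hypothetical description of the device; not part of the tool.
interface DeviceInfo {
  memoryGb: number; // e.g. navigator.deviceMemory (Chromium only, reported up to 8)
  cores: number;    // e.g. navigator.hardwareConcurrency
}

// Pick a conservative model. All thresholds are assumptions for illustration.
function recommendModel(device: DeviceInfo): WhisperModel {
  // Weaker devices: stay with the smallest models.
  if (device.memoryGb < 4 || device.cores < 4) return "tiny";
  if (device.memoryGb < 8 || device.cores < 8) return "base";
  // Stronger devices: small or medium. The large models are never
  // suggested automatically, to avoid browser crashes.
  if (device.cores < 12) return "small";
  return "medium";
}
```

A user who knows their machine can handle it can still pick `large-v1` or `large-v3` by hand; the helper only encodes the cautious default.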
Important Notes
- Privacy & Security: After downloading, the model runs entirely locally. Your files will not be uploaded to any server.
- Performance Dependency: Model selection and running speed depend on your device's performance.
- System Recommendations: We recommend using the Chrome browser on Windows or Linux. Support for Mac devices with M-series chips may not be fully optimized.
Technical Principles
- Implementation: The tool is built on Transformers.js, which supports running large models in the browser.
- Model Source: Employs the OpenAI Whisper model, optimized and converted by Xenova/whisper-web.
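
As a rough sketch of how a Transformers.js transcription call might be put together, the helper below builds the arguments for the library's `automatic-speech-recognition` pipeline. The `buildRequest` function and the `TranscriptionRequest` type are hypothetical, the `Xenova/whisper-<size>` repository ids are an assumption that should be checked on Hugging Face, and `large-v1` is left out because its repository id is not confirmed here. The chunk and stride lengths are illustrative values, not the tool's actual settings.

```typescript
// Model sizes whose converted repository ids are assumed below.
type ModelSize = "tiny" | "base" | "small" | "medium" | "large-v3";

// Hypothetical container for the pipeline arguments.
interface TranscriptionRequest {
  task: "automatic-speech-recognition";
  modelId: string;
  options: {
    language: string;      // language of the audio, e.g. "english"
    task: "transcribe";    // keep the original language rather than translate
    chunk_length_s: number;  // split long audio into chunks of this length
    stride_length_s: number; // overlap between neighbouring chunks
  };
}

function buildRequest(size: ModelSize, language: string): TranscriptionRequest {
  return {
    task: "automatic-speech-recognition",
    modelId: `Xenova/whisper-${size}`, // assumed naming scheme
    options: { language, task: "transcribe", chunk_length_s: 30, stride_length_s: 5 },
  };
}

// In the browser, with Transformers.js installed, the request could be used like this:
//   import { pipeline } from "@xenova/transformers";
//   const req = buildRequest("tiny", "english");
//   const transcriber = await pipeline(req.task, req.modelId);
//   const output = await transcriber(audio, req.options); // audio: URL or 16 kHz mono Float32Array
//   console.log(output.text);
```

The first `pipeline` call for a given model triggers the download from Hugging Face; later calls can reuse the browser's cached files.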