This is a free web tool, based on the OpenAI Whisper model, that transcribes speech to text. It runs directly in your browser, with no registration or login required.

The model is downloaded and run locally, ensuring that your files are not uploaded to any external server.

Access Here

https://stt.pyvideotrans.com


Available Models

The tool offers several model options, including:

  • tiny
  • base
  • small
  • medium
  • large-v1
  • large-v3

Model Characteristics:

  • Smaller models (like tiny and base) run faster but have lower transcription accuracy.
  • Larger models (like large-v1 and large-v3) have higher accuracy but run slower and may cause browser crashes on lower-performance devices.
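
The speed and accuracy trade-off above can be turned into a rough rule of thumb. The sketch below is illustrative only: the function name `suggestModel` and every threshold in it are assumptions, not values used by the tool. In Chrome, the two inputs could be read from `navigator.deviceMemory` (which reports at most 8) and `navigator.hardwareConcurrency`.

```typescript
// The model names offered by the tool, as listed above.
type WhisperModel = "tiny" | "base" | "small" | "medium" | "large-v1" | "large-v3";

// Hypothetical helper: map rough device capability to a model size.
// All thresholds are illustrative assumptions, not measured limits.
function suggestModel(memoryGb: number, cpuCores: number): WhisperModel {
  if (memoryGb < 4) return "tiny";  // very little memory: smallest model
  if (memoryGb < 8) return "base";  // modest memory: still a small model
  if (cpuCores < 8) return "small"; // enough memory, but few cores
  return "medium";                  // the large models are never auto-suggested
}
```

The helper deliberately never returns a large model, since those are the ones most likely to crash the browser; picking one stays a manual decision.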

How to Use

  1. Upload File: Click to select the audio or video file you want to transcribe.
  2. Select Model: Choose a suitable model based on your device's performance.
    • On weaker devices, use tiny or base.
    • On stronger devices, small or medium is a reasonable choice.
    • Avoid the large models unless your device is very powerful; they can crash the browser.
  3. Select Language: Specify the language of the audio or video.
  4. Model Download: The first time you use a model, the tool downloads the model files from Hugging Face. Hugging Face may not be directly accessible in some regions; if the download stalls or fails, using a VPN is recommended.
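
To check in advance whether Hugging Face is reachable from your network, it helps to know what a model file URL looks like. The sketch below builds one using Hugging Face's `resolve` URL pattern. The repository `Xenova/whisper-tiny` and the file `config.json` are assumptions chosen for illustration; the files this tool actually requests may differ.

```typescript
// Build the download URL for one file in a Hugging Face model repository.
// "main" is the default branch name on Hugging Face.
function modelFileUrl(repo: string, file: string, revision: string = "main"): string {
  return `https://huggingface.co/${repo}/resolve/${revision}/${file}`;
}

// Assumed example repository and file, used only as a reachability probe.
const probeUrl = modelFileUrl("Xenova/whisper-tiny", "config.json");
// probeUrl is "https://huggingface.co/Xenova/whisper-tiny/resolve/main/config.json"
```

Opening that URL in a browser tab is a quick test: if it loads, the model download is likely to work as well.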

Important Notes

  • Privacy & Security: After downloading, the model runs entirely locally. Your files will not be uploaded to any server.
  • Performance Dependency: Which models you can run, and how fast transcription is, depend on your device's hardware.
  • System Recommendations: We recommend Chrome on Windows or Linux. Macs with M-series chips may not be fully optimized.

Technical Principles

  • Implementation: The tool is built on Transformers.js, a library that runs machine-learning models, including large ones, directly in the browser.
  • Model Source: Uses the OpenAI Whisper model, as converted and optimized by Xenova/whisper-web.
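
For readers curious what this looks like in code, here is a minimal sketch of in-browser transcription with Transformers.js. It is not the tool's actual source. The helper names `modelRepo` and `transcribe`, the `Xenova/whisper-<size>` repository naming, and the chunking values are assumptions. The package is loaded through a dynamic import so the snippet stands alone; in a real project a static `import { pipeline } from "@xenova/transformers"` is the usual choice.

```typescript
// Assumption: converted models live in repositories named "Xenova/whisper-<size>".
function modelRepo(size: string): string {
  return `Xenova/whisper-${size}`;
}

// Hypothetical wrapper: transcribe the audio at a URL and return plain text.
async function transcribe(audioUrl: string, size: string, language: string): Promise<string> {
  // Dynamic import through a variable, so nothing is loaded until this runs.
  const packageName: string = "@xenova/transformers";
  const { pipeline } = await import(packageName);

  // The first call downloads the model files; later calls reuse the browser cache.
  const transcriber = await pipeline("automatic-speech-recognition", modelRepo(size));

  // Process long audio in 30-second chunks with 5 seconds of overlap.
  const result = await transcriber(audioUrl, {
    language,
    task: "transcribe",
    chunk_length_s: 30,
    stride_length_s: 5,
  });
  return result.text;
}
```

A call such as `transcribe(url, "tiny", "english")` would then run entirely on the local device once the model files have been fetched.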