Speech recognition channels transcribe the speech in a video into text and organize it into subtitles with precise timestamps.
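
To make "subtitles with precise timestamps" concrete, here is a minimal sketch of how recognized segments could be written out as subtitle entries. It assumes the common SRT layout (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text); the `Segment` class and both function names are hypothetical illustrations, not part of the software.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One recognized utterance (hypothetical structure for illustration)."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Join segments into SRT text, one numbered block per segment."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{timing}\n{seg.text.strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


print(to_srt([Segment(0.0, 2.5, " Hello "), Segment(2.5, 4.0, "world")]))
```

Whatever channel is selected, the output it has to produce is essentially this: a list of text pieces, each tied to a start and end time.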

Tip: Some channels, such as OpenAI and ByteDance Volcano Engine, require an API address or key (SK) to be set before they can be used. The process is simple: open the "Speech Recognition Settings" menu at the top of the software and fill in the corresponding information.

Over a Dozen Currently Supported Speech Recognition Channels
To meet the needs of different users, we provide a variety of choices covering both local offline models and online cloud services.
Click on the channel name to view detailed usage instructions for that channel.
💻 Local Offline Recognition (No Internet Required, Privacy Protected)
These channels require you to download model files to your computer the first time you use them, after which they can run completely offline.
- faster-whisper (Local Mode): A very popular local recognition solution. It is fast, uses few resources, and supports dozens of languages, which makes it one of the preferred choices for local recognition.
- openai-whisper (Local Mode): An open-source model from OpenAI. It has high recognition accuracy and supports a vast number of languages.
- Alibaba FunASR (Chinese Recognition): An open-source model launched by Alibaba DAMO Academy. It is specially optimized for Chinese, so it recognizes Chinese pronunciation and splits sentences with high accuracy.
- faster-whisper-xxl.exe: This is an extra-large model version designed specifically for Windows users, offering better recognition results. You need to download the faster-whisper-xxl.exe file manually to use it.
- whisper.cpp: A recognition channel that uses whisper.cpp as the backend. You need to deploy whisper.cpp manually to use it.
- Parakeet-tdt Speech Recognition: A recognition model open-sourced by NVIDIA. This requires you to self-host the service and then enter your API address in the software's settings menu.
- STT Speech Recognition API: Similarly, this is an open-source project that requires self-hosting. Once deployed, enter the API address into the software to use it.
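
As a rough illustration of what a local channel does under the hood, the sketch below calls the open-source `faster-whisper` Python package directly. It is not the software's own code: the helper names, the `"small"` model size, and the CPU/int8 settings are assumptions chosen for the example. The first call downloads the model files; later runs reuse the cached copy and work offline.

```python
from types import SimpleNamespace


def segments_to_rows(segments):
    """Convert objects with .start/.end/.text into (start, end, text) tuples."""
    return [(round(s.start, 2), round(s.end, 2), s.text.strip()) for s in segments]


def transcribe_locally(audio_path, model_size="small"):
    """Transcribe one file with faster-whisper (hypothetical helper)."""
    # Imported lazily so this file still loads if faster-whisper is not installed.
    from faster_whisper import WhisperModel

    # Assumed settings: CPU inference with int8 quantization to keep memory use low.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path)
    return info.language, segments_to_rows(segments)


# Stand-in segments so the conversion step can be tried without a model.
fake_segments = [SimpleNamespace(start=0.0, end=1.234, text=" hi there ")]
rows = segments_to_rows(fake_segments)
print(rows)
```

With the package installed, `transcribe_locally("audio.wav")` would return the detected language together with the timestamped rows; the file name here is only a placeholder.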
☁️ Online Recognition (Cloud Processing, Powerful Features)
These channels upload audio files to cloud servers for processing. They usually offer excellent results, but some services require payment or have usage limits.
Free or Free Tier Available:
- Google Speech Recognition: A free online recognition service provided by Google. The results are decent, but using it in China requires a network proxy/VPN.
- Elevenlabs.io Speech Recognition: A service provided by a company focused on AI audio technology. You need to go to their official website to register and get a free API Key. The free version has usage limits.
- deepgram.com Speech Recognition: A well-known speech recognition provider noted for high accuracy and speed. You need to register on their official website, deepgram.com, and apply for an API Key.
- Gemini Large Model Recognition: A powerful model launched by Google, with outstanding capabilities in recognizing less common languages. Using it requires a Gemini API Key, and a network proxy/VPN is needed for use in China.
- Alibaba Bailian Qwen3-ASR: Based on Alibaba's "Tongyi Qianwen" large model. You need to go to the Alibaba Bailian platform to activate the service and create an API Key.
Paid or Requires API Key Application:
- 302.AI Speech Recognition: Apply for a key on the official 302.ai website to use this channel.
- ByteDance Volcano Subtitle Generation: A professional voice technology service provided by ByteDance's Volcano Engine. Its Chinese recognition is excellent, making it especially suitable for audio with accents or background noise. You need to activate the service on the Volcano Engine official website.
- OpenAI Speech Recognition: Uses the official API provided by OpenAI. The results are comparable to those of the local Whisper model, but you need an OpenAI API Key (SK).
🔧 Advanced Custom Options (For Developers)
If you have a certain technical background, you can also try the following more flexible solutions:
- Custom Speech Recognition API: If you have programming skills, you can write your own speech recognition API that follows the data format we provide, which allows the greatest degree of customization.
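
A custom API of this kind might look like the sketch below, built only on Python's standard library. Note that the request and response formats here are assumptions made up for illustration (raw audio bytes in the POST body, a JSON reply with `code`, `msg`, and `data` fields); the real format is the one given in the software's own instructions for this channel. The `recognize` function is a placeholder where an actual model would be called.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognize(audio: bytes) -> list[dict]:
    """Placeholder recognizer: swap in a real speech model here."""
    return [{"start": 0.0, "end": 1.5, "text": f"received {len(audio)} bytes"}]


def build_response(segments: list[dict]) -> bytes:
    """Serialize segments into the hypothetical JSON response body."""
    payload = {"code": 0, "msg": "ok", "data": segments}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


class RecognitionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumed request format: the POST body is the raw audio file.
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)
        body = build_response(recognize(audio))
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(host="127.0.0.1", port=9000):
    """Start the server; the host and port are arbitrary example values."""
    HTTPServer((host, port), RecognitionHandler).serve_forever()
```

Calling `run()` starts the service; its address (here `http://127.0.0.1:9000`) is what would then be entered as the API address in the software's settings.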
