There are 14 models in total, divided into 3 categories. All of them are used to recognize human speech in videos and convert it to subtitle text.
To reduce the download size, the software includes only the smallest model, tiny, by default. This model has the lowest recognition accuracy. If you need better results, download one of the larger models.
Models that can be used in both openai and faster modes
tiny: The smallest and fastest model. It consumes the fewest resources and has the lowest accuracy.
tiny.en: Only for videos in which the spoken language is English.
base
base.en: Only for videos in which the spoken language is English.
small
small.en: Only for videos in which the spoken language is English.
medium
medium.en: Only for videos in which the spoken language is English.
large-v1
large-v2
large-v3: The largest model, with the highest accuracy. It requires at least 8 GB of available video memory, and preferably 12 GB or more.
Models only for faster mode
distil-whisper-small.en: Only for videos in which the spoken language is English.
distil-whisper-medium.en: Only for videos in which the spoken language is English.
distil-whisper-large-v2: Requires 8 GB or more of video memory. It currently gives good results for English videos but poor results for other languages.
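The two lists above can be written down as a small lookup. This is only a minimal sketch: the model names come from the lists, but the names SHARED_MODELS, FASTER_ONLY_MODELS, and is_supported are hypothetical and are not part of the software.

```python
# Hypothetical helper: the model names are taken from the lists above,
# the constant and function names are illustrative assumptions.
SHARED_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3",
}
FASTER_ONLY_MODELS = {
    "distil-whisper-small.en",
    "distil-whisper-medium.en",
    "distil-whisper-large-v2",
}


def is_supported(model: str, mode: str) -> bool:
    """Return True if `model` can be used in `mode` ("openai" or "faster")."""
    if mode not in ("openai", "faster"):
        raise ValueError(f"unknown mode: {mode!r}")
    if model in SHARED_MODELS:
        return True  # usable in both modes
    if model in FASTER_ONLY_MODELS:
        return mode == "faster"  # distil models work in faster mode only
    raise ValueError(f"unknown model: {model!r}")
```

For example, is_supported("distil-whisper-large-v2", "openai") returns False, while the same model with "faster" returns True.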
The first category is models with the suffix .en
For example, tiny.en, base.en, and medium.en. As the name suggests, these models are only for videos whose spoken language is English. For such videos, a model with the .en suffix gives better results than the equivalent model without it.
The second category is models without .en
These can be used for all supported languages, for example tiny and large-v1.
The third category is models starting with distil
There are currently only three models in this category, and all of them are intended for videos whose spoken language is English. This also applies to the model without the .en suffix: results for videos in other languages are very poor.
The advantage of this type of model is speed. Note that distil models can only be used in faster mode and cannot be used in openai mode.
distil-whisper-small.en
distil-whisper-medium.en
distil-whisper-large-v2
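The three categories can be told apart from the model name alone. The following is a minimal sketch of that rule. The function names model_category and suits_language, the category labels, and the use of "en" as the code for English are assumptions made for illustration.

```python
def model_category(name: str) -> str:
    """Classify a model name into one of the three categories described above."""
    # Check the distil prefix first: some distil models also end in ".en".
    if name.startswith("distil"):
        return "distil"
    if name.endswith(".en"):
        return "english-only"
    return "multilingual"


def suits_language(name: str, spoken_language: str) -> bool:
    """Return True if the model is a sensible choice for the spoken language."""
    if model_category(name) == "multilingual":
        return True
    # .en and distil models are recommended for English speech only.
    return spoken_language == "en"
```

The distil check comes first on purpose, so that distil-whisper-small.en is reported as a distil model and not as a plain .en model.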
faster Model Download
All models are downloaded from this address: https://github.com/jianchang512/stt/releases/tag/0.0
Open the page and choose the download that matches the mode you want to use. The faster models are recommended because they run faster.
Each faster model is downloaded as a compressed package that contains a folder. Copy that folder to the models folder in the software directory.
For example, after the medium model is downloaded, opening the compressed package shows a single folder.
Copy this folder to the models directory.
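The unpack-and-copy step for a faster model can also be scripted. The sketch below assumes the download is a .zip or tar archive with one top-level folder. The archive format is not stated here, and install_faster_model and its parameters are hypothetical names.

```python
import shutil
from pathlib import Path


def install_faster_model(archive: Path, software_dir: Path) -> Path:
    """Unpack a downloaded model archive into <software_dir>/models."""
    models_dir = software_dir / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    # shutil.unpack_archive handles zip and tar formats only. Other formats
    # (an assumption: for example 7z) need an external tool, after which the
    # extracted folder is copied into models_dir by hand.
    shutil.unpack_archive(str(archive), str(models_dir))
    return models_dir
```

After the call, the folder from the archive should sit directly inside the models directory, which is the layout described above.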
openai Model Download
The openai models are downloaded from the same address: https://github.com/jianchang512/stt/releases/tag/0.0
Scroll down the page, download the file with the .pt suffix, and copy it directly to the models directory.
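Copying the .pt file can be done by hand or with a few lines of Python. This is a minimal sketch. The name install_openai_model and the paths passed to it are hypothetical, and the only thing checked is the .pt suffix mentioned above.

```python
import shutil
from pathlib import Path


def install_openai_model(pt_file: Path, software_dir: Path) -> Path:
    """Copy a downloaded .pt model file into <software_dir>/models."""
    if pt_file.suffix != ".pt":
        raise ValueError(f"expected a .pt file, got {pt_file.name!r}")
    models_dir = software_dir / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the copied file inside models_dir.
    return Path(shutil.copy2(pt_file, models_dir))
```

The function returns the new location of the file, so the caller can confirm that the model ended up directly in the models directory and not in a subfolder.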