Using the Chinese Repunctuation Feature
Whisper is currently the dominant speech recognition model, but its Chinese recognition still falls noticeably short of its English recognition. It frequently outputs Traditional Chinese characters and omits punctuation, which leads to poor sentence segmentation in the generated subtitles. Even re-segmenting with the returned character-level timestamps gives unsatisfactory results when the audio lacks clear silent intervals to split on.
By comparison, Alibaba's FunASR series of models excels at Chinese recognition, but it supports only Chinese.
To address this, v2.92 introduces Alibaba's Chinese punctuation restoration model. The model restores punctuation in Chinese recognition results, and the text is then re-segmented into sentences based on punctuation and silent intervals. Bundling the punctuation restoration model increases the software size by approximately 400 MB.
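The re-segmentation idea described above can be illustrated with a minimal sketch. This is not the software's actual implementation: the `Token` type, the `resegment` function, the set of sentence-ending marks, and the 0.5-second `max_gap` threshold are all assumptions chosen for illustration. It assumes punctuation has already been restored and attached to the timestamped tokens.

```python
from dataclasses import dataclass

# Assumed set of sentence-ending marks used as split points.
SENTENCE_END = set("。！？!?；;")


@dataclass
class Token:
    text: str     # one character or word; restored punctuation may be attached
    start: float  # start time in seconds
    end: float    # end time in seconds


def resegment(tokens, max_gap=0.5):
    """Group tokens into subtitle lines.

    A line is closed after a token that ends with sentence-ending
    punctuation, or when the silence before the next token exceeds
    max_gap seconds (hypothetical threshold).
    """
    segments, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        is_last = i == len(tokens) - 1
        ends_sentence = tok.text[-1:] in SENTENCE_END
        long_pause = (not is_last) and (tokens[i + 1].start - tok.end > max_gap)
        if is_last or ends_sentence or long_pause:
            segments.append({
                "text": "".join(t.text for t in current),
                "start": current[0].start,
                "end": current[-1].end,
            })
            current = []
    return segments


if __name__ == "__main__":
    demo = [
        Token("你", 0.0, 0.2), Token("好。", 0.2, 0.5),
        Token("今", 0.6, 0.8), Token("天", 0.8, 1.0),
        Token("见", 2.0, 2.2),
    ]
    for seg in resegment(demo):
        print(seg)
```

In this sketch, punctuation takes care of sentence boundaries, and the silence check is a fallback for text where no sentence-ending mark was restored.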
Enabling Chinese Repunctuation
The Alibaba Chinese punctuation model is used automatically for repunctuation when all of the following conditions are met:
- The "Chinese Repunctuation" option is selected on the main interface or on the audio/video-to-subtitle interface.
- The spoken language of the audio or video is Chinese.
- The speech recognition engine is "faster-whisper", "openai-whisper", or "deepgram.com".
- The segmentation mode is set to "Overall Recognition".
When these conditions are met, the system first restores punctuation in the speech recognition output, then re-segments the text into sentences based on punctuation and silent intervals, improving the accuracy and readability of the subtitles.
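The four conditions above amount to a single boolean check. The sketch below is only an illustration: the function name `should_repunctuate`, its parameter names, the `"zh"` language-code prefix, and the `"overall"` mode identifier are hypothetical and do not reflect the software's actual settings keys. Only the three engine names are taken from the list above.

```python
# Engine names as listed in the conditions above.
SUPPORTED_ENGINES = {"faster-whisper", "openai-whisper", "deepgram.com"}


def should_repunctuate(option_enabled, language, engine, split_mode):
    """Return True only if all four conditions hold.

    Assumptions for illustration: Chinese is identified by a language
    code starting with "zh", and Overall Recognition is identified by
    the hypothetical mode string "overall".
    """
    return bool(
        option_enabled
        and language.lower().startswith("zh")
        and engine in SUPPORTED_ENGINES
        and split_mode == "overall"
    )


if __name__ == "__main__":
    print(should_repunctuate(True, "zh-cn", "faster-whisper", "overall"))
    print(should_repunctuate(True, "en", "faster-whisper", "overall"))
```

If any one condition fails, for example an unsupported engine or a non-Chinese source language, the check returns False and the recognition result would be left unchanged.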