How to Use the Chinese Re-segmentation Feature

Whisper is a leading speech recognition model, but its Chinese recognition is noticeably weaker than its English recognition. It often outputs Traditional Chinese characters and omits punctuation, which leads to poor sentence segmentation in the generated subtitles. Even when character-level timestamps are used for re-segmentation, the results are often unsatisfactory if the audio or video lacks clear silent segments.

In contrast, Alibaba's FunASR series of models excels at Chinese recognition, but it supports only Chinese.

Therefore, in v2.92, we introduced Alibaba's Chinese punctuation restoration model. This model restores punctuation marks in Chinese recognition results, and sentences are then re-segmented based on punctuation and silent intervals. Including the model increases the software size by approximately 400 MB.

Enabling Chinese Re-segmentation

The Alibaba Chinese punctuation model is used automatically to re-segment results when all of the following conditions are met:

  1. The "Chinese Re-segmentation" option is checked on the main interface or the audio/video to subtitles interface;
  2. The spoken language of the audio/video is Chinese;
  3. The speech recognition engine is "faster-whisper", "openai-whisper", or "deepgram.com";
  4. The segmentation mode is set to "Recognize as a whole".
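
The four conditions above amount to a simple conjunction. The following is a minimal sketch of that check, not the software's actual code: the `RecognitionSettings` class, its field names, the `"zh"` language prefix, and the `"whole"` mode identifier are all hypothetical names chosen for illustration. Only the three engine names come from the list above.

```python
from dataclasses import dataclass

# Engine names are taken from the condition list; everything else is hypothetical.
SUPPORTED_ENGINES = {"faster-whisper", "openai-whisper", "deepgram.com"}


@dataclass
class RecognitionSettings:
    resegment_checked: bool   # condition 1: "Chinese Re-segmentation" is checked
    spoken_language: str      # condition 2: assumed to be a code such as "zh" or "zh-cn"
    engine: str               # condition 3: name of the speech recognition engine
    segmentation_mode: str    # condition 4: "whole" stands for "Recognize as a whole"


def should_resegment(settings: RecognitionSettings) -> bool:
    """Return True only when all four conditions hold."""
    return (
        settings.resegment_checked
        and settings.spoken_language.lower().startswith("zh")
        and settings.engine in SUPPORTED_ENGINES
        and settings.segmentation_mode == "whole"
    )
```

If any single condition fails, for example a non-Chinese spoken language or a different recognition engine, the check returns `False` and the recognition result is left as it is.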

Once the above conditions are met, the software first restores punctuation after speech recognition completes, and then re-segments sentences based on punctuation marks and silent intervals to improve the accuracy and readability of the subtitles.
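
To illustrate the idea of splitting on punctuation and silent intervals, here is a rough sketch under stated assumptions. It is not the software's implementation. It assumes the recognizer yields character-level timestamps as `(char, start, end)` tuples, that the punctuation model returns the same characters in the same order with punctuation inserted, and that a gap longer than a hypothetical `max_gap` threshold counts as silence. The function name `resegment` and the punctuation set are likewise illustrative.

```python
# Punctuation marks treated as segment boundaries (an illustrative choice).
PUNCTUATION = set("，。！？；：、,.!?;:")


def resegment(chars, punctuated, max_gap=0.5):
    """Split timestamped characters into subtitle segments.

    chars:      list of (char, start, end) tuples, times in seconds
    punctuated: the same characters in order, with punctuation restored
    max_gap:    silence in seconds that forces a new segment (hypothetical default)
    """
    segments = []
    text, start, end = "", None, None
    i = 0  # index of the next timestamped character to consume

    def flush():
        nonlocal text, start, end
        if text:
            segments.append({"text": text, "start": start, "end": end})
        text, start, end = "", None, None

    for ch in punctuated:
        if ch in PUNCTUATION:
            # Punctuation carries no timestamp: attach it and close the segment.
            if text:
                text += ch
                flush()
            continue
        if i >= len(chars):
            break  # more text than timestamps; stop rather than guess
        _, s, e = chars[i]
        i += 1
        # A long silence before this character also closes the segment.
        if end is not None and s - end > max_gap:
            flush()
        if start is None:
            start = s
        text += ch
        end = e

    flush()
    return segments


chars = [("你", 0.0, 0.2), ("好", 0.2, 0.4), ("世", 0.5, 0.7),
         ("界", 0.7, 0.9), ("再", 2.0, 2.2), ("见", 2.2, 2.4)]
segments = resegment(chars, "你好，世界再见。")
```

In this example the comma ends the first segment, the 1.1-second pause after "界" ends the second even though no punctuation was restored there, and the final full stop ends the third.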