How to Use Chinese Sentence Re-segmentation

Whisper is a popular speech recognition model, but its Chinese recognition lags noticeably behind its English recognition. It often outputs Traditional Chinese characters and omits punctuation, which leads to poor sentence segmentation in subtitles. Even re-segmenting with character-level timestamps often gives unsatisfactory results when the audio or video lacks clear silent pauses.

In contrast, Alibaba's FunASR series excels at Chinese recognition, but it supports only Chinese, which limits its versatility.

Therefore, v2.92 introduces Alibaba's Chinese punctuation restoration model. This model restores punctuation in Chinese recognition results, so that sentences can then be re-segmented based on punctuation and silent intervals. Bundling this model increases the software size by approximately 400 MB.

Enabling Chinese Sentence Re-segmentation

The Alibaba Chinese punctuation model automatically re-segments the results when all of the following conditions are met:

  1. The "Chinese Re-segmentation" option is checked on the main interface or the audio/video to subtitle interface;
  2. The audio/video's spoken language is Chinese;
  3. The speech recognition engine is set to "faster-whisper", "openai-whisper", or "deepgram.com";
  4. The splitting mode is set to "Whole file".
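The four conditions above act as a single gate: re-segmentation runs only when all of them hold. The Python sketch below illustrates that check. The `RecognitionSettings` class, the `should_resegment` function, and the `"zh"` language code are hypothetical illustrations, not the software's actual internals; only the option labels and engine names come from this page.

```python
from dataclasses import dataclass

# Engine names as listed in the conditions above.
SUPPORTED_ENGINES = {"faster-whisper", "openai-whisper", "deepgram.com"}


@dataclass
class RecognitionSettings:
    # Hypothetical settings object; field names are illustrative only.
    chinese_resegmentation: bool  # "Chinese Re-segmentation" checkbox
    spoken_language: str          # assumed language code, e.g. "zh" or "zh-cn"
    engine: str                   # selected speech recognition engine
    split_mode: str               # e.g. "Whole file"


def should_resegment(s: RecognitionSettings) -> bool:
    """Return True only when all four documented conditions hold."""
    return (
        s.chinese_resegmentation
        and s.spoken_language.lower().startswith("zh")
        and s.engine in SUPPORTED_ENGINES
        and s.split_mode == "Whole file"
    )


print(should_resegment(RecognitionSettings(True, "zh", "faster-whisper", "Whole file")))  # True
```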

When these conditions are met, the system first restores punctuation once speech recognition is complete. It then re-segments sentences based on punctuation marks and silent intervals, improving the accuracy and readability of the subtitles.
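The two-step process above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software's implementation. It assumes that the recognizer returns one timestamp per Chinese character, that the punctuation model only inserts full-width punctuation without altering the recognized characters, and that `。！？；` end a sentence, while a pause of at least `min_gap` seconds (0.5 s here, an arbitrary choice) also forces a break. `Char`, `Segment`, `attach_punctuation`, and `resegment` are hypothetical names.

```python
from dataclasses import dataclass

# Punctuation the restoration model is assumed to insert (full-width marks).
PUNCTUATION = set("，。！？；、：")
# Marks treated as sentence boundaries in this sketch (an assumption).
SENTENCE_END = set("。！？；")


@dataclass
class Char:
    text: str     # one recognized character
    start: float  # seconds
    end: float    # seconds


@dataclass
class Segment:
    text: str
    start: float
    end: float


def attach_punctuation(chars: list[Char], punctuated: str) -> list[tuple[Char, str]]:
    """Pair each timed character with the punctuation that follows it.

    Assumes the punctuation model only inserts marks (and possibly spaces)
    and leaves the recognized characters unchanged.
    """
    paired: list[tuple[Char, str]] = []
    i = 0
    for ch in punctuated:
        if ch.isspace():
            continue
        if ch in PUNCTUATION:
            if paired:  # attach the mark to the preceding character
                c, p = paired[-1]
                paired[-1] = (c, p + ch)
            continue
        if i >= len(chars) or chars[i].text != ch:
            raise ValueError(f"punctuated text diverges from recognizer output at {ch!r}")
        paired.append((chars[i], ""))
        i += 1
    if i != len(chars):
        raise ValueError("punctuated text is missing recognized characters")
    return paired


def resegment(chars: list[Char], punctuated: str, min_gap: float = 0.5) -> list[Segment]:
    """Split at sentence-ending punctuation or at pauses of at least min_gap seconds."""
    paired = attach_punctuation(chars, punctuated)
    segments: list[Segment] = []
    buf: list[tuple[Char, str]] = []
    for idx, (c, p) in enumerate(paired):
        buf.append((c, p))
        ends_sentence = any(m in SENTENCE_END for m in p)
        is_last = idx == len(paired) - 1
        long_pause = not is_last and paired[idx + 1][0].start - c.end >= min_gap
        if ends_sentence or long_pause or is_last:
            text = "".join(ch.text + punct for ch, punct in buf)
            segments.append(Segment(text, buf[0][0].start, buf[-1][0].end))
            buf = []
    return segments


# Usage: evenly spaced characters with no pauses, so only punctuation splits them.
text = "今天天气很好我们去公园吧"
chars = [Char(t, i * 0.2, i * 0.2 + 0.15) for i, t in enumerate(text)]
for seg in resegment(chars, "今天天气很好。我们去公园吧！"):
    print(seg.text)
# 今天天气很好。
# 我们去公园吧！
```

Because commas do not end a segment in this sketch, clauses joined by `，` stay together unless a pause separates them; a production version might also cap segment length so that subtitles do not grow too long.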