Differences between Overall Recognition and Equal Segmentation

Overall Recognition:

This mode gives the best speech recognition accuracy, but it also consumes the most computing resources. If the video is large and the large-v3 model is used, the program may crash.

During recognition, the entire audio file is passed to the model, which uses VAD (voice activity detection) internally to split the audio and break it into sentences. By default, the silence threshold for splitting is 200ms and the maximum sentence length is 3s. Both can be configured under Menu -- Tools/Options -- Advanced Options -- VAD area.
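The effect of the two VAD settings can be illustrated with a small sketch. This is not the tool's or the model's actual implementation; the function name `group_speech` and the input format (a sorted list of speech spans in milliseconds, as a VAD might report them) are assumptions made for illustration only. Speech spans separated by a pause shorter than the silence threshold are joined, as long as the joined segment stays within the maximum length.

```python
def group_speech(intervals, min_silence_ms=200, max_len_ms=3000):
    """Group VAD speech spans into sentence-like segments (illustrative only).

    intervals: sorted list of (start_ms, end_ms) speech spans.
    A single span longer than max_len_ms is kept as-is in this sketch.
    """
    segments = []
    for start, end in intervals:
        if segments:
            prev_start, prev_end = segments[-1]
            gap = start - prev_end
            # Join with the previous segment only if the pause is short
            # and the joined segment stays within the length cap.
            if gap < min_silence_ms and end - prev_start <= max_len_ms:
                segments[-1] = (prev_start, end)
                continue
        segments.append((start, end))
    return segments


spans = [(0, 1000), (1100, 2000), (2500, 2900), (2950, 6500)]
print(group_speech(spans))
```

In this example the first two spans are joined (100ms pause), the third starts a new segment (500ms pause), and the fourth is kept separate because joining it would exceed the 3s cap.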

Equal Segmentation:

As the name suggests, this mode cuts the audio file into segments of a fixed, equal length and then passes them to the model. The OpenAI model always uses equal segmentation: when the OpenAI model is selected, "Equal Segmentation" is forced whether you choose "Overall Recognition" or "Pre-Segmentation".

Each segment is 10s long by default, and the silence interval used for splitting sentences is 500ms. Both can be configured under Menu -- Tools/Options -- Advanced Options -- VAD area.
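As a rough sketch of what cutting into equal lengths means, the following computes segment boundaries for a given audio duration. The function name `equal_segments` is hypothetical and the code only illustrates the idea; it is not the tool's actual code, and it ignores the silence interval setting.

```python
def equal_segments(total_ms, segment_ms=10_000):
    """Return (start_ms, end_ms) boundaries of fixed-length segments.

    The last segment is shorter when total_ms is not a multiple
    of segment_ms.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]


# A 25s audio file with the 10s setting yields two full segments
# and one 5s remainder.
print(equal_segments(25_000))
```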

Note: With the segment length set to 10s, each subtitle generally lasts 10s, but each dubbing clip does not necessarily last 10s. Its length depends on the actual speech duration, and the silence at the end of the dubbing is removed.
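To show why a dubbing clip can be shorter than its subtitle, here is a minimal sketch of trailing-silence removal. It assumes the audio is a plain list of amplitude samples in the range -1.0 to 1.0, and that anything below a small threshold counts as silence. The function name `trim_trailing_silence` and the threshold value are illustrative assumptions, not the tool's actual behavior.

```python
def trim_trailing_silence(samples, threshold=0.01):
    """Drop samples at the end whose absolute amplitude is below threshold."""
    end = len(samples)
    # Walk backwards until a sample loud enough to count as speech is found.
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[:end]


clip = [0.0, 0.5, -0.3, 0.005, 0.0]
print(trim_trailing_silence(clip))
```

Only silence at the end is removed; quiet samples at the start or in the middle of the clip are left in place.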