Differences between Whole Recognition and Equal Division

Whole Recognition:

This gives the best speech recognition results but consumes the most computer resources. With a large video and the large-v3 model, it may cause a crash.

During recognition, the entire audio file is passed to the model, which internally uses VAD (voice activity detection) to segment the speech, recognize it, and break it into sentences. By default, the silence separation is 200ms and the maximum sentence length is 3s. Both settings can be configured in Menu -- Tools/Options -- Advanced Options -- VAD area.
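
To illustrate how the two VAD settings might interact, here is a minimal sketch. It is not the software's actual implementation: the function name group_speech, the input format (a sorted list of speech intervals in milliseconds), and the merging rule are all assumptions made for illustration. Only the default values of 200ms and 3s come from the text above.

```python
def group_speech(chunks, min_silence_ms=200, max_sentence_ms=3000):
    """Group speech intervals into sentences (hypothetical helper).

    chunks: list of (start_ms, end_ms) speech intervals, sorted by start,
            such as a VAD might produce (assumed input format).
    A silence shorter than min_silence_ms does not break a sentence,
    unless merging would make the sentence exceed max_sentence_ms.
    """
    sentences = []
    for start, end in chunks:
        if sentences:
            prev_start, prev_end = sentences[-1]
            gap = start - prev_end
            # Merge only if the pause is short and the result stays within the cap.
            if gap < min_silence_ms and end - prev_start <= max_sentence_ms:
                sentences[-1] = (prev_start, end)
                continue
        sentences.append((start, end))
    return sentences
```

Under these assumptions, a shorter silence separation yields more, shorter sentences, while a larger maximum sentence length lets more speech accumulate into a single subtitle.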

Equal Division:

As the name suggests, this cuts the audio file into segments of equal length and passes each one to the model. Equal Division is also forced when using the OpenAI model: whether you choose "Whole Recognition" or "Pre-segmentation", "Equal Division" is used instead.

By default, each Equal Division segment is 10s long, and the silence interval used for sentence breaks is 500ms. These settings can be configured in Menu -- Tools/Options -- Advanced Options -- VAD area.
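
The cutting step can be sketched as follows. This is only an illustration of splitting a duration into equal-length windows, not the software's own code; the function name equal_segments and the millisecond-tuple output are assumptions. The 10s default comes from the text above.

```python
def equal_segments(total_ms, segment_ms=10_000):
    """Return (start_ms, end_ms) windows of equal length covering the audio.

    Hypothetical helper: the final window is shorter when the total
    duration is not an exact multiple of segment_ms.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]
```

For example, a 25s file with the default 10s setting would be cut into two 10s windows and one 5s window.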

Note: With the segment length set to 10s, each subtitle spans roughly 10s, but the voice-over for each subtitle is not necessarily 10s long. Its length depends on how long the speech itself lasts, and the silence at the end of the voice-over is removed.