
Difference Between Whole Recognition and Equal Partition

Whole Recognition:

This method gives the best speech recognition results but consumes the most computer resources. If the video is large and you are using the large-v3 model, the program may crash.

During recognition, the entire audio file is passed to the model, which internally uses VAD (Voice Activity Detection) to segment and punctuate the speech. By default, the silence split threshold is 200 ms and the maximum sentence length is 3 seconds. You can configure these settings in Menu -- Tools/Options -- Advanced Options -- VAD area.
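The splitting rule described above can be sketched roughly as follows. This is only an illustration, not the application's or the model's actual VAD code: the function name `split_on_silence`, the per-frame boolean input `is_speech`, and the 100 ms frame size are all assumptions made for the example. Only the 200 ms and 3 second defaults come from the text.

```python
def split_on_silence(is_speech, frame_ms=100, min_silence_ms=200, max_sentence_ms=3000):
    """Group speech frames into (start_ms, end_ms) sentences.

    is_speech: list of booleans, one per frame (hypothetical VAD output).
    A sentence is closed when silence lasts at least min_silence_ms,
    or when the sentence reaches max_sentence_ms.
    """
    segments = []
    start = None      # start of the current sentence in ms, None if no sentence is open
    last_end = 0      # end of the most recent speech frame in ms
    silence = 0       # length of the current silence run in ms

    for i, speech in enumerate(is_speech):
        t = i * frame_ms
        if speech:
            if start is None:
                start = t
            silence = 0
            last_end = t + frame_ms
            # Force a split once the sentence reaches the maximum length.
            if last_end - start >= max_sentence_ms:
                segments.append((start, last_end))
                start = None
        elif start is not None:
            silence += frame_ms
            # A long enough pause closes the current sentence.
            if silence >= min_silence_ms:
                segments.append((start, last_end))
                start = None

    # Close a sentence that is still open at the end of the audio.
    if start is not None:
        segments.append((start, last_end))
    return segments


# 0.5 s speech, 0.1 s pause (too short to split), 0.3 s speech,
# 0.3 s pause (long enough to split), 0.2 s speech.
frames = [True] * 5 + [False] * 1 + [True] * 3 + [False] * 3 + [True] * 2
print(split_on_silence(frames))
```

In this sketch, raising `min_silence_ms` merges more short pauses into one sentence, while lowering `max_sentence_ms` forces shorter subtitles.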

Equal Partition:

As the name suggests, this method cuts the audio file into segments of equal length and then passes them to the model. The OpenAI model always uses equal partition: whether you choose "Whole Recognition" or "Pre-Segmentation", "Equal Partition" is enforced when you are using the OpenAI model.

By default, each segment in equal partition is 10 seconds long, and the silence interval used to split sentences is 500 ms. You can configure these settings in Menu -- Tools/Options -- Advanced Options -- VAD area.
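As a rough illustration of how fixed-length cutting works, the sketch below computes segment boundaries for a given audio duration. The function name `equal_partition` and its parameters are hypothetical and chosen for this example. It is not the application's actual code, and it ignores the silence interval setting.

```python
def equal_partition(total_ms, segment_ms=10_000):
    """Return (start_ms, end_ms) boundaries of fixed-length segments.

    The last segment is shorter when total_ms is not a multiple of segment_ms.
    """
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]


# A 25-second file with the default 10-second segment length.
print(equal_partition(25_000))
```

With these inputs the file is cut into two full 10-second segments and one 5-second remainder.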

Note: With the segment length set to 10 seconds, each subtitle will generally be around 10 seconds long, but not exactly. The actual length depends on the duration of the speech and on the silence removed from the end of each segment.
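The effect described in the note can be illustrated with a small sketch that trims trailing silence from one segment. It assumes a hypothetical per-frame boolean list `is_speech` covering the whole audio and a 100 ms frame size. The helper name `trim_trailing_silence` is invented for this example and does not reflect how the application actually removes silence.

```python
def trim_trailing_silence(start_ms, end_ms, is_speech, frame_ms=100):
    """Shorten a segment so that it ends at its last speech frame.

    is_speech: list of booleans, one per frame of the whole audio (hypothetical).
    Returns (start_ms, new_end_ms), or None if the segment contains no speech.
    """
    first = start_ms // frame_ms
    # Round the end up to a whole frame, but stay within the available frames.
    last = min(-(-end_ms // frame_ms), len(is_speech))
    # Step back over silent frames at the end of the segment.
    while last > first and not is_speech[last - 1]:
        last -= 1
    if last <= first:
        return None
    return (start_ms, min(end_ms, last * frame_ms))


# A 10-second segment whose last 2 seconds are silent.
flags = [True] * 80 + [False] * 20
print(trim_trailing_silence(0, 10_000, flags))
```

Here the nominal 10-second segment yields an 8-second subtitle, which is why subtitle lengths vary around the configured value.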