v3.64 includes some minor optimizations, mainly to subtitle segmentation during speech recognition and to reducing dubbing errors.

Adjust Subtitle Duration During Speech Recognition

Speech recognition works by splitting the full audio into small segments at silent intervals; a segment may be 1, 5, 10, or 20 seconds long, etc. Each segment is then transcribed into text, and the results are combined into subtitles.

When using faster-whisper mode or GeminiAI as the speech recognition channel, the recognized subtitles may come out too long (one long run of text) or too fragmented. In that case, you can tune the segmentation parameters to match the speech. This mainly involves the following parameters:

Open Menu → Tools/Options → Advanced Options → faster/openai Speech Recognition Adjustment:

  1. Silence Separation Milliseconds (unit: milliseconds): the basis for audio segmentation. The audio is split at a silent interval only when that interval lasts at least the set value. For example, with a value of 200, a split happens only at silences of 200 ms or longer. If the speech is fast with short pauses, lower this value; if the speech is slow, raise it.
  2. Minimum Speech Duration/Milliseconds (unit: milliseconds): a segment becomes its own subtitle only if it lasts longer than this value. For example, 1000 means no subtitle will be shorter than 1000 milliseconds, which avoids overly fragmented subtitles.
  3. Maximum Speech Duration/Seconds (unit: seconds): the opposite of the previous item, limiting the maximum subtitle duration. For example, 15 means that if a segment reaches 15 seconds without a suitable split point being found, it is split forcibly. (For how the first three settings relate to faster-whisper, see the sketch after this list.)
  4. Maximum Subtitle Duration in Seconds: applied after recognition finishes, when sentences are re-split to limit subtitle length; it has no effect on segmentation during speech recognition.
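The names of the first three settings match the options of faster-whisper's built-in Silero VAD (min_silence_duration_ms, min_speech_duration_ms, max_speech_duration_s). As a minimal sketch of what these settings control, here is how they would be passed to faster-whisper directly; whether pyvideotrans forwards the UI values exactly like this, plus the model size and file name below, are assumptions:

```python
from faster_whisper import WhisperModel

model = WhisperModel("small")  # model size chosen only for illustration

segments, info = model.transcribe(
    "audio.wav",
    vad_filter=True,  # split the audio on silence using the built-in Silero VAD
    vad_parameters=dict(
        min_silence_duration_ms=200,  # "Silence Separation Milliseconds"
        min_speech_duration_ms=1000,  # "Minimum Speech Duration/Milliseconds"
        max_speech_duration_s=15,     # "Maximum Speech Duration/Seconds"
    ),
)

# Each VAD segment becomes one timed subtitle line.
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```

Raising min_silence_duration_ms merges nearby phrases into longer subtitles; lowering it splits them more aggressively.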

Reducing Edge-TTS 403 Errors (Also Applicable to Other Dubbing Channels)

Since dubbing requires connecting to Microsoft's API, 403 errors cannot be completely avoided. However, errors can be reduced by adjusting the following:

Open Menu → Tools/Options → Advanced Options → Dubbing Adjustment:

  1. Number of Subtitles Dubbed Simultaneously: a value of 1 is recommended. Lowering the number of subtitles dubbed at once reduces errors caused by too high a request frequency. This setting also applies to other dubbing channels.
  2. Pause Time After Dubbing/Seconds: for example, 5 means the program pauses for 5 seconds after dubbing one subtitle before starting the next. A value of 5 or higher is recommended; the longer request interval lowers the error rate. (See the sketch after this list.)
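To illustrate why these two settings help, here is a minimal sketch of sequential dubbing with the edge-tts Python package: requests go out one at a time (concurrency of 1) with a pause between them. The voice name and output file naming are assumptions for illustration; pyvideotrans's internal implementation may differ:

```python
import asyncio
import edge_tts  # pip install edge-tts

PAUSE_SECONDS = 5  # "Pause Time After Dubbing/Seconds"

async def dub_subtitles(lines, voice="en-US-AriaNeural"):
    # Concurrency of 1: dub subtitles one at a time so the request
    # rate stays low and 403 responses become less likely.
    for i, text in enumerate(lines):
        communicate = edge_tts.Communicate(text, voice)
        await communicate.save(f"line_{i:04d}.mp3")  # hypothetical file naming
        await asyncio.sleep(PAUSE_SECONDS)  # wait before the next request

asyncio.run(dub_subtitles(["Hello there.", "How are you today?"]))
```

The trade-off is speed: with a 5-second pause, a video with many subtitle lines takes noticeably longer to dub, but far fewer lines fail and need retrying.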

This is open-source, free software for video translation, speech transcription, text-to-speech, and subtitle translation.

Open-source address: https://github.com/jianchang512/pyvideotrans

Documentation site: https://pvt9.com

The software itself has no charges or revenue and is maintained out of interest. If it helps you, donations are welcome: https://pvt9.com/about