Why Audio, Subtitles, and Video Can Become Out of Sync

When speech is translated into another language, both the length of each sentence and the time needed to pronounce it usually change. For example, a sentence translated from Chinese to English generally differs in both length and speaking time.

Chinese: 有多远滚多远 (yǒu duō yuǎn gǔn duō yuǎn) (Get as far away as you can)

English: Get out of here as far as you can!

Chinese: 滚远点 (gǔn yuǎn diǎn) (Go away)

Japanese: ここから出て行け (Koko kara dete ike).

If the original video's Chinese pronunciation takes 2 seconds, and the translated English dubbing takes 4 seconds, this will inevitably lead to desynchronization.

How to Synchronize Them (Regardless of Quality, Just Synchronize)

As in the example above, suppose the original speech lasts 2 seconds and the translated dubbing lasts 4 seconds. If you only need the two to line up and do not care about speech speed or video speed, you can speed up the audio by 2x, which shortens the 4-second dubbing to 2 seconds and achieves synchronization. Alternatively, you can slow down the video, stretching the original 2-second clip to 4 seconds, which also achieves alignment.
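The arithmetic behind this can be written as a minimal sketch. The function name `required_factors` is hypothetical and not part of the software; it only shows that the same ratio (dubbed duration divided by original duration) serves as either the audio speed-up or the video slow-down.

```python
def required_factors(original_s: float, dubbed_s: float) -> tuple[float, float]:
    """Return (audio_speed_up, video_slow_down); either one alone closes the gap."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    # Ratio of the dubbed duration to the original duration.
    ratio = dubbed_s / original_s
    # If the dubbing already fits in the original slot, leave both unchanged.
    factor = max(1.0, ratio)
    return factor, factor


audio_factor, video_factor = required_factors(2.0, 4.0)
print(audio_factor, video_factor)  # 2.0 2.0
```

Playing the 4-second dubbing at 2x yields 2 seconds; stretching the 2-second clip by 2x yields 4 seconds.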

Specific Steps for Audio Speed Up to Achieve Alignment:

  1. In the software interface, select "Automatic Audio Speed Up" and deselect "Automatic Video Slow Down."
  2. Open the menu Tools - Options and set the maximum audio speed up multiplier to 100.

This achieves synchronization, but the drawback is obvious: each sentence is sped up by a different amount, so the speech speed becomes inconsistent.
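To illustrate why the speech speed ends up uneven, here is a small sketch with made-up segment durations (the values and the helper `speed_up_rates` are assumptions for illustration only). Each subtitle segment needs its own speed-up ratio, so neighboring sentences can play at very different rates.

```python
# (original_seconds, dubbed_seconds) per subtitle segment; hypothetical values.
segments = [(2.0, 4.0), (3.0, 3.3), (1.5, 1.2)]


def speed_up_rates(segments):
    """Speed-up each segment's dubbing needs to fit its original time slot."""
    # A segment whose dubbing is already short enough keeps the normal rate 1.0.
    return [max(1.0, dubbed / original) for original, dubbed in segments]


rates = speed_up_rates(segments)
print([round(r, 2) for r in rates])  # [2.0, 1.1, 1.0]
```

The first sentence plays at double speed while the next two are almost unchanged, which is the inconsistency described above.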

Steps for Video Slow Down to Achieve Alignment:

  1. Deselect "Automatic Audio Speed Up" in the software interface and select "Automatic Video Slow Down."
  2. Open the menu Tools - Options and set the maximum video slow down multiplier to 20.

This also achieves alignment and keeps the speech speed constant, but the video playback speed becomes inconsistent instead.

If you only want simple alignment and don't care about the quality, you can use these two methods.

A Better, More Acceptable Synchronization Method

Clearly, the two methods above are not practical on their own: excessively fast audio or excessively slow video makes for a poor viewing experience. For better results, enable both "Automatic Audio Speed Up" and "Automatic Video Slow Down" at the same time.

Specific Steps:

  1. When using Faster mode or OpenAI mode, try to use a medium or larger model and select "Overall Recognition."
  2. In the software interface, select both "Automatic Audio Speed Up" and "Automatic Video Slow Down," and set a small overall acceleration value, such as 10%.
  3. Open the menu Tools - Options and set the maximum audio speed up multiplier to 1.8, meaning the speech will be played at no more than 1.8 times the normal speed. You can change this to 2 or 1.5, etc., as long as the value is greater than 1.
  4. Open the menu Tools - Options and set the maximum video slow down multiplier to 2, meaning the video will be slowed to no less than 0.5 times the normal speed. You can change this to 3 or 5, etc., as long as the value is greater than 1.
  5. Even after steps 1-4, synchronization may still not be achieved, because the maximum values are limited. When a maximum is reached and the audio and video are still not aligned, alignment is abandoned for that segment and the remaining content is simply postponed. In this case, you can continue to adjust the subtitle-related options in the menu Tools - Options.
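The interplay of the two caps in the steps above can be sketched as follows. This is not the software's actual algorithm: `plan_sync`, `SyncPlan`, and the choice to aim for the midpoint between the two durations are assumptions made for illustration. The sketch shows how a capped audio speed-up and a capped video slow-down share the gap, and how any remainder is left over to be postponed.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    audio_rate: float    # playback rate applied to the dubbed audio (>= 1)
    video_factor: float  # stretch applied to the video clip (>= 1)
    overflow_s: float    # seconds of dubbing that still do not fit


def plan_sync(original_s, dubbed_s, max_audio_rate=1.8, max_video_factor=2.0):
    """Split the duration gap between audio speed-up and video slow-down."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if dubbed_s <= original_s:
        return SyncPlan(1.0, 1.0, 0.0)  # dubbing already fits

    # Assumption: aim for the midpoint so both sides change moderately.
    target_s = (original_s + dubbed_s) / 2
    audio_rate = min(dubbed_s / target_s, max_audio_rate)
    audio_s = dubbed_s / audio_rate

    # Stretch the video toward the sped-up audio, up to its cap.
    video_factor = min(audio_s / original_s, max_video_factor)
    video_s = original_s * video_factor

    # If the video hit its cap, let the audio take up more slack if it can.
    if audio_s > video_s:
        audio_rate = min(dubbed_s / video_s, max_audio_rate)
        audio_s = dubbed_s / audio_rate

    # Whatever still does not fit is the part that gets postponed.
    overflow_s = max(0.0, audio_s - video_s)
    return SyncPlan(audio_rate, video_factor, overflow_s)


print(plan_sync(2.0, 4.0))   # both caps respected, nothing left over
print(plan_sync(1.0, 10.0))  # both caps reached, part of the dubbing overflows
```

With the 2-second and 4-second example, the audio runs at about 1.33x and the video is stretched by 1.5x, both well inside the limits. With a 1-second clip and 10 seconds of dubbing, both caps are hit and several seconds remain unaligned, which corresponds to the case described in the last step.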

Is There a Perfect Synchronization Method?

Apart from manual intervention, such as simplifying translations or adding transition scenes, no perfect method that a program can carry out automatically has been found yet.

Keeping the audio speed-up within an acceptable range, keeping the video slow-down within an acceptable range, and matching mouth movements to the start of speech, all at once, for videos of any length and for any language pair, appears to be impossible through automation alone. Apart from manual adjustments, there is no perfect method.