
Why Audio, Subtitles, and Video Become Out of Sync

Translation changes the length of a sentence, and with it the time needed to speak that sentence. A Chinese sentence and its English translation, for example, almost never have the same length and generally do not take the same time to pronounce.

Chinese: 有多远滚多远 (Get as far away as possible)

English: Get out of here as far as you can!

Chinese: 滚远点 (Go away)

Japanese: ここから出て行け (Koko kara dete ike - Get out of here)

If the Chinese speech in the original video lasts 2 seconds and the English dub of the same sentence lasts 4 seconds, the audio, subtitles, and video inevitably fall out of sync.

How to Force Synchronization, Regardless of Quality

Take the example above: the speech lasts 2 seconds before translation and 4 seconds after. If all you need is synchronization and you don't care about speech speed or video speed, you can simply speed up the audio by 2x, which shortens the 4-second dub to 2 seconds and brings it back in line with the video. Alternatively, you can slow down the video, stretching the original 2-second segment to 4 seconds, which also achieves alignment.
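The arithmetic behind both options is the same ratio of dubbed duration to original duration. The following is a minimal sketch of that calculation; the function name `stretch_ratio` is made up for illustration and is not part of the software.

```python
def stretch_ratio(original_s: float, dubbed_s: float) -> float:
    """Return dubbed duration divided by original duration.

    Hypothetical helper for illustration only. A result of 2.0 means the
    dub is twice as long as the slot it has to fit into.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return dubbed_s / original_s


ratio = stretch_ratio(original_s=2.0, dubbed_s=4.0)

# Option 1: speed the audio up by `ratio`, so the dub shrinks to the original slot.
audio_after_speedup = 4.0 / ratio   # 2.0 seconds

# Option 2: slow the video down by `ratio`, so the segment grows to fit the dub.
video_after_slowdown = 2.0 * ratio  # 4.0 seconds
```

Either option on its own closes the whole gap, which is why the uncapped methods below always align but can produce extreme speeds.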

Specific Steps for Audio Speed-Up to Achieve Alignment:

  1. In the software interface, select "Automatic Audio Speed-Up" and deselect "Automatic Video Slow-Down."
  2. Open the menu Tools - Options, and set the maximum audio speed-up multiple to 100.

This achieves synchronization, but the drawback is obvious: the speech is sometimes fast and sometimes slow, because each sentence is sped up by a different amount.

Specific Steps for Video Slow-Down to Achieve Alignment:

  1. Deselect "Automatic Audio Speed-Up" in the software interface and select "Automatic Video Slow-Down."


  2. Open the menu Tools - Options, and set the maximum video slow-down multiple to 20.

This also achieves alignment. The speech speed stays unchanged and the video slows down instead, but the drawback is similar: the video playback speed becomes uneven from segment to segment.

If you only want simple alignment and don't care about the quality, you can use these two methods.
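The text does not say which tool actually performs the speed change. Purely as an illustration, and assuming ffmpeg were used, the two operations map onto its `atempo` audio filter and `setpts` video filter. The sketch below only builds the filter strings; the helper names `atempo_chain` and `setpts_expr` are hypothetical. Large factors are split into stages of at most 2.0 each, a common convention when chaining `atempo`.

```python
def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio filter string that changes tempo by `speed`.

    Hypothetical helper. Factors outside 0.5..2.0 are split into several
    chained atempo stages whose product equals `speed`.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    stages = []
    remaining = speed
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        stages.append(0.5)
        remaining /= 0.5
    stages.append(remaining)
    return ",".join(f"atempo={s:.4f}" for s in stages)


def setpts_expr(slowdown: float) -> str:
    """Build an ffmpeg video filter string that stretches timestamps.

    Hypothetical helper. A slowdown of 2.0 doubles every presentation
    timestamp, so the video plays at half speed.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return f"setpts={slowdown:.4f}*PTS"


audio_filter = atempo_chain(4.0)   # two chained 2x stages
video_filter = setpts_expr(2.0)    # half-speed video
```

These strings would be passed to ffmpeg with `-filter:a` and `-filter:v` respectively; whether the software does anything like this internally is an assumption.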

A Better, More Acceptable Synchronization Method

Obviously, neither of the methods above is practical. Audio that is far too fast or video that is far too slow makes for a terrible viewing experience. For a better result, enable both "Automatic Audio Speed-Up" and "Automatic Video Slow-Down" so that each one only has to absorb part of the difference.

Specific Steps:

  1. When using faster mode or openai mode, prefer the medium model or a larger one, and select "Overall Recognition".

  2. In the software interface, select "Automatic Audio Speed-Up" and "Automatic Video Slow-Down" simultaneously, and set a small overall speed-up value, such as 10%.


  3. Open the menu Tools - Options, and set the maximum audio speed-up multiple to 1.8, which means speech is accelerated to at most 1.8 times normal speed. You can change it to any other value greater than 1, such as 1.5 or 2.
  4. In the same Tools - Options dialog, set the maximum video slow-down multiple to 2, which means the video is slowed to no less than 0.5 times normal speed. You can change it to any other value greater than 1, such as 3 or 5.
  5. Even after steps 1-4, the result may still not be aligned, because both maximum values are capped. When a cap is reached and the segment still does not fit, alignment is abandoned and the remaining overrun is simply carried over as a delay. In that case, you can continue to adjust the subtitle-related options under Tools - Options.
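To make the effect of the two caps concrete, here is a minimal sketch of how a capped speed-up and a capped slow-down might share the work. It is not the software's actual algorithm: the even split between audio and video, the `SyncPlan` type, and the `plan_sync` function are all assumptions made for illustration. Only the default caps of 1.8 and 2 come from the steps above.

```python
import math
from dataclasses import dataclass


@dataclass
class SyncPlan:
    audio_speed: float     # playback-rate multiplier for the dubbed audio (>= 1)
    video_slowdown: float  # duration multiplier for the video segment (>= 1)
    overflow_s: float      # seconds of dubbed audio that still do not fit


def plan_sync(original_s: float, dubbed_s: float,
              max_audio_speed: float = 1.8,
              max_video_slowdown: float = 2.0) -> SyncPlan:
    """Split the required stretch between audio and video, honouring both caps.

    Hypothetical sketch: the stretch is shared evenly (square root each),
    and whatever one side cannot absorb is handed to the other side.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if max_audio_speed < 1 or max_video_slowdown < 1:
        raise ValueError("caps must be at least 1")

    ratio = dubbed_s / original_s
    if ratio <= 1:
        # The dub already fits; nothing needs to change.
        return SyncPlan(1.0, 1.0, 0.0)

    # Start from an even split, then clamp each side to its cap.
    audio_speed = min(math.sqrt(ratio), max_audio_speed)
    video_slowdown = min(ratio / audio_speed, max_video_slowdown)
    # If the video cap was hit, let the audio take up the slack if it can.
    audio_speed = min(ratio / video_slowdown, max_audio_speed)

    audio_len = dubbed_s / audio_speed
    video_len = original_s * video_slowdown
    overflow = max(0.0, audio_len - video_len)
    return SyncPlan(audio_speed, video_slowdown, overflow)


fits = plan_sync(2.0, 4.0)        # both factors stay under their caps
too_long = plan_sync(2.0, 10.0)   # both caps are hit, some audio is left over
```

In the second call both caps are reached and `overflow_s` stays positive, which corresponds to the case in the last step where alignment is abandoned and the overrun becomes a delay.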

Is There a Perfect Synchronization Method?

Short of manual intervention, such as simplifying the translation or adding transition screens, no perfect method that a program can apply automatically has been found yet.

A program would have to guarantee three things at once, for very long and very short videos alike and for any language pair: an acceptable audio speed-up range, an acceptable video slow-down range, and mouth movements that line up with the moments speech starts. That seems to be an impossible task. Apart from manual adjustment, there is no perfect method.