Why Audio, Subtitles, and Video Are Out of Sync

Translation changes the length of a sentence, and the time needed to speak it usually changes as well. A Chinese sentence and its English translation, for example, rarely have the same length or take the same time to pronounce.

Chinese: 有多远滚多远 (Get as far away as possible)

English: Get out of here as far as you can!

Chinese: 滚远点 (Go away)

Japanese: ここから出て行け (Koko kara deteike - Get out of here.)

If a Chinese line takes 2 seconds to speak in the original video and its English dub takes 4 seconds, the audio, subtitles, and video can no longer stay in sync.

How to Synchronize Them (Regardless of Quality, Just Synchronize)

Take the case above: the line lasts 2 seconds before translation and 4 seconds after. If synchronization is all you need and you do not care about speech speed or video speed, you can speed up the dubbed audio by 2x, which shortens the 4-second clip to 2 seconds and brings it into sync. Alternatively, you can slow down the video so that the original 2-second clip stretches to 4 seconds, which also achieves alignment.
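
The arithmetic behind both options is the same ratio of dubbed duration to original duration. The following is a minimal sketch of that calculation only; the function name `stretch_ratio` is hypothetical and is not taken from the software.

```python
def stretch_ratio(original_s: float, dubbed_s: float) -> float:
    """Ratio of dubbed duration to original duration (hypothetical helper)."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return dubbed_s / original_s


original_s, dubbed_s = 2.0, 4.0
ratio = stretch_ratio(original_s, dubbed_s)

# Option A: speed up the dubbed audio by `ratio`; it shrinks to the original slot.
dub_after_speedup = dubbed_s / ratio

# Option B: slow down the video by `ratio`; the clip stretches to cover the dub.
clip_after_slowdown = original_s * ratio

print(ratio, dub_after_speedup, clip_after_slowdown)  # 2.0 2.0 4.0
```

Either option alone removes the mismatch for this segment, at the cost of a 2x change in audio or video speed.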

Specific Steps for Audio Acceleration to Achieve Alignment:

  1. In the software interface, select "Automatic Audio Acceleration" and uncheck "Automatic Video Slowdown."
  2. In the menu, open Tool-Options and set the maximum audio acceleration multiple to 100.

This will achieve synchronization, but the drawbacks are obvious: the speech speed fluctuates.

Steps for Video Slowdown to Achieve Alignment:

  1. Uncheck "Automatic Audio Acceleration" in the software interface and select "Automatic Video Slowdown."

  2. In the menu, open Tool-Options and set the maximum video slowdown multiple to 20.

This also achieves alignment. The speech speed stays constant while the video slows down, but the drawback is similar: the video's playback speed becomes inconsistent.

If you just want simple alignment and don't care about the quality, you can use these two methods.
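
This guide does not say which tool performs the speed change. As one illustration, assuming ffmpeg is used, audio can be sped up with its `atempo` filter and video slowed down with its `setpts` filter. The sketch below only builds the filter strings; the helper names are hypothetical, and the chaining of `atempo` stages is a precaution for ffmpeg builds that limit a single stage to the range 0.5 to 2.0.

```python
def atempo_filter(speedup: float) -> str:
    """Build an ffmpeg audio filter string that plays audio `speedup` times faster.

    Assumption: each atempo stage is kept at or below 2.0 and larger
    factors are reached by chaining several stages.
    """
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1.0")
    stages = []
    remaining = speedup
    while remaining > 2.0:
        stages.append(2.0)
        remaining /= 2.0
    stages.append(remaining)
    return ",".join(f"atempo={stage:.3f}" for stage in stages)


def setpts_filter(slowdown: float) -> str:
    """Build an ffmpeg video filter string that stretches video by `slowdown`."""
    if slowdown < 1.0:
        raise ValueError("slowdown must be >= 1.0")
    return f"setpts={slowdown:.3f}*PTS"


print(atempo_filter(2.0))  # atempo=2.000
print(atempo_filter(3.0))  # atempo=2.000,atempo=1.500
print(setpts_filter(2.0))  # setpts=2.000*PTS
```

With ffmpeg, strings like these would be passed to the `-filter:a` and `-filter:v` options respectively.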

Better, More Acceptable Synchronization Methods

The methods above are not practical: audio that is too fast or video that is too slow makes for a terrible viewing experience. For better results, enable both "Automatic Audio Acceleration" and "Automatic Video Slowdown" so that each one only has to make up part of the difference.

Specific Steps:

  1. If you use faster mode or openai mode, try to use the medium model or a larger one, and select "Overall Recognition."

  2. In the software interface, select "Automatic Audio Acceleration" and "Automatic Video Slowdown," and set a small overall acceleration value, such as 10%.

  3. In the menu, open Tool-Options and set the maximum audio acceleration multiple to 1.8, which means speech is accelerated to at most 1.8 times normal speed. You can change it to any value greater than 1, such as 2 or 1.5.
  4. In the menu, open Tool-Options and set the maximum video slowdown multiple to 2, which means the video is slowed to at most 0.5 times normal speed. You can change it to any value greater than 1, such as 3 or 5.
  5. Even after steps 1-4, the result may still not be aligned, because you have capped both maximum values. When a cap is reached and the segment is still not aligned, the program gives up adjusting it and simply delays what follows. You can then continue to adjust the screen subtitle-related options in Tool-Options.
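
The effect of the two caps can be sketched as follows. This is an illustration under stated assumptions, not the software's actual algorithm: it assumes the audio is sped up first, the video is then slowed down to cover what remains, and any leftover is reported as overflow (the part that ends up delayed). The name `plan_segment` and its return type are hypothetical, and the overall acceleration percentage from step 2 is not modeled.

```python
from typing import NamedTuple


class SegmentPlan(NamedTuple):
    audio_speedup: float   # factor applied to the dubbed audio (>= 1)
    video_slowdown: float  # factor applied to the video clip (>= 1)
    overflow_s: float      # seconds of dubbed audio that still do not fit


def plan_segment(
    original_s: float,
    dubbed_s: float,
    max_audio_speedup: float = 1.8,
    max_video_slowdown: float = 2.0,
) -> SegmentPlan:
    """Split the mismatch between audio speedup and video slowdown, with caps.

    Assumption: audio is adjusted first, then video; the real program may
    divide the difference differently.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if dubbed_s <= original_s:
        return SegmentPlan(1.0, 1.0, 0.0)  # the dub already fits

    # Speed up the audio as far as needed, but never beyond the cap.
    audio_speedup = min(dubbed_s / original_s, max_audio_speedup)
    audio_len = dubbed_s / audio_speedup

    # Slow down the video to cover what is left, again within the cap.
    video_slowdown = min(max(audio_len / original_s, 1.0), max_video_slowdown)
    video_len = original_s * video_slowdown

    # Whatever still does not fit is the part that ends up delayed.
    overflow_s = max(0.0, audio_len - video_len)
    return SegmentPlan(audio_speedup, video_slowdown, overflow_s)


print(plan_segment(2.0, 4.0))   # both caps suffice, overflow is (nearly) zero
print(plan_segment(2.0, 10.0))  # both caps are hit, overflow remains
```

For the 2-second to 4-second example, the audio is sped up by 1.8x and the video slowed by about 1.11x, and the segment fits. For a 10-second dub of a 2-second clip, both caps are reached and roughly 1.56 seconds still do not fit, which corresponds to the case in step 5.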

Is There a Perfect Synchronization Method?

Apart from manual intervention, such as simplifying translations or adding transition shots, no perfect method has yet been found that a program can carry out automatically.

A program would have to guarantee three things at once: that the audio acceleration stays within an acceptable range, that the video slowdown stays within an acceptable range, and that the moment the speaker's mouth opens and closes matches the start time of the speech. Achieving all three automatically, for any language pair and for videos of any length, seems to be an impossible task. For now, manual adjustment is the only way to get a perfect result.