
Why Sound, Subtitles, and Video are Out of Sync

Translation changes the length of a sentence, and usually the time needed to speak it as well. A Chinese sentence and its English translation, for example, rarely have the same length or the same spoken duration.

Chinese: 有多远滚多远 (Gǔn duō yuǎn gǔn duō yuǎn) - Roll as far as you can
English: Get out of here as far as you can!

Chinese: 滚远点 (Gǔn yuǎn diǎn) - Get lost
Japanese: ここから出て行け。 (Koko kara deteike) - Get out of here.

If a line takes 2 seconds to speak in the original Chinese video, the English dub of that line may take 4 seconds, so the audio inevitably falls out of sync with the video.

How to Synchronize Them When Only Alignment Matters

Take the example above: the original segment lasts 2 seconds and the dubbed segment lasts 4 seconds. If all you need is alignment, regardless of how the speech or video speed is affected, you can speed the audio up by 2 times so that the 4-second dub fits into 2 seconds. Alternatively, you can slow the video down so that the original 2-second segment is stretched to 4 seconds. Either way, the two end up aligned.
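The arithmetic behind both options is the same ratio of dubbed duration to original duration. The following is a minimal sketch of that calculation; the function name `stretch_ratio` is hypothetical and is not part of the software.

```python
def stretch_ratio(original_s: float, dubbed_s: float) -> float:
    """Return how many times longer the dub is than the original segment.

    The same number serves as the audio speed-up factor (option 1)
    or as the video slowdown factor (option 2). A dub that already
    fits needs no adjustment, so the result is never below 1.0.
    """
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    return max(1.0, dubbed_s / original_s)


ratio = stretch_ratio(original_s=2.0, dubbed_s=4.0)
print(ratio)        # 2.0
print(4.0 / ratio)  # 2.0  (dub length after speeding the audio up)
print(2.0 * ratio)  # 4.0  (video length after slowing the video down)
```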

Steps to Achieve Alignment by Audio Acceleration:

  1. Select "Automatic Audio Acceleration" in the software interface, and uncheck "Automatic Video Slowdown".
  2. Open Menu Tools - Options, and set the maximum audio acceleration multiple to 100.

This achieves synchronization, but the drawback is obvious: the speech speed varies from segment to segment, sometimes fast and sometimes slow.

Steps to Achieve Alignment by Video Slowdown:

  1. Uncheck "Automatic Audio Acceleration" in the software interface, and select "Automatic Video Slowdown".

  2. Open Menu Tools - Options, and set the maximum video slowdown multiple to 20.

This also achieves alignment. The speech speed stays unchanged, but the video speed now varies from segment to segment, sometimes fast and sometimes slow.

If you only want simple alignment and do not care about the result, either of these two methods will do.
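The text above does not say how the software changes the speed internally. Purely as an illustration, and assuming the adjustment were done with ffmpeg (an assumption, not something stated here), the two methods correspond to ffmpeg's `atempo` audio filter and `setpts` video filter. The helper names below are hypothetical. Older ffmpeg builds limit each `atempo` stage to the range 0.5 to 2.0, so the sketch chains several stages for larger factors.

```python
def atempo_chain(factor: float) -> str:
    """Build an ffmpeg audio filter string that speeds audio up by `factor`.

    Each atempo stage is kept at or below 2.0 so the string also works
    on ffmpeg builds that restrict a single stage to 0.5-2.0.
    """
    if factor < 1.0:
        raise ValueError("this sketch only handles speed-up (factor >= 1)")
    stages = []
    remaining = factor
    while remaining > 2.0:
        stages.append("atempo=2.0")
        remaining /= 2.0
    stages.append(f"atempo={remaining:.4f}")
    return ",".join(stages)


def setpts_slowdown(factor: float) -> str:
    """Build an ffmpeg video filter string that slows video down by `factor`.

    Multiplying every presentation timestamp by `factor` makes the
    segment last `factor` times longer.
    """
    if factor < 1.0:
        raise ValueError("this sketch only handles slowdown (factor >= 1)")
    return f"setpts={factor:.4f}*PTS"


print(atempo_chain(2.0))     # atempo=2.0000
print(atempo_chain(3.0))     # atempo=2.0,atempo=1.5000
print(setpts_slowdown(2.0))  # setpts=2.0000*PTS
```

These strings would be passed to ffmpeg's `-filter:a` and `-filter:v` options respectively; the sketch only builds the strings and does not run ffmpeg.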

A Better, More Acceptable Synchronization Method

The methods above are clearly not practical: audio that is too fast or video that is too slow makes for a poor viewing experience. For better results, enable "Automatic Audio Acceleration" and "Automatic Video Slowdown" at the same time.

Specific Operation:

  1. When using faster mode or openai mode, try to use a medium or larger model, and select "Overall Recognition".

  2. Select both "Automatic Audio Acceleration" and "Automatic Video Slowdown" in the software interface, and set a small overall acceleration value, such as 10%.

  3. Open Menu Tools - Options, and set the maximum audio acceleration multiple to 1.8, meaning speech is accelerated to at most 1.8 times the normal speed. You can change it to 2, 1.5, or any other value greater than 1.

  4. Open Menu Tools - Options, and set the maximum video slowdown multiple to 2, meaning the video is slowed to at most 0.5 times the normal speed. You can change it to 3, 5, or any other value greater than 1.

  5. Even after steps 1-4, the result may still not be aligned, because the maximum values are limited. When a maximum is reached and the segment is still not aligned, the software gives up adjusting and simply delays what follows. You can then continue to adjust the subtitle-related options in Menu Tools - Options.
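The interaction of the two limits in steps 3 to 5 can be sketched as follows. This is only an illustration under stated assumptions: the names `SyncPlan` and `plan_sync` are hypothetical, and the order used here (accelerate the audio first, then slow the video, then accept any remainder as delay) is an assumption, not a description of how the software actually divides the work.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    audio_speedup: float   # factor applied to the dubbed audio (>= 1)
    video_slowdown: float  # factor applied to the video segment (>= 1)
    leftover_s: float      # seconds still unaligned after both limits


def plan_sync(original_s: float, dubbed_s: float,
              max_audio: float = 1.8, max_video: float = 2.0) -> SyncPlan:
    """Assumed strategy: speed audio up to its cap, then slow video up to its cap."""
    if original_s <= 0 or dubbed_s <= 0:
        raise ValueError("durations must be positive")
    if dubbed_s <= original_s:
        return SyncPlan(1.0, 1.0, 0.0)  # the dub already fits

    # Step 1: accelerate the audio, but never beyond the configured maximum.
    audio = min(dubbed_s / original_s, max_audio)
    shortened = dubbed_s / audio

    # Step 2: slow the video to cover what the audio cap left over.
    video = min(max(shortened / original_s, 1.0), max_video)
    stretched = original_s * video

    # Step 3: whatever remains cannot be aligned and becomes delay.
    leftover = max(0.0, shortened - stretched)
    return SyncPlan(audio, video, leftover)


print(plan_sync(2.0, 4.0))   # both limits suffice, no leftover
print(plan_sync(2.0, 10.0))  # both limits reached, some delay remains
```

With the 2-second and 4-second example, the audio is capped at 1.8 times and the video only needs a slight slowdown of about 1.11 times. With a 10-second dub, both caps are hit and roughly 1.56 seconds remain unaligned, which is the situation step 5 describes.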

Is There a Perfect Synchronization Method?

So far, no perfect method has been found that a program can carry out automatically. The only reliable approach is manual processing, such as condensing the translation or adding transition screens.

For videos of any length and dubbing in any language, a program would have to meet three goals at once: an acceptable range of audio acceleration, an acceptable range of video slowdown, and mouth movements that match the moments the voice starts. Achieving all three automatically seems to be an impossible task, so manual adjustment remains necessary.