Keeping dubbing, subtitles, and on-screen action synchronized has always been a technical challenge in video translation, because grammatical structures and speaking speeds vary greatly between languages. When a sentence is translated into another language, its character count and syllable count change, so the translated dubbing rarely matches the original speech duration — and the subtitles fall out of sync with both the voice-over and the visuals.

Specifically, the character on screen has finished speaking while the dubbing is only halfway through; or the original video has already moved on to the next sentence while the dubbing is still reading the previous one.

Translation Changes the Number of Characters

For example, when the following Chinese sentences are translated into English, their length and number of syllables change significantly, and the corresponding speech duration also changes:

  • Chinese: 得国最正莫过于明

  • English: There is no country more upright than the Ming Dynasty

  • Chinese: 我一生都在研究宇宙

  • English: I have been studying the universe all my life

  • Chinese: 北京圆明园四只黑天鹅疑被流浪狗咬死

  • English: Four black swans in Beijing's Yuanmingyuan Garden suspected of being bitten to death by stray dogs

As can be seen, after translating Chinese subtitles into English subtitles and dubbing, the dubbing duration usually exceeds the original Chinese speech duration. To solve this problem, the following strategies are usually adopted:

Several Coping Strategies

  1. Increase Dubbing Speed: In theory, as long as the speaking speed is not capped, the voice duration can always be made to match the subtitle duration. For example, if the original speech lasts 1 second and the dubbing lasts 3 seconds, raising the dubbing speed to 300% synchronizes the two. However, this makes the voice sound hurried and unnatural, and when the speed-up factor varies from sentence to sentence, the overall effect is jarring.

  2. Simplify Translation: Reduce dubbing duration by shortening the translation. For example, translate "我一生都在研究宇宙" into the more concise "Cosmology is my life's work". Although this method gives the best results, it requires revising the subtitles sentence by sentence, which is very inefficient.

  3. Adjust Silence Between Subtitles: If there is silence between two subtitles in the original speech, you can reduce or remove some of the silence to bridge the duration difference. For example, if there is 2 seconds of silence between two subtitles in the original speech, and the translated first subtitle is 1.5 seconds longer than the original subtitle, then the silence time can be reduced to 0.5 seconds, so that the dubbing time of the second subtitle is aligned with the original speech time. However, not all subtitles have enough silence time to adjust, and the applicability of this method is limited.

  4. Remove Silence Before and After Dubbing: Usually, some silence is retained before and after dubbing. Removing this silence can effectively shorten the dubbing duration.

  5. Slow Down Video Playback: If simply speeding up the dubbing does not work well, you can combine it with slowing down the video. For example, the original speech for a subtitle lasts 1 second, but the dubbing lasts 3 seconds. We can shorten the dubbing to 2 seconds (a 1.5x speed-up) and at the same time halve the playback speed of the corresponding video clip (extending it to 2 seconds), achieving synchronization.
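Strategies 1, 3, 4, and 5 above boil down to an arithmetic problem: absorb as much of the duration gap as possible with available silence, then split what remains between audio speed-up and video slow-down. The sketch below (a hypothetical helper, not the software's actual code) splits the remaining gap evenly between the two factors; the article's 1-second/3-second example instead uses an uneven 1.5x/2x split, which is equally valid.

```python
import math

def plan_sync(orig_dur, dub_dur, silence=0.0,
              max_audio_speed=3.0, max_video_slow=20.0):
    """Return (audio_speed, video_slow) factors that reconcile the
    dubbing duration with the original clip duration."""
    # Strategies 3/4: first absorb the gap with any available silence.
    target = orig_dur + silence
    if dub_dur <= target:
        return 1.0, 1.0  # silence alone covers the difference

    # Strategies 1/5: split the remaining gap between speeding up the
    # audio and stretching the video, so neither factor gets extreme.
    ratio = dub_dur / target
    audio_speed = min(math.sqrt(ratio), max_audio_speed)
    video_slow = min(ratio / audio_speed, max_video_slow)
    return round(audio_speed, 3), round(video_slow, 3)
```

For the article's example, `plan_sync(1.0, 3.0)` yields roughly a 1.73x audio speed-up paired with a 1.73x video slow-down, so both streams meet at about 1.73 seconds.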

The above methods each have advantages and disadvantages, and cannot perfectly solve all problems. To achieve the best synchronization effect, manual fine-tuning is usually required, but this is contrary to the goal of software automation. Therefore, video translation software usually uses a combination of the above strategies to achieve the best results.

Implementation in Video Translation Software

In the software, these strategies are usually controlled by the following settings:

  • Main Interface Settings:

The "Dubbing Speed" setting is used to globally speed up dubbing;

The "Dubbing Acceleration" setting is used to automatically speed up the dubbing so that its duration matches the subtitle duration;

The "Video Slowdown" setting is used to automatically reduce the video playback speed to match the dubbing duration;

The "Video Extension" setting is used to freeze the last frame of the video clip until the dubbing finishes.

  • Advanced Options Settings (Menu Bar -- Tools/Options -- Advanced Options -- Subtitle, Sound, and Picture Alignment):

    Options such as "Remove trailing silence of dubbing" / "Remove silence length between two subtitles" / "Remove subtitle duration greater than dubbing duration" allow users to control the synchronization of subtitles and dubbing more finely.

    In addition, "Maximum audio acceleration multiple" (default 3x) and "Video slowdown multiple" (default 20x) cap how much the audio can be accelerated and the video slowed down, preventing voice distortion and excessively slow playback.
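Under the hood, audio acceleration with ffmpeg is typically done through the `atempo` filter. Older ffmpeg builds limit each `atempo` instance to the range 0.5–2.0, so a 3x speed-up is usually expressed as a chain of filters. A small sketch of building such a filter string (illustrative; the software's actual implementation may differ):

```python
def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio-filter string for the given speed factor,
    chaining atempo instances to stay within the 0.5-2.0 per-filter range."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    parts = []
    while speed > 2.0:          # break large speed-ups into 2.0x steps
        parts.append("atempo=2.0")
        speed /= 2.0
    while speed < 0.5:          # break large slow-downs into 0.5x steps
        parts.append("atempo=0.5")
        speed /= 0.5
    parts.append(f"atempo={speed:.4f}")
    return ",".join(parts)
```

For example, `atempo_chain(3.0)` produces `"atempo=2.0,atempo=1.5000"`, which could be passed to ffmpeg via `-af`.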

  • Audio Compensation Shift Left: Due to precision limitations in the underlying tool (ffmpeg), even if the audio and video are synchronized at the start, the dubbing may gradually drift later than the subtitles over time. The "Audio Compensation Shift Left" setting shifts the entire subtitle timeline slightly earlier (to the left), which effectively alleviates this drift — for example, by eliminating a gap between subtitles every 3 minutes.
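Shifting a subtitle timeline earlier is a simple timestamp transformation. The sketch below (an illustrative helper using a hypothetical plain-text SRT input, not the software's actual code) moves every SRT timestamp earlier by a fixed number of milliseconds, clamping at zero:

```python
import re

def shift_srt_left(srt_text: str, shift_ms: int) -> str:
    """Shift every HH:MM:SS,mmm timestamp in an SRT document earlier
    by shift_ms milliseconds, never going below 00:00:00,000."""
    pat = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

    def shift(m):
        h, mnt, s, ms = map(int, m.groups())
        total = max(0, ((h * 60 + mnt) * 60 + s) * 1000 + ms - shift_ms)
        hh, rem = divmod(total, 3_600_000)
        mm, rem = divmod(rem, 60_000)
        ss, mss = divmod(rem, 1000)
        return f"{hh:02d}:{mm:02d}:{ss:02d},{mss:03d}"

    return pat.sub(shift, srt_text)
```

For instance, shifting `00:00:05,500 --> 00:00:07,000` left by 500 ms yields `00:00:05,000 --> 00:00:06,500`.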

By flexibly using the above settings, video translation software can automate the synchronization of subtitles and dubbing as much as possible, improving translation efficiency.