Keeping dubbing, subtitles, and visuals aligned has always been a technical challenge in video translation. Different languages have very different grammatical structures and speaking rates: when a sentence is translated, its character count and speaking speed change, so the dubbed audio no longer matches the duration of the original speech, and subtitles and audio drift out of sync with the picture.
Specifically, the character on screen has finished speaking while the dubbing is only halfway through; or the original video has already moved on to the next sentence while the dubbing is still finishing the previous one.
Changes in the Number of Characters Caused by Translation
For example, when the following Chinese sentences are translated into English, their length and syllable count change significantly, and the corresponding speech duration changes accordingly:
Chinese: 得国最正莫过于明 (Dé guó zuì zhèng mò guò yú Míng)
English: There is no country more upright than the Ming Dynasty
Chinese: 我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)
English: I have been studying the universe all my life
Chinese: 北京圆明园四只黑天鹅疑被流浪狗咬死 (Běijīng Yuánmíngyuán sì zhī hēi tiān'é yí bèi liúlàng gǒu yǎo sǐ)
English: Four black swans in Beijing's Yuanmingyuan Garden suspected of being bitten to death by stray dogs
As these examples show, after Chinese subtitles are translated into English and dubbed, the dubbed audio usually runs longer than the original Chinese speech. The following strategies are commonly used to address this:
Several Coping Strategies
Increase Dubbing Speech Speed: In theory, if there is no upper limit on speech rate, the dubbing can always be compressed to match the subtitle duration. For example, if the original speech lasts 1 second and the dubbing lasts 3 seconds, playing the dubbing at 300% speed synchronizes the two. In practice, however, this makes the speech sound rushed and unnatural, and uneven rates across lines (sometimes fast, sometimes slow) hurt the overall result.
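The rate calculation behind this strategy can be sketched as follows. This is a minimal illustration, not the software's actual code; the function name `required_speedup` and the 3× cap (matching the "Maximum Audio Acceleration Multiple" default mentioned later) are assumptions for the example.

```python
def required_speedup(dub_dur: float, slot_dur: float, max_rate: float = 3.0) -> float:
    """Return the playback-rate multiplier needed to fit a dubbed clip
    into its original time slot, capped at max_rate to limit distortion."""
    if slot_dur <= 0:
        raise ValueError("slot duration must be positive")
    rate = dub_dur / slot_dur
    # Never slow the audio down here; never exceed the distortion cap.
    return min(max(rate, 1.0), max_rate)

# The 300% case above: a 3 s dubbed line must fit a 1 s slot.
print(required_speedup(3.0, 1.0))  # 3.0
```

Clamping at `max_rate` means a badly overlong line simply cannot be fully absorbed by speed-up alone, which is why the later strategies exist.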
Simplify Translation: Shorten the translated text to reduce the dubbing duration. For example, translate "我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)" as the shorter "Cosmology is my life's work." This method gives the best results, but it requires editing subtitles sentence by sentence, which is very inefficient.
Adjust Silence Between Subtitles: If the original audio has silence between two subtitles, part of it can be removed to absorb the duration difference. For example, if there are 2 seconds of silence between two subtitles and the translated first line runs 1.5 seconds longer than the original, the silence can be shortened to 0.5 seconds so that the second line's dubbing starts in sync with the original. However, not every subtitle has enough surrounding silence, so this method's applicability is limited.
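The gap-borrowing arithmetic can be sketched like this. The function name `borrow_silence` and the optional `min_gap` floor are hypothetical, chosen only to illustrate the idea; the software's real implementation may differ.

```python
def borrow_silence(gap: float, overrun: float, min_gap: float = 0.0):
    """Shrink the silent gap after a subtitle to absorb the dubbing overrun.

    gap     -- seconds of silence between this subtitle and the next
    overrun -- seconds by which the dubbing exceeds the original speech
    Returns (new_gap, remaining_overrun)."""
    available = max(gap - min_gap, 0.0)
    used = min(available, overrun)
    return gap - used, overrun - used

# The example above: 2 s gap, dubbing runs 1.5 s long.
print(borrow_silence(2.0, 1.5))  # (0.5, 0.0)
```

When the remaining overrun is nonzero, the gap alone was not enough and another strategy (speed-up, video slow-down) has to cover the rest.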
Remove Silence Before and After Dubbing: Synthesized dubbing usually reserves some silence at the start and end of each clip. Trimming this silence effectively shortens the dubbing duration.
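A simple amplitude-threshold trim illustrates this strategy. This sketch operates on a plain list of normalized samples; a real implementation would work on decoded PCM audio (e.g., via ffmpeg or pydub), and both the function name and the threshold value are assumptions.

```python
def trim_edge_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose amplitude is below threshold,
    keeping everything between the first and last audible sample."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

print(trim_edge_silence([0.0, 0.0, 0.5, -0.3, 0.0]))  # [0.5, -0.3]
```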
Slow Down Video Playback: If speeding up the dubbing alone works poorly, it can be combined with slowing down the video. For example, if a subtitle's original speech lasts 1 second but its dubbing lasts 3 seconds, the dubbing can be sped up 1.5× (shortening it to 2 seconds) while the corresponding video clip is played at half speed (extending it to 2 seconds), so that the two meet in the middle and stay synchronized.
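Splitting the mismatch between audio speed-up and video slow-down can be sketched as below. The function name `split_rates` and the 1.5× audio cap are illustrative assumptions, not the software's actual parameters.

```python
def split_rates(slot: float, dub: float, max_audio_rate: float = 1.5):
    """Fit a dubbed clip of length `dub` into an original slot of length `slot`.
    Speed the audio up to at most max_audio_rate; slow the video to cover
    whatever overrun remains. Returns (audio_rate, video_rate, final_dur),
    where video_rate < 1.0 means slower playback."""
    if dub <= slot:
        return 1.0, 1.0, slot
    audio_rate = min(dub / slot, max_audio_rate)
    final_dur = dub / audio_rate          # dubbing length after speed-up
    video_rate = slot / final_dur         # stretch the clip to final_dur
    return audio_rate, video_rate, final_dur

# The example above: 1 s slot, 3 s dubbing.
# Audio at 1.5x -> 2 s; video at 0.5x stretches the 1 s clip to 2 s.
print(split_rates(1.0, 3.0))  # (1.5, 0.5, 2.0)
```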
Each of these methods has its own trade-offs, and none solves every case perfectly. The best synchronization usually requires manual fine-tuning, which runs counter to the goal of automation, so video translation software typically combines the strategies above to get as close as possible to the best result.
Implementation in Video Translation Software
In the software, these strategies are usually controlled through the following settings:
- Main Interface Settings:
The "Dubbing Speech Speed" setting speeds up all dubbed audio globally;
The "Dubbing Speed Up" setting automatically speeds up the dubbing so that its duration matches the subtitle;
The "Video Slow Down" setting automatically reduces the video playback speed to match the dubbing duration;
The "Video Extend" setting freezes the last frame of a clip until the dubbing finishes.
- Advanced Options Settings (Menu Bar--Tools/Options--Advanced Options--Subtitle Sound Picture Alignment):
Options such as "Remove Dubbing End Blank" / "Remove Silence Length Between Two Subtitles" / "Remove Subtitle Duration Greater Than Dubbing Duration" give users finer control over how subtitles and dubbing are synchronized.
In addition, the "Maximum Audio Acceleration Multiple" (default 3×) and the "Video Slow Speed Multiple" (default 20×) cap how far audio can be accelerated and video decelerated, preventing distorted speech or excessively slow playback.
- Audio Compensation Move Left: Due to precision limitations in the underlying tool (ffmpeg), even when audio and video start out synchronized, the dubbing may gradually run later than the subtitles as the video progresses. The "Audio Compensation Move Left" setting shifts the whole subtitle timeline earlier, for example by eliminating one blank between subtitles every 3 minutes, which effectively mitigates this cumulative drift.
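The drift compensation just described can be sketched as a cumulative left shift of subtitle timestamps. This is a hypothetical illustration: `shift_left`, the cue representation as `(start, end)` second pairs, and the interval/shift defaults are all assumptions, not the software's actual code.

```python
def shift_left(cues, interval=180.0, shift=0.5):
    """Move each cue earlier by `shift` seconds for every full `interval`
    of elapsed time, compensating drift that accumulates at a steady rate.
    `cues` is a list of (start, end) pairs in seconds."""
    out = []
    for start, end in cues:
        delta = (start // interval) * shift   # grows as the video progresses
        out.append((max(start - delta, 0.0), max(end - delta, 0.0)))
    return out

# A cue in the first 3 minutes is untouched; one after the 3-minute mark
# is pulled 0.5 s earlier.
print(shift_left([(10.0, 12.0), (200.0, 203.0)]))  # [(10.0, 12.0), (199.5, 202.5)]
```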
By combining the settings above flexibly, video translation software can automate subtitle and dubbing synchronization as far as possible and improve translation efficiency.