Keeping dubbing, subtitles, and visuals aligned has always been a technical challenge in video translation. Different languages have very different grammatical structures and speaking rates: when a sentence is translated, its character count and speaking speed change, so the dubbed audio no longer matches the duration of the original speech, and subtitles and audio drift out of sync with the picture.
Specifically, the character on screen has finished speaking while the dubbing is only halfway through; or the original video has already moved on to the next sentence while the dubbing is still finishing the previous one.
Changes in the Number of Characters Caused by Translation
For example, when the following Chinese sentences are translated into English, their length and syllable count change significantly, and the corresponding speech duration changes accordingly:
Chinese: 得国最正莫过于明 (Dé guó zuì zhèng mò guò yú Míng)
English: There is no country more upright than the Ming Dynasty
Chinese: 我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)
English: I have been studying the universe all my life
Chinese: 北京圆明园四只黑天鹅疑被流浪狗咬死 (Běijīng Yuánmíngyuán sì zhī hēi tiān'é yí bèi liúlàng gǒu yǎo sǐ)
English: Four black swans in Beijing's Yuanmingyuan Garden suspected of being bitten to death by stray dogs
As these examples show, after Chinese subtitles are translated into English and dubbed, the dubbed audio usually runs longer than the original Chinese speech. The following strategies are commonly used to address this:
Several Coping Strategies
Increase Dubbing Speech Speed: In theory, if there is no upper limit on speech rate, the dubbing can always be compressed to match the subtitle duration. For example, if the original speech lasts 1 second and the dubbing lasts 3 seconds, playing the dubbing at 300% speed synchronizes the two. In practice, however, this makes the speech sound rushed and unnatural, and uneven rates across lines (sometimes fast, sometimes slow) hurt the overall result.
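The rate calculation behind this strategy can be sketched as follows. This is a minimal illustration, not the software's actual code; the function name `required_speedup` and the 3× cap (matching the "Maximum Audio Acceleration Multiple" default mentioned later) are assumptions for the example.

```python
def required_speedup(dub_dur: float, slot_dur: float, max_rate: float = 3.0) -> float:
    """Return the playback-rate multiplier needed to fit a dubbed clip
    into its original time slot, capped at max_rate to limit distortion."""
    if slot_dur <= 0:
        raise ValueError("slot duration must be positive")
    rate = dub_dur / slot_dur
    # Never slow the audio down here; never exceed the distortion cap.
    return min(max(rate, 1.0), max_rate)

# The 300% case above: a 3 s dubbed line must fit a 1 s slot.
print(required_speedup(3.0, 1.0))  # 3.0
```

Clamping at `max_rate` means a badly overlong line simply cannot be fully absorbed by speed-up alone, which is why the later strategies exist.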
Simplify Translation: Shorten the translated text to reduce the dubbing duration. For example, translate "我一生都在研究宇宙 (Wǒ yīshēng dōu zài yánjiū yǔzhòu)" as the shorter "Cosmology is my life's work." This method gives the best results, but it requires editing subtitles sentence by sentence, which is very inefficient.
Adjust Silence Between Subtitles: If the original audio has silence between two subtitles, part of it can be removed to absorb the duration difference. For example, if there are 2 seconds of silence between two subtitles and the translated first line runs 1.5 seconds longer than the original, the silence can be shortened to 0.5 seconds so that the second line's dubbing starts in sync with the original. However, not every subtitle has enough surrounding silence, so this method's applicability is limited.
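The gap-borrowing arithmetic can be sketched like this. The function name `borrow_silence` and the optional `min_gap` floor are hypothetical, chosen only to illustrate the idea; the software's real implementation may differ.

```python
def borrow_silence(gap: float, overrun: float, min_gap: float = 0.0):
    """Shrink the silent gap after a subtitle to absorb the dubbing overrun.

    gap     -- seconds of silence between this subtitle and the next
    overrun -- seconds by which the dubbing exceeds the original speech
    Returns (new_gap, remaining_overrun)."""
    available = max(gap - min_gap, 0.0)
    used = min(available, overrun)
    return gap - used, overrun - used

# The example above: 2 s gap, dubbing runs 1.5 s long.
print(borrow_silence(2.0, 1.5))  # (0.5, 0.0)
```

When the remaining overrun is nonzero, the gap alone was not enough and another strategy (speed-up, video slow-down) has to cover the rest.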
Remove Silence Before and After Dubbing: Synthesized dubbing usually reserves some silence at the start and end of each clip. Trimming this silence effectively shortens the dubbing duration.
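A simple amplitude-threshold trim illustrates this strategy. This sketch operates on a plain list of normalized samples; a real implementation would work on decoded PCM audio (e.g., via ffmpeg or pydub), and both the function name and the threshold value are assumptions.

```python
def trim_edge_silence(samples, threshold=0.01):
    """Drop leading and trailing samples whose amplitude is below threshold,
    keeping everything between the first and last audible sample."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

print(trim_edge_silence([0.0, 0.0, 0.5, -0.3, 0.0]))  # [0.5, -0.3]
```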
Slow Down Video Playback: If speeding up the dubbing alone works poorly, it can be combined with slowing down the video. For example, if a subtitle's original speech lasts 1 second but its dubbing lasts 3 seconds, the dubbing can be sped up 1.5× (shortening it to 2 seconds) while the corresponding video clip is played at half speed (extending it to 2 seconds), so that the two meet in the middle and stay synchronized.
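Splitting the mismatch between audio speed-up and video slow-down can be sketched as below. The function name `split_rates` and the 1.5× audio cap are illustrative assumptions, not the software's actual parameters.

```python
def split_rates(slot: float, dub: float, max_audio_rate: float = 1.5):
    """Fit a dubbed clip of length `dub` into an original slot of length `slot`.
    Speed the audio up to at most max_audio_rate; slow the video to cover
    whatever overrun remains. Returns (audio_rate, video_rate, final_dur),
    where video_rate < 1.0 means slower playback."""
    if dub <= slot:
        return 1.0, 1.0, slot
    audio_rate = min(dub / slot, max_audio_rate)
    final_dur = dub / audio_rate          # dubbing length after speed-up
    video_rate = slot / final_dur         # stretch the clip to final_dur
    return audio_rate, video_rate, final_dur

# The example above: 1 s slot, 3 s dubbing.
# Audio at 1.5x -> 2 s; video at 0.5x stretches the 1 s clip to 2 s.
print(split_rates(1.0, 3.0))  # (1.5, 0.5, 2.0)
```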
Each of these methods has its own trade-offs, and none solves every case perfectly. The best synchronization usually requires manual fine-tuning, which runs counter to the goal of automation, so video translation software typically combines the strategies above to get as close as possible to the best result.
Implementation in Video Translation Software
In the software, these strategies are usually controlled through the following settings:
- Main Interface Settings:
The "Dubbing Speech Speed" setting speeds up all dubbed audio globally;
The "Dubbing Speed Up" setting automatically speeds up the dubbing so that its duration matches the subtitle;
The "Video Slow Down" setting automatically reduces the video playback speed to match the dubbing duration;
The "Video Extend" setting freezes the last frame of a clip until the dubbing finishes.
- Advanced Options Settings (Menu Bar--Tools/Options--Advanced Options--Subtitle Sound Picture Alignment):
Options such as "Remove Dubbing End Blank" / "Remove Silence Length Between Two Subtitles" / "Remove Subtitle Duration Greater Than Dubbing Duration" give users finer control over how subtitles and dubbing are synchronized.
In addition, the "Maximum Audio Acceleration Multiple" (default 3×) and the "Video Slow Speed Multiple" (default 20×) cap how far audio can be accelerated and video decelerated, preventing distorted speech or excessively slow playback.
- Audio Compensation Move Left: Due to precision limitations in the underlying tool (ffmpeg), even when audio and video start out synchronized, the dubbing may gradually run later than the subtitles as the video progresses. The "Audio Compensation Move Left" setting shifts the whole subtitle timeline earlier, for example by eliminating one blank between subtitles every 3 minutes, which effectively mitigates this cumulative drift.
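The drift compensation just described can be sketched as a cumulative left shift of subtitle timestamps. This is a hypothetical illustration: `shift_left`, the cue representation as `(start, end)` second pairs, and the interval/shift defaults are all assumptions, not the software's actual code.

```python
def shift_left(cues, interval=180.0, shift=0.5):
    """Move each cue earlier by `shift` seconds for every full `interval`
    of elapsed time, compensating drift that accumulates at a steady rate.
    `cues` is a list of (start, end) pairs in seconds."""
    out = []
    for start, end in cues:
        delta = (start // interval) * shift   # grows as the video progresses
        out.append((max(start - delta, 0.0), max(end - delta, 0.0)))
    return out

# A cue in the first 3 minutes is untouched; one after the 3-minute mark
# is pulled 0.5 s earlier.
print(shift_left([(10.0, 12.0), (200.0, 203.0)]))  # [(10.0, 12.0), (199.5, 202.5)]
```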
By combining the settings above flexibly, video translation software can automate subtitle and dubbing synchronization as far as possible and improve translation efficiency.