Three-Step Reflection Method for Translating SRT Subtitles
The tool in this article has been packaged as an exe. After downloading and extracting it, double-click app.exe to use it. Read on for usage details and the underlying principles.
Download address: https://github.com/jianchang512/ai2srt/releases/download/v0.2/windows-ai2srt-0.2.7z
Andrew Ng's "Reflective Three-Step Translation Method" is very effective: it improves translation quality by having the model examine its own translation and propose improvements. However, applying this method directly to SRT subtitle translation presents some challenges.
Special Requirements of SRT Subtitle Format
SRT format subtitles have strict format requirements:
- First line: Line number
- Second line: Two timestamps, connected by -->, in the format hour:minute:second,3-digit milliseconds
- Third line and after: Subtitle text content
Subtitles are separated by a blank line (that is, two consecutive line breaks).
Example:
1
00:00:01,950 --> 00:00:04,430
Several molecules have been discovered in the five-star system,

2
00:00:04,720 --> 00:00:06,780
We are still multiple universes away from third-type contact.

3
00:00:07,260 --> 00:00:09,880
Weibo has been carrying out filming missions for years now,

4
00:00:10,140 --> 00:00:12,920
Many previously difficult-to-capture photos have been transmitted recently.
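The format rules above can be checked mechanically. The following is a minimal sketch, not the tool's actual code: the function name `parse_srt` and the dictionary layout are assumptions made for illustration. It splits SRT text on blank lines and verifies that each entry has a numeric first line and a well-formed timestamp line.

```python
import re

# Matches a timestamp line such as "00:00:01,950 --> 00:00:04,430".
TIME_RE = re.compile(r"^(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})$")


def parse_srt(text):
    """Split SRT text into a list of dicts (hypothetical helper for illustration)."""
    entries = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 3 or not lines[0].isdigit():
            raise ValueError(f"malformed entry: {block!r}")
        match = TIME_RE.match(lines[1])
        if match is None:
            raise ValueError(f"bad timestamp line: {lines[1]!r}")
        entries.append({
            "index": int(lines[0]),
            "start": match.group(1),
            "end": match.group(2),
            "text": "\n".join(lines[2:]),
        })
    return entries


sample = """1
00:00:01,950 --> 00:00:04,430
Several molecules have been discovered in the five-star system,

2
00:00:04,720 --> 00:00:06,780
We are still multiple universes away from third-type contact.
"""
entries = parse_srt(sample)
print(len(entries), entries[0]["start"], entries[1]["index"])
```

Raising an error on the first malformed entry is a deliberate choice here: a subtitle file that fails this check would also break any later step that depends on it.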
Common Problems in SRT Translation
When AI translates SRT subtitles, the following problems may occur:
- Format errors:
  - Missing line numbers or duplicate timestamps
  - Half-width (English) symbols in timestamps converted into full-width (Chinese) symbols
  - Two adjacent subtitle texts merged into one, especially when the two form a single grammatical sentence
- Translation quality issues:
  - Even with strict prompt constraints, translation errors often occur.
Common Error Examples:
- Merged subtitle text leaves blank lines
- Garbled formatting
- Line numbers translated
- Number of subtitles in the result differs from the original
As mentioned above, when consecutive subtitles belong to the same grammatical sentence, they are likely to be translated into a single entry, so the result has fewer subtitles than the original.
Format errors directly block any downstream process that relies on the SRT file. The types and frequency of errors vary by model. Generally, the more capable the model, the more likely it is to return valid, well-formed content, while small locally deployed models are almost unusable.
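Since these format errors break downstream processing, it can help to compare the structure of the translated result with the original before using it. The sketch below is one possible check and is not taken from the tool; `timestamp_lines` and `check_translation` are hypothetical names. It reports a changed entry count, timestamps that are missing or altered (for example by full-width punctuation), and duplicated timestamps.

```python
import re

# One well-formed SRT timestamp line, e.g. "00:00:01,950 --> 00:00:04,430".
TIME_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$", re.MULTILINE
)


def timestamp_lines(srt_text):
    """Return every well-formed timestamp line, in order of appearance."""
    return TIME_LINE.findall(srt_text.replace("\r\n", "\n"))


def check_translation(original, translated):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    src = timestamp_lines(original)
    dst = timestamp_lines(translated)
    if len(src) != len(dst):
        problems.append(f"entry count differs: {len(src)} vs {len(dst)}")
    seen = set(dst)
    for stamp in src:
        if stamp not in seen:
            problems.append(f"missing or altered timestamp: {stamp}")
    if len(dst) != len(seen):
        problems.append("duplicate timestamps in translation")
    return problems


original = (
    "1\n00:00:01,950 --> 00:00:04,430\nHello,\n\n"
    "2\n00:00:04,720 --> 00:00:06,780\nworld.\n"
)
# The model merged both entries into the first one.
merged = "1\n00:00:01,950 --> 00:00:04,430\nHello, world.\n"
problems = check_translation(original, merged)
for problem in problems:
    print(problem)
```

This check only looks at timestamp lines, so it does not judge translation quality; it only flags results whose structure no longer matches the source.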
However, given how much the three-step reflection method improves translation quality, I still tried it. In the end I chose gemini-1.5-flash for a small trial, mainly because it is capable enough and free. Apart from the rate limit, there are almost no restrictions.
Writing Prompt Ideas
Write prompts according to Andrew Ng's three-step reflection workflow:
- The first step requires the AI to translate literally.
- The second step requires evaluating the literal translation and providing optimization suggestions.
- The third step re-translates idiomatically based on the optimization suggestions.
The difference is a stronger requirement that the returned content be in valid SRT format, although the output still may not be 100% compliant.
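As an illustration of this idea, a prompt along the following lines could be assembled in code. The wording of the template is an assumption and not the prompt the tool actually uses; only the three tag names come from the returned result shown later in this article. `build_prompt` and its parameters are hypothetical.

```python
# Hypothetical prompt wording; only the three tag names are taken from the article.
PROMPT_TEMPLATE = """Translate the SRT subtitles inside <source> into {target_language} in three steps.

Step 1: Translate literally. Put the result inside <step1_initial_translation> tags.
Step 2: Review the literal translation and list concrete suggestions for improvement. Put them inside <step2_reflection> tags.
Step 3: Re-translate idiomatically based on the suggestions. Put the result inside <step3_refined_translation> tags.

Rules for steps 1 and 3:
- The output must be valid SRT: line number, timestamp line, then subtitle text.
- Copy line numbers and timestamps unchanged. Do not translate them.
- Do not merge or split entries. The output must contain exactly {count} entries.

<source>
{srt_text}
</source>"""


def build_prompt(srt_text, target_language, count):
    """Fill the template for one batch of subtitles."""
    return PROMPT_TEMPLATE.format(
        srt_text=srt_text.strip(),
        target_language=target_language,
        count=count,
    )


prompt = build_prompt(
    "1\n00:00:01,950 --> 00:00:04,430\nHello,\n", "Simplified Chinese", 1
)
print(prompt)
```

Stating the expected entry count in the prompt is one way to discourage merging, but as noted above, no prompt guarantees a compliant result.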
Build a Simple API
One problem with the three-step reflection mode is that it consumes many more tokens: both the prompt and the output become longer. In addition, Gemini enforces a rate limit. Exceeding it returns a 429 error, so you need to pause for a while after each request.
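One common way to cope with a 429 response is to wait and retry. The sketch below shows the pattern in isolation and is not the tool's code. `RateLimitError` is a hypothetical stand-in for whatever exception your Gemini client raises on HTTP 429, and `send` is any function that performs one request.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a client raises on HTTP 429."""


def call_with_retry(send, payload, max_attempts=3, wait_seconds=10, sleep=time.sleep):
    """Call send(payload), waiting and retrying when the rate limit is hit."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(wait_seconds)


# Demo with a fake sender that is rate-limited on its first two calls.
calls = []


def flaky_send(payload):
    calls.append(payload)
    if len(calls) < 3:
        raise RateLimitError("429")
    return payload.upper()


waits = []
# Record the waits instead of actually sleeping so the demo runs instantly.
result = call_with_retry(flaky_send, "batch 1", sleep=waits.append)
print(result, waits)
```

The `sleep` parameter exists only so the waiting can be replaced in a demo or test; in real use the default `time.sleep` applies.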
Flask is used to build the backend API, and Bootstrap 5 for a simple single-page front end. The overall interface is as follows.
Note that using Gemini from mainland China requires a proxy.
- Translate lines at the same time: the number of subtitle lines sent in one translation request. If it is too large, the request may exceed the token limit and fail. If it is too small, batching loses its point. A value of 30-100 is recommended; the default is 50.
- Pause seconds after translation: to avoid a 429 error from overly frequent requests, the tool pauses for 10 seconds after each response is returned before sending the next request.
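The two settings above amount to a simple loop: split the subtitles into batches, send one batch per request, and pause between requests. A minimal sketch follows, assuming the defaults of 50 lines and 10 seconds. The names `batches`, `translate_all`, and `translate_batch` are hypothetical, and `translate_batch` stands for whatever function sends one request to Gemini.

```python
import time


def batches(entries, size=50):
    """Yield consecutive slices of at most `size` entries."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


def translate_all(entries, translate_batch, size=50, pause_seconds=10, sleep=time.sleep):
    """Translate all entries batch by batch, pausing between requests."""
    results = []
    chunks = list(batches(entries, size))
    for number, chunk in enumerate(chunks):
        results.extend(translate_batch(chunk))
        if number < len(chunks) - 1:  # no pause needed after the final batch
            sleep(pause_seconds)
    return results


# Demo: 120 fake subtitle lines and a fake translator that tags each line.
lines = [f"line {i}" for i in range(120)]
batch_sizes = []
pauses = []


def fake_translate(chunk):
    batch_sizes.append(len(chunk))
    return [f"[translated] {text}" for text in chunk]


# Record the pauses instead of sleeping so the demo runs instantly.
translated = translate_all(lines, fake_translate, sleep=pauses.append)
print(batch_sizes, pauses, len(translated))
```

With 120 lines and a batch size of 50, this makes three requests and pauses twice.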
Example of a returned result
<step1_initial_translation>
1
00:00:01,950 --> 00:00:04,430
Several molecules have been discovered in the five-star system,
2
00:00:04,720 --> 00:00:06,780
We are still multiple universes away from third-type contact.
3
00:00:07,260 --> 00:00:09,880
Weibo has been carrying out filming missions for years now,
4
00:00:10,140 --> 00:00:12,920
Many previously difficult-to-capture photos have been transmitted recently.
5
00:00:13,440 --> 00:00:17,500
In early June, astronomers published this photo in Nature,
6
00:00:18,040 --> 00:00:19,180
Outside the blue core,
7
00:00:19,360 --> 00:00:21,380
There's also this circle of orange light,
8
00:00:21,900 --> 00:00:23,740
This is a new drama-scale sweet donut,
9
00:00:24,380 --> 00:00:25,640
This is a portal.
10
00:00:26,280 --> 00:00:28,100
This is the generation ring of an alien civilization,
</step1_initial_translation>
<step2_reflection>
* **Line 1:** "Five-star system" is likely a mistranslation. It probably refers to a five-member committee or group, not a star system. Clarify the context.
* **Line 2:** "Multiple universes" seems like an over-exaggeration. Rephrase for clarity and accuracy.
* **Line 3:** "Weibo" should be explained as a Chinese social media platform. "Filming missions" is unclear. Does it mean "posting videos/images"?
* **Line 8:** "Drama-scale sweet donut" is a nonsensical literal translation. Figure out the intended meaning.
* **Line 9:** "Portal" seems out of context. Verify the intended meaning.
* **Line 10:** "Generation ring" is likely a mistranslation. Clarify the context.
* **Line 11:** "Organic polycyclic aromatic hydrocarbons" is overly technical for a general audience. Simplify if possible.
* **Line 12 and 14:** Use the correct formatting for the galaxy's name: SPT0418-47.
* **Line 15:** "It hasn't been shortened" is awkward. Remove or rephrase.
* **Line 28:** The name of the organization and the resource should be translated more naturally and accurately. Consider breaking this long line into two for better readability.
* **Line 29:** "Cute plush dolls" may sound childish. Consider rephrasing as "animated characters" or similar.
* **Line 35:** "James Webb Space Telescope" should be used consistently throughout. Shortening to "Webb Telescope" after the first mention is acceptable.
* **Line 44:** "SPD048" is likely a typo. It should be SPT0418-47 to be consistent.
* **Line 45-46:** "Standard beautiful photo" is redundant. Simplify to "beautiful photo".
* **Line 48:** "Grovitational Lenshin" is a typo. Correct to "Gravitational Lensing".
* **Line 50:** The sentence is incomplete. Finish the thought.
</step2_reflection>
<step3_refined_translation>
1
00:00:01,950 --> 00:00:04,430
Several molecules have been discovered in the five-member group's area of focus.
2
00:00:04,720 --> 00:00:06,780
We are still far from making contact with extraterrestrial life.
3
00:00:07,260 --> 00:00:09,880
The James Webb Space Telescope has been capturing images for a year now,
4
00:00:10,140 --> 00:00:12,920
and has recently transmitted many previously unseen photos.
5
00:00:13,440 --> 00:00:17,500
In early June, astronomers published this image in Nature.
6
00:00:18,040 --> 00:00:19,180
Outside the blue core,
7
00:00:19,360 --> 00:00:21,380
there's a ring of orange light.
8
00:00:21,900 --> 00:00:23,740
This is a large, ring-shaped structure.
9
00:00:24,380 --> 00:00:25,640
This is being investigated.
10
00:00:26,280 --> 00:00:28,100
This is thought to be a sign of an early galaxy.
</step3_refined_translation>
Extract the text inside the <step3_refined_translation></step3_refined_translation> tags from the response; that text is the final translation.
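The extraction step can be done with a regular expression. This is a minimal sketch rather than the tool's actual code, and `extract_refined` is a hypothetical name. It raises an error when the tag pair is missing, since the model does not always follow the requested output structure.

```python
import re

# DOTALL lets "." match newlines, since the translation spans many lines.
STEP3_RE = re.compile(
    r"<step3_refined_translation>(.*?)</step3_refined_translation>", re.DOTALL
)


def extract_refined(response_text):
    """Return the text inside the step 3 tags, without surrounding whitespace."""
    match = STEP3_RE.search(response_text)
    if match is None:
        raise ValueError("no <step3_refined_translation> block in response")
    return match.group(1).strip()


response = """<step1_initial_translation>
1
00:00:18,040 --> 00:00:19,180
Outside the blue core,
</step1_initial_translation>
<step2_reflection>
* No changes needed.
</step2_reflection>
<step3_refined_translation>
1
00:00:18,040 --> 00:00:19,180
Outside the blue core,
</step3_refined_translation>"""

refined = extract_refined(response)
print(refined)
```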
I have packaged the tool; if you are interested, download it and try it locally.
Download and unzip it, then double-click app.exe to open the UI shown above in your browser. Enter your Gemini API key, fill in the proxy address, select the SRT subtitle file to translate, choose the target language, and try it out.
Q1: How is the reflection workflow different from traditional machine translation?
A1: The reflection workflow introduces a self-evaluation and optimization mechanism, simulating the thinking process of human translators, and can produce more accurate and natural translation results.
Q2: How long does it take to use the reflection workflow?
A2: Although the reflection workflow requires several rounds of AI processing, it usually takes only 10-20 seconds longer than traditional methods. Considering the improvement in translation quality, this time investment is worthwhile.
Q3: Can the reflection workflow guarantee that the translated subtitles are valid SRT?
A3: No. Problems such as blank lines and a subtitle count that differs from the original may still occur. For example, if the second of two consecutive subtitles has only 3-5 words and grammatically continues the first, the two are likely to be merged into one in the translation.
A new feature has been added to the tool: you can upload a video or audio file, and Gemini transcribes it into subtitles, translates them, and returns the translated result.
The Gemini model accepts audio and video as well as text, so it can transcribe a file into subtitles and translate them in a single request.
For example, if a video with English speech is sent to Gemini with Chinese specified as the target language, the returned result is Chinese subtitles.
1. Translate subtitles only
You can paste SRT format subtitles into the text box on the left, or directly click the "Upload SRT subtitles" button to select a subtitle file from your local computer.
Then set the target language, and the tool uses the "Three-Step Reflection Translation Method" to have Gemini perform the translation. The returned result appears in the text box on the right. Click the "Download" button in the lower right corner to save it as an srt file on your local computer.
2. Transcribe audio and video into subtitles
Click the "Upload audio and video to transcribe into subtitles" button on the left and select any audio or video file to upload. After the upload is complete, submit it. Once processing finishes, Gemini returns the subtitles recognized from the speech in the file, and the recognition quality is good.
If a target language is also specified, Gemini translates the recognized subtitles into that language before returning them. In other words, it completes two tasks in one pass: generating subtitles and translating them.
Download address:
https://github.com/jianchang512/ai2srt/releases/download/v0.2/windows-ai2srt-0.2.7z