Improving the Quality of AI-Translated Subtitles
When using AI to translate SRT subtitles, there are generally two approaches.
Method 1: Send the Complete Subtitle Format, Including the Untranslated "Line Numbers" and "Timestamps"
As shown in the example below, send the complete format:
1
00:00:01,950 --> 00:00:04,950
Organic molecules were discovered in the Five Elders star system.
2
00:00:04,950 --> 00:00:07,902
We are multi-dimensional away from third contact.
3
00:00:07,902 --> 00:00:11,958
Microwave has been unfolding the filming mission for the anniversary.
Advantages: Considers context, resulting in better translation quality.
Disadvantages: Besides wasting tokens, it can corrupt the subtitle format during translation, so the returned result may no longer be valid SRT. For example, the ASCII symbols `,` and `:` in timestamps may be replaced with their full-width Chinese equivalents, or a line number and a timestamp line may be merged into a single line.
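The corruption described above can be caught before the result is written back. Below is a minimal validity check, a hypothetical sketch rather than the software's actual code (a real parser would be stricter):

```python
import re

# Matches an ASCII SRT timestamp line such as "00:00:01,950 --> 00:00:04,950".
TIMESTAMP_RE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def is_valid_srt(text: str) -> bool:
    """Check that every block still has an index, a timestamp, and text.

    Catches the failure modes described above: full-width punctuation
    substituted into timestamps, or index/timestamp lines merged together.
    """
    blocks = [b for b in text.strip().split("\n\n") if b.strip()]
    if not blocks:
        return False
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:                          # index + timestamp + text
            return False
        if not lines[0].strip().isdigit():          # merged or missing index
            return False
        if not TIMESTAMP_RE.match(lines[1].strip()):  # e.g. "，" instead of ","
            return False
    return True
```

If the check fails, the translation can simply be retried, or the software can fall back to the text-only method below.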
Method 2: Send Only the Subtitle Text, Then Write the Translations Back into the Original Subtitles
In the following format, only send the subtitle text:
Organic molecules were discovered in the Five Elders star system.
We are multi-dimensional away from third contact.
Microwave has been unfolding the filming mission for the anniversary.
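The extract-and-replace workflow of Method 2 can be sketched as follows (the helper names are illustrative, not the software's actual internals):

```python
def split_srt(srt: str):
    """Split an SRT string into blocks and pull out only the text lines."""
    blocks = [b.splitlines() for b in srt.strip().split("\n\n") if b.strip()]
    # Each block is [index, timestamp, text...]; skip the first two lines.
    texts = ["\n".join(b[2:]) for b in blocks]
    return blocks, texts

def rebuild_srt(blocks, translated):
    """Write translated text back under the original index and timestamp."""
    out = []
    for block, text in zip(blocks, translated):
        out.append("\n".join(block[:2] + [text]))
    return "\n\n".join(out)
```

Because only the text lines ever leave the program, the indexes and timestamps cannot be damaged by the model.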
Advantages: Ensures that the translation result is always a valid SRT subtitle format.
Disadvantages: Translating subtitle text line by line obviously cannot take context into account, so translation quality drops significantly.
To mitigate this, the software supports translating multiple lines at once (15 lines of subtitles by default), which restores context to some extent.
However, a new problem arises: languages differ in grammar and word order, so 15 lines of original text may well come back as 14 or 13 lines of translation, especially when two adjacent lines form a single grammatical sentence.
If 15 lines of original subtitles are no longer 15 lines after translation, the subtitles will inevitably be scrambled. To prevent this, whenever the number of translated lines does not match the number of original lines, the batch is re-translated line by line, abandoning context in exchange for a guaranteed one-to-one line mapping.
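The batch-then-fallback strategy just described can be sketched like this (`translate` stands in for whatever AI call the software actually makes; the names are assumptions for illustration):

```python
def translate_batch(lines, translate, batch_size=15):
    """Translate in batches of `batch_size` lines; fall back to line-by-line
    whenever the model returns a different number of lines than it was sent."""
    out = []
    for i in range(0, len(lines), batch_size):
        chunk = lines[i:i + batch_size]
        result = translate("\n".join(chunk)).splitlines()
        if len(result) == len(chunk):
            # Line counts match: keep the context-aware batch result.
            out.extend(result)
        else:
            # Mismatch: abandon context and re-translate one line at a time
            # so every original line keeps exactly one translated line.
            out.extend(translate(line) for line in chunk)
    return out
```

The output always has exactly as many lines as the input, so the translated text can be written back into the SRT blocks one-to-one.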
The software uses the second method by default; after all, being usable matters more than being good.
Starting with version v2.52, the first translation method is also supported. It is disabled by default and must be turned on manually. Once enabled, when translating with AI channels such as ChatGPT/Gemini/AzureGPT/302.AI/ByteDance Volcano/LocalLLM, the complete formatted SRT subtitles are sent for translation, which takes context into account better and improves translation quality.
However, note that the problems described for the first method may occur: the result may not be valid SRT, which can cause parsing errors or the loss of all content after the error point. This method is recommended only for sufficiently capable models, such as GPT-4o-mini or larger. It is not recommended for locally deployed models: limited by hardware resources, they are generally small in scale and not capable enough, so the format of the translated output is more easily corrupted.
Enabling the First Translation Method:
Menu -- Tools/Options -- Advanced Options -- Subtitle Translation Area -- Send Complete Subtitles During AI Intelligent Translation
Adding a Glossary
You can add your own glossary to each prompt, similar to the following:
**During the translation process, be sure to use** the glossary I provide to translate these terms and keep them consistent. The glossary is as follows:
* Transformer -> Transformer
* Token -> Token
* LLM/Large Language Model -> Large Language Model
* Generative AI -> Generative AI
* One Health -> One Health
* Radiomics -> Radiomics
* OHHLEP -> OHHLEP
* STEM -> STEM
* SHAPE -> SHAPE
* Single-cell transcriptomics -> Single-cell transcriptomics
* Spatial transcriptomics -> Spatial transcriptomics
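As a sketch (with hypothetical names, not the software's internal code), a glossary like the one above can be injected into every translation prompt programmatically:

```python
# Illustrative glossary; in practice this would come from user configuration.
GLOSSARY = {
    "Transformer": "Transformer",
    "LLM": "Large Language Model",
    "One Health": "One Health",
}

def build_prompt(text: str) -> str:
    """Prepend the glossary instruction to the text being translated."""
    terms = "\n".join(f"* {src} -> {dst}" for src, dst in GLOSSARY.items())
    return (
        "During the translation process, be sure to use the glossary I "
        "provide to translate the terms and keep them consistent. "
        "The glossary is as follows:\n"
        f"{terms}\n\nText to translate:\n{text}"
    )
```

Keeping the glossary in one place means every batch, regardless of translation method, receives the same terminology instructions.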