pyVideoTrans Video Translation Software: Open-source software that translates videos from one language to another, covering both the spoken audio (dubbing) and the subtitles.
Main Uses
- Video Translation: Recognizes the speech in the original video, generates subtitles, translates them into the target language, synthesizes dubbed audio from the translated subtitles, and merges the dubbed audio and target-language subtitles with the original video to produce a new, translated video.
- Speech Recognition and Transcription: Supports batch transcription of audio or video files into SRT subtitles
- SRT Subtitle Translation: Can translate SRT subtitles into other languages while preserving the original format and timestamps
- Text-to-Speech for Subtitles or Text: Generates speech for SRT subtitles or text, supporting multiple speech synthesis channels
It also provides auxiliary functions such as merging audio, video, and subtitles, batch merging video and audio, batch merging video and subtitles, and separating vocals from background music.
How the Software Works
This software works from the speech in a video, not from any subtitles already present in it. Any video that contains human speech can be processed, whether or not it has subtitles.
Please note:
- If the video only has subtitles and no speech, video translation or speech recognition cannot be performed.
- This software cannot directly extract or recognize existing hard subtitles in the video.
1. Installing the Software
- Click here for the pre-packaged version for Windows. Choose the GitHub or Baidu Netdisk address to download the complete package (a 7z compressed file).
- Click here to view the source code deployment tutorial for MacOS
- Click here to view the source code deployment tutorial for Linux
2. Extracting the 7z Compressed Package on Windows
On Windows, both the complete package and the patch package are in .7z compressed format. You can use 7-Zip or other decompression software to extract them. (Recommended: 360 Compression Software, address: https://yasuo.360.cn)
Extraction Precautions
- Avoid Permission Issues: Do not extract the software to the desktop or to folders on the C drive such as Program Files, which require administrator privileges.
- Avoid Path Errors: Do not include Chinese characters, spaces, or special symbols in the extraction path.
- Strongly recommended: Create a new folder with an English or numeric name on a non-system drive such as D or E, and extract the software into that folder. For example: D:/videotrans.
- After extracting, find the sp.exe file and double-click it to launch the software.
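The precautions above can be expressed as a small pre-flight check. This is only an illustrative sketch: the function name and the exact rule set (ASCII letters, digits, underscore, and hyphen per folder name) are assumptions drawn from the text, not something the software is known to run.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical rule: each folder name may contain only ASCII letters, digits, "_" or "-".
SAFE_PART = re.compile(r"^[A-Za-z0-9_-]+$")

def extraction_path_problems(path: str) -> list:
    """Return a list of readable problems with a Windows extraction path."""
    p = PureWindowsPath(path)
    # p.anchor is the drive root (for example 'D:\\'); everything else is a folder name.
    folders = [part for part in p.parts if part != p.anchor]
    problems = []
    for part in folders:
        if not SAFE_PART.match(part):
            problems.append(f"folder name {part!r} has spaces, non-ASCII or special characters")
    if p.drive.upper() == "C:":
        problems.append("path is on the C drive; a non-system drive is recommended")
    lowered = [part.lower() for part in folders]
    if "desktop" in lowered or "program files" in lowered:
        problems.append("path is inside Desktop or Program Files")
    return problems

if __name__ == "__main__":
    print(extraction_path_problems("D:/videotrans"))
    print(extraction_path_problems("C:/Program Files/videotrans"))
```

An empty list means the path follows all of the recommendations; each string in a non-empty list names one thing to change.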
3. Launching the Software
On Windows, double-click sp.exe to launch the software (on MacOS and Linux, run python3 sp.py). Because the software builds its interface with PySide6 and bundles many functional modules, startup may take some time; please be patient.
After successful startup, the main interface of the software will be displayed:
Software Interface Introduction
Top-left title bar: Displays the software version number.
Bottom-left: Click to open the software documentation site.
Menu bar:
Translation Configuration: Sets the information required by the translation channels, such as the API address and secret key (SK) of an AI translation channel.
TTS Settings: Sets the voice synthesis channel information, such as OpenAI TTS information, F5-TTS interface information, etc.
Speech Recognition Settings: Sets the configuration of each speech recognition method, such as API address, key, etc.
Tools/Advanced Options: Sets the software's custom advanced configuration and provides other auxiliary tools.
Left side buttons:
Custom Video Translation: Performs video translation.
Recognize Subtitles and Translate: Transcribes audio or video into SRT subtitles and translates the subtitles into other languages.
Audio and Video to Subtitles: Batch transcribes audio or video into SRT subtitles (the audio or video must contain human speech).
Batch Translate SRT Subtitles: Translates SRT subtitle files into other languages while preserving the original format and timeline.
Batch Dub Subtitles: Generates speech from text or SRT subtitles, supporting multiple speech synthesis channels.
Merge Audio, Video, and Subtitles: Merges video files, audio files, and SRT subtitle files into a single video. Use it when you have separate dubbing and SRT subtitle files that you want to embed into a video.
Video Translation Operation Steps
The software opens the Custom Video Translation module by default, with the operation area on the right.
1. Select the Original Video to be Translated
Select Video to Process: Click the button to select one or more video files from your computer (hold down the Ctrl key to select multiple).
Folder: Check this box to select a folder; the software will batch translate all video files in that folder.
Clear Generated: When you process the same video again, the cached data from the previous run is reused by default. Check this box to regenerate all files.
Save to..: Click the button to choose where the translated files are saved. By default, they are saved in the _video_out folder in the same directory as the original video.
Save Video Only: The translation process generates intermediate files such as subtitle files and audio files. Check this box if you only need the final translated video.
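The default save location and the Folder option described above can be pictured with a short sketch. The function names and the set of video extensions are hypothetical, and whether the software scans sub-folders or creates per-video sub-directories inside _video_out is not stated in this guide.

```python
from pathlib import Path

# Assumed set of video extensions, for illustration only.
VIDEO_EXTS = {".mp4", ".mkv", ".mov", ".avi"}

def default_output_dir(video_path: str) -> Path:
    """Default save location: a _video_out folder next to the original video."""
    return Path(video_path).parent / "_video_out"

def videos_in_folder(folder: str) -> list:
    """List video files directly inside a folder (no recursion), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTS
    )

if __name__ == "__main__":
    print(default_output_dir("/data/clips/talk.mp4"))
```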
2. Select Translation Channel
The software first converts the speech in the video to subtitles and then translates the subtitles into the target language. The translation channel performs the subtitle translation.
Translation Channel: Select the subtitle translation channel.
- Microsoft Translate: Free, no VPN required, average translation quality. (Default option)
- Google: Better translation quality, VPN required.
- OpenAI ChatGPT: Best translation quality, VPN and paid account required. chatgpt-4o or a newer model is recommended. Other OpenAI-compatible AI providers, such as DeepSeek, can also be used.
- Baidu Translate/Tencent Translate: Domestic (mainland China) translation channels, no VPN required, medium translation quality.
Source Language: Select the language spoken in the original video.
Target Language: Select the language to translate into.
Network Proxy: If you use a translation channel that requires a VPN (e.g., Google, OpenAI), fill in the proxy IP and port here.
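As a rough illustration of what a proxy IP and port amount to, the sketch below turns them into the proxies mapping that common Python HTTP clients such as requests accept. The helper name and the default http scheme are assumptions; the exact format expected by the Network Proxy field is not specified here.

```python
def proxy_settings(ip: str, port: int, scheme: str = "http") -> dict:
    """Build a proxies mapping in the style accepted by common HTTP clients."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    url = f"{scheme}://{ip}:{port}"
    # The same proxy endpoint is used for both plain and TLS traffic.
    return {"http": url, "https": url}

if __name__ == "__main__":
    print(proxy_settings("127.0.0.1", 7890))
```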
3. Select Voice Synthesis Channel
The selected voice synthesis channel generates the audio files from the translated subtitles.
Voice Synthesis Channel: Select the voice synthesis engine.
- EdgeTTS: Based on the Microsoft Edge browser's read-aloud function, free, no proxy required. (Default option)
- Local Channel: Requires additional installation and configuration, can be used locally offline.
- Third-party Paid API: Usually has a free trial quota.
Voice Synthesis Role: Select the voice synthesis role (e.g., male voice, female voice). You must select the target language before you can select a role.
Listen to Voice Synthesis: Preview the selected voice synthesis role.
Voice Synthesis Speed/Volume/Pitch: Adjust the speed, volume, and pitch of the synthesized voice. The speed and volume values are percentage increases or decreases relative to the default. For example, a speed of 15 means 15% faster than normal (1.15x speed); a volume of 90 means 90% louder than normal (1.9x volume).
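The percentage arithmetic above can be checked with a few lines of Python. Both helper names are hypothetical. Signed strings such as "+15%" are a common way to hand rate and volume offsets to TTS engines, but whether the software passes them in this form is an assumption.

```python
def percent_to_multiplier(value: int) -> float:
    """Convert a percentage offset (15, -10) to a multiplier (1.15, 0.9)."""
    return round(1 + value / 100, 4)

def percent_to_signed_string(value: int) -> str:
    """Format an offset as a signed percentage string such as '+15%' or '-10%'."""
    return f"{value:+d}%"

if __name__ == "__main__":
    for setting in (15, 90, -10):
        print(setting, percent_to_multiplier(setting), percent_to_signed_string(setting))
```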
4. Select Speech Recognition Engine
This is the most important step: the speech in the video is recognized as text and SRT subtitles are generated from it.
Speech Recognition: Select the speech recognition engine that converts the speech in the video to subtitles. The default is faster-whisper, which is free and runs locally.
Select Model: If you use faster-whisper or openai-whisper, you can choose between different models. Larger models are more accurate but run more slowly and consume more resources. The software includes only the tiny and medium models by default; other models must be downloaded separately. The large-v2 or large-v3-turbo model is recommended for the best results (requires an Nvidia graphics card and CUDA/cuDNN support).
Speech Segmentation Mode: Select the speech segmentation method. The default Overall Recognition mode is recommended for better results. The Equal Segmentation mode divides the speech into segments of equal duration and is only available when using faster-whisper/openai-whisper.
Chinese Re-punctuation: Check this option to use Alibaba Cloud's punctuation model to re-punctuate Chinese, improving subtitle quality.
Speech Noise Reduction: Check this option to use Alibaba Cloud's speech noise reduction model to reduce noise in the speech, improving recognition accuracy.
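For readers curious what local recognition with faster-whisper looks like outside the GUI, here is a minimal sketch using the faster-whisper Python package. It is not the software's own code: the function names, the float16/int8 compute types, and the default model are illustrative choices, and the package plus the model weights must be installed separately.

```python
def pick_runtime(cuda_available: bool) -> tuple:
    """Choose a device and compute type: GPU with float16, otherwise CPU with int8."""
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")

def transcribe(path: str, model_name: str = "large-v2",
               cuda_available: bool = False, language=None):
    """Yield (start_seconds, end_seconds, text) for each recognized segment."""
    # Imported lazily so pick_runtime works without faster-whisper installed.
    from faster_whisper import WhisperModel

    device, compute_type = pick_runtime(cuda_available)
    model = WhisperModel(model_name, device=device, compute_type=compute_type)
    # language=None lets the model detect the spoken language automatically.
    segments, _info = model.transcribe(path, language=language)
    for seg in segments:
        yield seg.start, seg.end, seg.text.strip()

if __name__ == "__main__":
    print(pick_runtime(cuda_available=False))
```

Swapping model_name for tiny or medium trades accuracy for speed, which mirrors the model size trade-off described above.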
5. Set Synchronization Alignment
Because speaking rate and sentence length differ between languages, the translated voice may not have the same duration as the original video. This section adjusts the synchronization between subtitles, voice, and video.
Extend Video: If the voice duration exceeds the original video duration, check this option to add a still frame at the end of the video so that the video duration matches the voice duration.
Accelerate Voice: If the voice duration exceeds the original video duration, check this option to speed up the voice so that its duration matches the video duration (the maximum acceleration factor is 3x, which can be modified in the menu Tools -> Advanced Options).
Slow Down Video: If the voice duration exceeds the original video duration, check this option to slow down the video playback so that its duration matches the voice duration (the maximum slow-down factor is 20x, which can be modified in the menu Tools -> Advanced Options).
Subtitle Embedding: Select the subtitle embedding method.
- Do Not Embed Subtitles: Do not embed subtitles in the video.
- Embed Hard Subtitles: Permanently burn the subtitles into the video so that they display in any player.
- Embed Soft Subtitles: Save the subtitles separately alongside the video instead of burning them into the picture; the player must support them for them to be displayed.
- Embed Hard Subtitles (Dual): Embed hard subtitles in both the original language and the target language.
- Embed Soft Subtitles (Dual): Embed soft subtitles in both the original language and the target language.
CJK Single-Line Characters: Set the maximum number of characters per subtitle line for Chinese, Japanese, and Korean when embedding hard subtitles (default 20).
Other Languages: Set the maximum number of characters per subtitle line for other languages when embedding hard subtitles (default 60).
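The Accelerate Voice and Slow Down Video options above come down to a capped ratio between two durations. The sketch below shows that arithmetic under stated assumptions: the function names are hypothetical, the 3x and 20x caps are the defaults quoted above, and whether the software applies the ratio per subtitle segment or to the whole video is not specified here.

```python
def voice_speedup(voice_s: float, video_s: float, max_factor: float = 3.0) -> float:
    """Factor by which to speed up the dubbed voice so that it fits the video."""
    if voice_s <= video_s:
        return 1.0  # the voice already fits, no acceleration needed
    return min(voice_s / video_s, max_factor)

def video_slowdown(voice_s: float, video_s: float, max_factor: float = 20.0) -> float:
    """Factor by which to stretch the video so that it lasts as long as the voice."""
    if voice_s <= video_s:
        return 1.0
    return min(voice_s / video_s, max_factor)

def remaining_overrun(voice_s: float, video_s: float, max_factor: float = 3.0) -> float:
    """Seconds by which the voice still overruns after capped acceleration."""
    factor = voice_speedup(voice_s, video_s, max_factor)
    return max(0.0, voice_s / factor - video_s)

if __name__ == "__main__":
    print(voice_speedup(12.0, 10.0))      # mild overrun, below the cap
    print(voice_speedup(40.0, 10.0))      # hits the 3x cap
    print(remaining_overrun(40.0, 10.0))  # time the other options would have to absorb
```

When the cap is reached, some overrun remains, which is one reason the options can be combined with Extend Video.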
6. Process Background Sound
Keep Original Background Sound: Check this option to keep the original background music in the translated video. Note: This option significantly increases processing time and system resource consumption, but it also improves the accuracy of subtitle generation.
Add Extra Background Audio: Click the button to select an audio file as the new background music.
Loop Background Sound: If the new background music is shorter than the video, check this option to loop the background music.
Background Volume: Adjust the volume of the background music. A value less than 1 reduces the volume, and a value greater than 1 increases it.
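The looping and volume options above map naturally onto standard FFmpeg features. The following sketch only builds a command line and does not run it; it is an illustration of the idea, not the command the software issues. The function name and file names are hypothetical, while -stream_loop, the volume filter, and the amix filter are regular FFmpeg options.

```python
def bgm_mix_command(video: str, bgm: str, out: str,
                    volume: float = 0.8, loop: bool = True) -> list:
    """Build an ffmpeg command that mixes background music under a dubbed video."""
    cmd = ["ffmpeg", "-y", "-i", video]
    if loop:
        # -stream_loop -1 repeats the input that follows it indefinitely.
        cmd += ["-stream_loop", "-1"]
    cmd += ["-i", bgm]
    # Scale the music, then mix it with the video's audio; the mix ends with the first input.
    filt = f"[1:a]volume={volume}[bg];[0:a][bg]amix=inputs=2:duration=first[mix]"
    cmd += ["-filter_complex", filt, "-map", "0:v", "-map", "[mix]", "-c:v", "copy", out]
    return cmd

if __name__ == "__main__":
    print(" ".join(bgm_mix_command("dubbed.mp4", "music.mp3", "final.mp4", volume=0.5)))
```

The resulting list can be passed to subprocess.run if FFmpeg is installed.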
7. Start Execution
CUDA Acceleration: If you have an Nvidia graphics card and have installed CUDA/cuDNN, checking this option can greatly improve the translation speed.
Click the Start Execution button, and the software will start translating the video.
If you translate only one video, the software pauses after generating the subtitles and after translating them, so that you can edit them (e.g., to fix typos).
If you select multiple videos, the translation process does not pause, and the subtitles of all videos are displayed together in the subtitle area on the right. This may look chaotic, but it does not affect the final translation result.
8. View Translation Results
After the translation is complete, click the progress bar to open the folder where the results are located. The translated video file is in MP4 format, and other files are intermediate material files (e.g., SRT subtitle files, audio files).
Audio and Video to Subtitles Function
This function can batch recognize the speech in audio or video and export it as SRT subtitle files.
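To make the SRT output concrete, the sketch below converts recognized segments into SRT text. The (start, end, text) tuple layout and both function names are assumptions for illustration; the timestamp layout HH:MM:SS,mmm and the numbered cue blocks follow the usual SRT conventions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn (start, end, text) tuples into numbered SRT cue blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cue blocks are separated by one blank line.
    return "\n".join(blocks)

if __name__ == "__main__":
    print(segments_to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```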
Batch Translate SRT Subtitles Function
This function can batch translate SRT subtitles into another language while keeping the output in valid SRT format.
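Keeping the output valid means that only the text lines change while the cue numbers and timing lines pass through untouched. A minimal sketch of that idea follows, with translate standing in for whichever translation channel is used; the function name and the callback interface are hypothetical.

```python
import re

def translate_srt(srt_text: str, translate) -> str:
    """Translate only the text of each cue; index and timing lines are kept as-is."""
    out = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            out.append(block)  # leave anything unexpected untouched
            continue
        index, timing, text = lines[0], lines[1], "\n".join(lines[2:])
        out.append(f"{index}\n{timing}\n{translate(text)}")
    return "\n\n".join(out) + "\n"

if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:01,500\nhello\n\n2\n00:00:01,500 --> 00:00:03,000\nworld\n"
    # str.upper is a stand-in for a real translation function.
    print(translate_srt(sample, str.upper))
```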
Subtitle Output Format
Single-language subtitles: The translation result contains only the target-language subtitles.
Target language on top (dual): The translation result contains both the original-language and target-language subtitles, with the target language on top and the original language on the bottom.
Target language on bottom (dual): The translation result contains both the original-language and target-language subtitles, with the target language on the bottom and the original language on top.
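The two dual layouts differ only in line order within each cue. The sketch below illustrates this with hypothetical helpers; the cue tuple layout (timing line, original text, translated text) is an assumption made for the example.

```python
def dual_cue_text(original: str, translated: str, target_on_top: bool = True) -> str:
    """Stack the original and translated lines in the requested order."""
    top, bottom = (translated, original) if target_on_top else (original, translated)
    return f"{top}\n{bottom}"

def build_dual_srt(cues, target_on_top: bool = True) -> str:
    """Build SRT text from (timing_line, original, translated) tuples."""
    blocks = [
        f"{i}\n{timing}\n{dual_cue_text(orig, trans, target_on_top)}\n"
        for i, (timing, orig, trans) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)

if __name__ == "__main__":
    cues = [("00:00:00,000 --> 00:00:01,500", "Hello", "Bonjour")]
    print(build_dual_srt(cues, target_on_top=True))
    print(build_dual_srt(cues, target_on_top=False))
```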
Batch Subtitle-to-Speech Function
This function can synthesize speech files from SRT subtitles, supporting batch operations
Other Functions
See Tools in the menu bar for other functions, which can be used as needed.
Speech recognition supports faster-whisper and openai-whisper local offline models, as well as OpenAI SpeechToText API, GoogleSpeech, Alibaba Chinese speech recognition model and Doubao model, and supports custom speech recognition APIs.
Subtitle translation supports Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, ChatGPT, AzureAI, Gemini, DeepL, DeepLX, ByteDance Volcano, offline translation OTT, other AI large models compatible with OpenAI, and local large models.
Voice synthesis supports Microsoft Edge tts, Google tts, Azure AI TTS, Openai TTS, Elevenlabs TTS, Custom TTS server api, GPT-SoVITS, clone-voice, ChatTTS-ui, Fish TTS, CosyVoice, F5-TTS, and KokoroTTS.
Supported languages: Simplified and Traditional Chinese, English, Korean, Japanese, Russian, French, German, Italian, Spanish, Portuguese, Vietnamese, Thai, Arabic, Turkish, Hungarian, Hindi, Ukrainian, Kazakh, Indonesian, Malay, Czech, Polish, Dutch, Swedish, Filipino/Other languages with optional automatic detection