Quick Start Guide - pyVideoTrans
pyVideoTrans: Open Source Video Translation Tool. Website: pyvideotrans.com. Source: github.com/jianchang512/pyvideotrans

pyVideoTrans is open-source video translation software that converts a video's audio and subtitles from one language into another. Whether you are a content creator, educator, or language learner, it gives you a one-stop solution for breaking down language barriers.

Core Features at a Glance

Fully Automatic Video Translation: Intelligently recognizes speech in videos, generates source language subtitles, translates them into the target language, dubs the audio, and finally synthesizes the new audio and subtitles into the original video, all in one go.
Speech Recognition and Transcription: Batch-transcribes human speech from video or audio files into timestamped SRT subtitle files.
SRT Subtitle File Translation: Supports batch translation of SRT subtitle files, preserving the original timecodes and formatting, and provides a variety of bilingual subtitle styles.
Text-to-Speech (TTS): Utilizes various advanced TTS channels to generate high-quality, natural-sounding voiceovers for your text or SRT subtitle files.
Practical Toolkit: Built-in auxiliary tools such as video/audio/subtitle merging and vocal/background sound separation to meet your various refined needs in video processing.

How It Works

Before you begin, be sure to understand the core workings of this software:

pyVideoTrans works by recognizing and processing the human speech in the video's audio. Whether the picture already contains burned-in subtitles (hard subtitles) makes no difference.

Can process: Any video containing human speech, whether it has embedded subtitles or not.
Cannot process: Videos with only background music and hard subtitles, but without any human speech. This software also cannot directly extract hard subtitles from the video screen.

Download and Installation

1.1 Windows Users (Pre-packaged Version)

We provide a ready-to-use pre-packaged version for Windows 10/11 users, eliminating the need for cumbersome configuration.

Click here to download the pre-packaged Windows version, unzip and use

Unzipping Precautions

Unzipping to an unsuitable location is the most common cause of software startup failures. Please follow these rules strictly:

Avoid Administrator-Privilege Paths: Do not unzip to system folders such as C:/Program Files or C:/Windows, or to the Desktop.
Path Must Be English-Only: The unzip path must not contain Chinese characters, spaces, or special symbols.
Recommended Practice: On a non-system drive such as D or E, create a new folder whose name uses only English letters or digits (e.g., D:/videotrans), and then unzip the package into this folder.
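
The unzipping rules above can be checked mechanically before you extract the package. The following is a minimal sketch, not part of pyVideoTrans: the function name check_unzip_path is hypothetical, and treating underscore, hyphen, and dot as acceptable characters is an assumption, since the guide does not list which symbols count as "special".

```python
from pathlib import PureWindowsPath

# Folder names the guide warns against; matched case-insensitively.
RISKY_PARTS = {"program files", "program files (x86)", "windows", "desktop"}

# Characters allowed besides ASCII letters and digits (assumption, see lead-in).
ALLOWED_EXTRA = set(":/\\_-.")


def check_unzip_path(path: str) -> list[str]:
    """Return a list of problems with a candidate unzip folder (empty means OK)."""
    problems = []
    for ch in path:
        is_plain = ch.isascii() and ch.isalnum()
        if not is_plain and ch not in ALLOWED_EXTRA:
            problems.append("contains spaces, non-English characters, or special symbols")
            break
    # Compare each path component against the risky folder names.
    parts = {part.lower() for part in PureWindowsPath(path).parts}
    if parts & RISKY_PARTS:
        problems.append("is inside a system folder or the Desktop")
    return problems


if __name__ == "__main__":
    for candidate in ["D:/videotrans", "C:/Program Files/videotrans"]:
        print(candidate, "->", check_unzip_path(candidate) or "OK")
```

An empty list means the path satisfies all three rules as interpreted here.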

Starting the Software

After unzipping, open the folder, find the sp.exe file, and double-click it to run.

On first startup the software has to load many modules, which may take tens of seconds. Please be patient.

1.2 macOS / Linux Users (Source Code Deployment)

macOS and Linux users need to deploy the software from source code.

Source Code Repository Address: https://github.com/jianchang512/pyvideotrans
Detailed Deployment Tutorial:
- Detailed Tutorial for macOS System Source Code Deployment
- Detailed Tutorial for Linux System Source Code Deployment

Software Interface and Core Functions

After the software starts, you will see the following main interface.

Left Function Area: Switch between the main function modules of the software, such as Custom Video Translation, Audio and Video to Subtitles, etc.
Top Menu Bar: Perform global configuration.
- Translation Settings: Configure the API keys and related parameters for each translation channel (e.g., OpenAI, Azure).
- TTS Settings: Configure the API keys and related parameters for each voiceover channel (e.g., OpenAI TTS, Azure TTS).
- Speech Recognition Settings: Configure the API keys and parameters for the speech recognition channel (e.g., OpenAI API, Alibaba ASR).
- Tools/Options: Contains various advanced options and auxiliary tools, such as subtitle format adjustment, video merging, and vocal separation.
- Help/About: View software version information, documentation, and community links.
Right Workspace: The specific operation area for the current function module.

Quick Start - Video Translation Full Process

This is the core function of the software. We will guide you step by step through a complete video translation task. The Custom Video Translation module opens by default.

Step 1: Select Video and Output Settings

Select Video to Process: Click the button to select one or more video files (hold Ctrl to select multiple).
Folder: Check this option to batch process all videos within the entire folder.
Save to..: Set the output directory for the translated video. The default is the _video_out folder in the original video directory.
Clean Generated: Check this option if you need to reprocess the same video (instead of using the cache).
Save Video Only: If checked, only the final MP4 video will be kept after processing, and intermediate files such as subtitles and audio will be automatically deleted.
Move Subtitle Position: If the original video has hard subtitles, checking this option will attempt to place the new subtitles in a different position to avoid overlap.
Shutdown After Completion: Automatically shuts down the computer after processing all tasks, suitable for large-scale, long-term tasks.
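
To make the default output location and the Folder option concrete, here is a small sketch under stated assumptions. The helper names and the extension list are hypothetical; the guide only states that the default output folder is _video_out inside the original video's directory, and the software may organize files differently inside it.

```python
from pathlib import PurePosixPath

# Illustrative extension list; not the tool's actual list of supported formats.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".mov", ".avi"}


def default_output_dir(video_path: str) -> PurePosixPath:
    """Return the _video_out folder located next to the source video."""
    return PurePosixPath(video_path).parent / "_video_out"


def pick_videos(file_names: list[str]) -> list[str]:
    """Keep only names with a known video extension, as a folder batch might."""
    return [
        name for name in file_names
        if PurePosixPath(name).suffix.lower() in VIDEO_EXTENSIONS
    ]


if __name__ == "__main__":
    print(default_output_dir("/home/me/clips/talk.mp4"))
    print(pick_videos(["a.MP4", "notes.txt", "b.mkv"]))
```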

Step 2: Configure Translation and Voiceover

Translation Channel: Select the engine used to translate subtitles.
- Free: Google (Free) (requires proxy), Microsoft Translate (no proxy required).
- High Quality (API Key Required): OpenAI, Gemini, DeepL, etc. Set the API Key in the corresponding location in the top menu bar.
Source Language: Select the language actually spoken in the original video. This setting must be accurate.
Target Language: The language you want to translate into.
Glossary: Check this option to use a preset glossary for translation to ensure the accuracy of professional vocabulary.
Network Proxy: If using a channel that requires a proxy (such as Google, OpenAI), fill in your proxy address and port here (e.g., http://127.0.0.1:10808).
Voiceover Channel: Select the engine to generate voiceovers. Edge-TTS is the default option, free and with excellent results.
Voiceover Role: Select the target language first. The matching voices (male/female, etc.) are then loaded for you to choose from.
Listen to Voiceover: Click to preview the currently selected voice.
Voiceover Speed/Volume/Pitch: Adjust as needed. The numbers represent a percentage increase or decrease relative to the default.
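
The percentage convention for speed and volume can be illustrated with a little arithmetic. This is a sketch only: both helper names are hypothetical, and the signed string form (such as "+20%") is modeled on the style the Edge-TTS command line accepts for its rate and volume options, not on pyVideoTrans internals.

```python
def signed_percent(value: int) -> str:
    """Format a percentage offset as a signed string such as '+20%' or '-10%'."""
    return f"{value:+d}%"


def apply_percent(default: float, offset_percent: int) -> float:
    """Scale a default value by a percentage offset (+20 gives 1.2x, -10 gives 0.9x)."""
    return default * (1 + offset_percent / 100)


if __name__ == "__main__":
    # A speed setting of 20 means 20% faster than the default rate.
    print(signed_percent(20), apply_percent(1.0, 20))
    # A volume setting of -10 means 10% quieter than the default.
    print(signed_percent(-10), apply_percent(1.0, -10))
```

A setting of 0 leaves the default unchanged.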

Step 3: Configure Speech Recognition

This is a crucial step in converting video speech into text subtitles, directly affecting the quality of all subsequent processes.

Speech Recognition: It is recommended to use the default faster-whisper(local), which is free, runs locally, and provides excellent results.
Select Model: The larger the model, the more accurate the recognition, but the slower the speed and the more resources consumed.
- Entry-Level: tiny / medium
- Recommended: large-v3-turbo (accurate and fast; works best with an NVIDIA graphics card and CUDA acceleration).
Speech Segmentation Mode: It is recommended to use the default Overall Recognition.
LLM Re-segmentation: If checked, a large language model will be used to intelligently segment and punctuate the recognized text, significantly improving subtitle readability.
Noise Reduction: If checked, the audio will be denoised to improve speech recognition accuracy in noisy environments.
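
For readers curious what local recognition looks like outside the GUI, the sketch below transcribes a file with the faster-whisper Python package and formats the result as SRT. It assumes faster-whisper is installed and that the installed version accepts the model name "large-v3-turbo". The helper names are hypothetical, and this is not necessarily how pyVideoTrans calls the library.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from (start, end, text) tuples given in seconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timecode = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timecode}\n{text.strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, language: str, use_cuda: bool = False) -> str:
    """Transcribe with faster-whisper (must be installed) and return SRT text."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel(
        "large-v3-turbo",  # assumed to be available in the installed version
        device="cuda" if use_cuda else "cpu",
        compute_type="float16" if use_cuda else "int8",
    )
    # language is a code such as "en"; it should match the speech in the file.
    segments, _info = model.transcribe(audio_path, language=language)
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, " Hello "), (2.0, 3.25, "How are you?")]))
```

Passing the correct language code mirrors the Source Language setting: a wrong code degrades every later step.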

Step 4: Set Synchronization and Subtitles

Because languages differ in speaking rate, the duration of the translated voiceover may not match the original video. The options in this step control how that mismatch is handled.

Synchronization Alignment:
- Voiceover Acceleration: When the voiceover is longer than the video, accelerate the voiceover to match the video duration (commonly used).
- Video Slowdown: When the voiceover is longer than the video, slow down the video to match the voiceover duration.
- Video Extension: When the voiceover is longer than the video, add still frames at the end of the video to match the voiceover duration.
Subtitle Embedding:
- Do Not Embed Subtitles: Only replace the audio, without adding any subtitles.
- Embed Hard Subtitles: Permanently "burn" the subtitles into the picture, which cannot be turned off.
- Embed Soft Subtitles: Package the subtitles as an independent track into the video, which can be turned on or off by the player.
- (Dual): Options marked this way embed bilingual subtitles containing both the source and target languages.
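
The three synchronization options come down to simple duration arithmetic. The sketch below shows the quantities involved under the assumption that each subtitle has a fixed time slot in the original video. The function names are hypothetical, and the software's actual alignment logic may be more involved.

```python
def overrun_ratio(dub_seconds: float, slot_seconds: float) -> float:
    """Ratio of voiceover length to its slot, never below 1.0.

    Voiceover Acceleration plays the voiceover this many times faster;
    Video Slowdown stretches the video segment's duration by the same factor.
    """
    if slot_seconds <= 0:
        raise ValueError("slot_seconds must be positive")
    return max(1.0, dub_seconds / slot_seconds)


def extension_seconds(dub_total: float, video_total: float) -> float:
    """Still-frame time to append when the voiceover outlasts the video."""
    return max(0.0, dub_total - video_total)


if __name__ == "__main__":
    # A 6-second voiceover must fit a 4-second slot.
    print(overrun_ratio(6.0, 4.0))
    # The full voiceover runs 65 seconds against a 60-second video.
    print(extension_seconds(65.0, 60.0))
```

A ratio of 1.0 means the voiceover already fits and nothing needs to change.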

Step 5: Process Background Sound

Keep Original Background Sound: If checked, the software will attempt to separate the human voice and background sound of the original video and keep the background sound in the final video. Note: This feature will significantly increase processing time, but can greatly improve the quality of the finished product.
Add Additional Background Audio: You can also select your own audio file as new background music.
Background Volume: Adjust the volume of the background sound. Values below 1 reduce the volume; values above 1 increase it.
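
As a rough illustration of what a volume multiplier does to audio, the sketch below scales 16-bit PCM samples and clips them to the valid range. The function name is hypothetical and the software itself may apply the multiplier differently (for example, through its audio toolchain).

```python
def scale_samples(samples: list[int], volume: float) -> list[int]:
    """Scale 16-bit PCM samples by a volume multiplier, clipping to the valid range."""
    return [max(-32768, min(32767, round(s * volume))) for s in samples]


if __name__ == "__main__":
    print(scale_samples([1000, -2000, 30000], 0.5))  # quieter
    print(scale_samples([1000, -2000, 30000], 2.0))  # louder; the last sample clips
```

Clipping is why very large multipliers can distort loud passages.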

Step 6: Start Execution

CUDA Acceleration: If you have an NVIDIA graphics card and have correctly installed the CUDA environment, be sure to check this option. It can make speech recognition several times, or even dozens of times, faster.
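
If you are unsure whether an NVIDIA driver is present at all, a quick heuristic is to look for the nvidia-smi tool that ships with the driver. This sketch only checks that the tool is on the PATH. It does not confirm that the CUDA environment is correctly installed, and the function name is hypothetical.

```python
import shutil


def nvidia_driver_visible() -> bool:
    """Rough heuristic: True if the nvidia-smi tool can be found on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    if nvidia_driver_visible():
        print("nvidia-smi found; CUDA acceleration may be available.")
    else:
        print("nvidia-smi not found; CUDA acceleration is unlikely to work.")
```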

After all settings are complete, click the "Start" button.

Executing

The software will start working. If only one video is being processed, it will pause after subtitle generation and translation, giving you the opportunity to proofread and modify the subtitles in the text box on the right. Click execute again to continue after confirming that everything is correct.

Step 7: View Results

After the task is completed, click the progress bar area at the bottom to open the output folder. You will see the final MP4 file and the SRT subtitles, voiceover files, and other materials generated during the process.

Explore Other Practical Features

In addition to the core video translation, pyVideoTrans also provides several independent powerful functions.

4.1 Audio and Video to Subtitles/Voice Transcription/Speech Recognition

Batch transcribe video or audio files into SRT subtitles. Simply drag in the files, set the original language and recognition model, and start. Supports advanced features such as LLM Re-segmentation and Noise Reduction.

4.2 Batch Translate SRT Subtitles

If you already have SRT subtitle files, this function can help you quickly translate them into other languages while keeping the timeline unchanged. It also supports selecting multiple output formats such as Single Language Subtitles, Target Language Above (Dual), and Target Language Below (Dual).
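
The idea of keeping the timeline unchanged while producing bilingual output can be sketched as follows. The code assumes both SRT texts have the same number of cues, use blank-line separators, and use Unix line endings. The function names are hypothetical and do not reflect the tool's internal implementation.

```python
def parse_srt(text: str) -> list[tuple[str, str, str]]:
    """Split SRT text into (index, timecode, body) cues."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) >= 3:
            cues.append((lines[0], lines[1], "\n".join(lines[2:])))
    return cues


def merge_bilingual(source_srt: str, target_srt: str, target_above: bool = True) -> str:
    """Combine two SRT texts cue by cue, reusing the source timecodes."""
    source, target = parse_srt(source_srt), parse_srt(target_srt)
    if len(source) != len(target):
        raise ValueError("cue counts differ")
    blocks = []
    for (index, timecode, src), (_, _, tgt) in zip(source, target):
        body = f"{tgt}\n{src}" if target_above else f"{src}\n{tgt}"
        blocks.append(f"{index}\n{timecode}\n{body}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    english = "1\n00:00:00,000 --> 00:00:01,500\nHello\n"
    spanish = "1\n00:00:00,000 --> 00:00:01,500\nHola\n"
    print(merge_bilingual(english, spanish, target_above=True))
```

Setting target_above to False gives the Target Language Below (Dual) layout.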

4.3 Batch Dubbing Subtitles

Convert your SRT files or plain text into voiceover files (such as WAV or MP3) in batches through the selected TTS engine. Supports fine-tuning of speech speed, volume, and pitch.

4.4 Audio and Video Subtitle Merging

This is a useful post-processing tool. When you have separate video, voiceover, and subtitle files, you can use it to merge the three into a final video file. It also supports customizing subtitle styles.
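
A comparable merge can be expressed as an ffmpeg command line. The sketch below only builds the argument list. It assumes ffmpeg is installed, that the output is an MP4, and that the subtitle path is a simple relative file name (the subtitles filter needs extra escaping for Windows drive letters). The function name is hypothetical and this is not necessarily the command pyVideoTrans runs.

```python
def ffmpeg_merge_command(
    video: str, audio: str, subtitles: str, output: str, hard_subtitles: bool = False
) -> list[str]:
    """Build an ffmpeg argument list that merges video, voiceover, and subtitles."""
    cmd = ["ffmpeg", "-y", "-i", video, "-i", audio]
    if hard_subtitles:
        # Burn subtitles into the picture; the video stream is re-encoded.
        cmd += ["-vf", f"subtitles={subtitles}",
                "-map", "0:v", "-map", "1:a", "-c:a", "aac"]
    else:
        # Add subtitles as a separate track that the player can toggle.
        cmd += ["-i", subtitles,
                "-map", "0:v", "-map", "1:a", "-map", "2:s",
                "-c:v", "copy", "-c:a", "aac", "-c:s", "mov_text"]
    return cmd + [output]


if __name__ == "__main__":
    print(" ".join(ffmpeg_merge_command("in.mp4", "dub.wav", "sub.srt", "out.mp4")))
```

The list can be passed to subprocess.run once you have confirmed the paths. The soft-subtitle branch copies the video stream, so it is usually much faster than burning subtitles in.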

Chapter 5: Function Overview and Support List

The strength of pyVideoTrans lies in its extensibility and its support for many services.

Speech Recognition (STT) Support:
- Local Offline: faster-whisper, openai-whisper
- Online API: OpenAI SpeechToText, GoogleSpeech, Alibaba FunASR, Doubao Model, and custom API.
Subtitle Translation Support:
- Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, DeepL, DeepLX, ByteDance Volcano
- Large Language Model: ChatGPT, AzureAI, Gemini, other OpenAI-compatible AI large models, and local large models
- Offline Translation: OTT

Speech Synthesis (TTS) Support:
- Microsoft Edge TTS, Google TTS, Azure AI TTS, OpenAI TTS, Elevenlabs
- Voice Cloning/Local: GPT-SoVITS, clone-voice, ChatTTS, Fish TTS, CosyVoice, F5-TTS, KokoroTTS
- Custom TTS Server API
Supported Languages:
- Simplified and Traditional Chinese, English, Korean, Japanese, Russian, French, German, Italian, Spanish, Portuguese, Vietnamese, Thai, Arabic, Turkish, Hungarian, Hindi, Ukrainian, Kazakh, Indonesian, Malay, Czech, Polish, Dutch, Swedish, Filipino, Finnish, Persian, and more. Automatic language detection is also supported.

Thank you for choosing pyVideoTrans. We hope this software helps you cross language barriers!
