Beginner's Guide | pyVideoTrans-Open Source Video Translation Tool -pyvideotrans.com github.com/jianchang512/pyvideotrans

This article briefly introduces how the "Video Translation Dubbing Software" works, what it can do, and how to use it. The main topics are:

What is this and what is it used for?
How to download, install, and update
Where to download models
How to choose a translation channel
What is a proxy and is it necessary?
How to use it specifically
How to use CUDA acceleration
How to use the original video's voice for dubbing
How to use GPT-SoVITS dubbing
What to do if you encounter problems
Is there a charge and are there any restrictions?
Will the project die?
Can the source code be modified?

What is this and what is it used for?

This is an open-source video translation and dubbing tool (open-source license GPL-v3). It can translate a video with audio in one language into a video with audio in another language and embed subtitles in that language. For example, if you have an English movie with English audio, no English subtitles, and no Chinese subtitles, you can use this tool to convert it into a movie with Chinese subtitles and Chinese dubbing.
Open source address: https://github.com/jianchang512/pyvideotrans

In addition to this core functionality, it also comes with some other tools:

Speech to Text: Can recognize the sound in a video or audio file as text and export it as a subtitle file.
Audio and Video Separation: Can separate a video into a silent video file and an audio file.
Text and Subtitle Translation: Can translate text or SRT subtitle files into text or subtitles in other languages.
Video Subtitle Merging: Can embed a subtitle file into a video.
Audio, Video, and Subtitle Merging: Can combine a video file, an audio file, and a subtitle file into one file.
Text to Speech: Can synthesize any text or SRT file into an audio file.
Vocal and Background Separation: Can separate the human voice and other sounds in a video into two separate audio files.
Download YouTube Videos: Can download YouTube videos online.

What is the principle behind this tool?

The original video is first separated into an audio file and a silent MP4 using ffmpeg. Then, the openai-whisper/faster-whisper model is used to recognize the human voice in the audio and save it as an SRT subtitle. Next, the SRT subtitle is translated into the target language and saved as an SRT subtitle file. Finally, the translation result is synthesized into a dubbing audio file.

Then, the dubbing audio file, subtitle SRT file, and original silent MP4 are combined into a video file, completing the translation.
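The SRT files passed between the recognition, translation, and merge steps are plain text: a running index, a time range written as HH:MM:SS,mmm, and the subtitle text, with a blank line between entries. The following is a minimal sketch of writing that format in Python. The function names and the (start, end, text) tuple layout are assumptions made for illustration, not the tool's own code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Turn (start_seconds, end_seconds, text) tuples into SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "How are you?")]))
```

Because the translation step only changes the text lines, the index and timing lines of each entry can be carried over unchanged from the recognized subtitles.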

Of course, the intermediate steps are more complex, such as extracting background music and vocals, aligning subtitles with sound and visuals, voice cloning, and CUDA acceleration.
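The separation and merging steps described above can be sketched as ffmpeg command lines built in Python. This is only an illustration under stated assumptions: the file names (novoice.mp4, audio.wav) are hypothetical, the subtitles are added as a soft subtitle track rather than burned into the picture, and the actual commands used by the software may differ.

```python
from pathlib import Path


def split_commands(video: str, workdir: str = "tmp") -> dict:
    """Build ffmpeg commands that split a video into a silent MP4 and a WAV file."""
    out = Path(workdir)
    silent = str(out / "novoice.mp4")  # hypothetical file name
    audio = str(out / "audio.wav")     # hypothetical file name
    return {
        # -an drops the audio stream; -c:v copy keeps the video stream as it is.
        "silent_video": ["ffmpeg", "-y", "-i", video, "-an", "-c:v", "copy", silent],
        # -vn drops the video stream; 16 kHz mono is the sample format Whisper works with.
        "audio": ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", audio],
    }


def merge_command(silent_video: str, dubbing: str, subtitles: str, output: str) -> list:
    """Build an ffmpeg command that combines video, dubbed audio, and SRT subtitles."""
    return [
        "ffmpeg", "-y",
        "-i", silent_video, "-i", dubbing, "-i", subtitles,
        "-map", "0:v", "-map", "1:a", "-map", "2:s",
        # Copy the video, encode the audio as AAC, store subtitles as an MP4 text track.
        "-c:v", "copy", "-c:a", "aac", "-c:s", "mov_text",
        output,
    ]


for name, cmd in split_commands("movie.mp4").items():
    print(name, " ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once ffmpeg is installed and on the PATH.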

Can it be deployed from source code?

Yes. No pre-packaged versions are provided for macOS or Linux, so on those systems you can only deploy from source code. Please check the repository page for details: https://github.com/jianchang512/pyvideotrans

How to download, install, and update

Download from GitHub

This is an open-source project on GitHub, so the preferred download address is naturally GitHub: https://github.com/jianchang512/pyvideotrans/releases. After opening the page, download the release listed at the top.

If you arrived through the project homepage, https://github.com/jianchang512/pyvideotrans, click the "Releases" text in the middle right of the page to reach the download page above.

Updating is very simple. Go to the download page and see if the latest version is newer than the one you are currently using. If it is, re-download it, then unzip and overwrite.

Download and install from the documentation site

Of course, an easier way is to directly click download on the documentation site: https://pyvideotrans.com

After extracting, double-click sp.exe to start the software.

Extract to a directory whose path contains only English letters or digits. Avoid Chinese characters and spaces in the path; otherwise, strange problems may occur.
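If you are unsure whether a directory meets this rule, a few lines of Python can check it. This is a small sketch, not part of the software: the function name is hypothetical, and the set of allowed punctuation (underscore, hyphen, dot, colon, and path separators) is an assumption.

```python
import string

# Characters assumed to be safe: English letters, digits, and common path punctuation.
SAFE_CHARS = set(string.ascii_letters + string.digits + "_-.:\\/")


def is_safe_install_path(path: str) -> bool:
    """Return True if the path has no spaces, Chinese, or other unusual characters."""
    return bool(path) and all(ch in SAFE_CHARS for ch in path)


print(is_safe_install_path(r"D:\tools\pyvideotrans"))  # True
print(is_safe_install_path(r"D:\my videos\sp"))        # False, contains a space
```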

The list of files after decompression is as follows

Where to download models

The tiny model is built-in by default. This is the smallest and fastest model, but it is also the least accurate. If you need other models, please download them from this page: https://github.com/jianchang512/stt/releases/tag/0.0

How to choose a translation channel

After the subtitles are recognized, you need a translation channel if you want them in another language, for example when the original video is in English and you want to embed Chinese subtitles after processing.

Currently supported: Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, DeepL Translate, ChatGPT Translate, AzureGPT Translate, Gemini Pro Translate, DeepLx Translate, OTT Offline Translate, FreeGoogle Translate, FreeChatGPT Translate

FreeChatGPT Translation

This is a free ChatGPT translation API sponsored by apiskey.top. No API key (SK) or configuration is required; just select it to use it. It is based on the 3.5 model.

FreeGoogle Translate: This is a reverse proxy for Google Translate. It can be used without a proxy, but the number of requests is limited. It is recommended for novice users who do not know how to configure a proxy. Other users who want to use Google Translate should fill in the network proxy address.

DeepL Translate: The translation quality is probably the best, even better than ChatGPT. Unfortunately, the paid version cannot be purchased in China, and the free version is difficult to call via API. DeepLx is a tool for using DeepL for free, but a local deployment is basically unusable: there are many subtitles and they are translated in multiple threads at the same time, so the IP is easily blocked or restricted. Consider deploying it on Tencent Cloud to reduce the error rate.

https://juejin.cn/user/4441682704623992/posts

Microsoft Translate: Completely free without a proxy, but frequent use may still cause IP restrictions.

Google Translate: If you have a proxy and know what a proxy is and how to fill it in, then Google Translate is the recommended first choice. It is free and the effect is excellent. You only need to fill in the proxy address in the text box.

See also this method: a small tool for using Google Translate directly without a proxy.

Tencent Translate: If you know nothing about proxies, don't bother with them; apply for the free Tencent Translate API instead (see the Tencent Translate API application guide). The first 5 million characters per month are free.

Baidu Translate: You can also apply for the Baidu Translate API (see the Baidu Translate API application guide). Unverified accounts get 50,000 free characters per month; accounts with personal verification get 1 million free characters per month.

Using OTT Offline Translation: If you are willing to try, you can choose to deploy free OTT offline translation. The download address is https://github.com/jianchang512/ott. After deployment, fill in the address in the software menu - Settings - OTT Offline Translation.

Using AI Translation ChatGPT / Azure / Gemini:

ChatGPT and AzureGPT require paid accounts; free accounts will not work. Once you have an account, open Menu - Settings - OpenAI/chatGPT key and fill in your ChatGPT SK value. AzureGPT and Gemini are also configured under Menu - Settings.

Note that if you are using the official ChatGPT API, you do not need to fill in the "API URL". If it is a third-party API, fill in the API address provided by the third party.

ChatGPT Access Guide: Quickly obtain and configure API keys and fill them in for use in software/tools https://juejin.cn/post/7342327642852999168

The official OpenAI ChatGPT API, Gemini, and AzureGPT can only be reached through a proxy, so you must fill in a proxy address; otherwise they cannot be accessed.

AzureGPT is configured in the same Settings menu.

Gemini is currently free. After filling in the API key and correctly setting the proxy, it can be used.

What is a proxy and is it necessary?

If you want to use Google Translate, the official ChatGPT API, or Gemini/AzureGPT, then a proxy is necessary. Fill in the proxy address box with an address in the format http://127.0.0.1:port number. Please note that the port must be an HTTP port, not a SOCKS port.

For example, some proxy software uses http://127.0.0.1:10809, and other software uses http://127.0.0.1:7890. If you already use a proxy but do not know what to fill in, open the proxy software, look carefully in the lower left corner, upper right corner, or elsewhere for the word "http" followed by a 4-5 digit number, and then fill in http://127.0.0.1:port number.

If you have no idea what a proxy is, it is not convenient to say more here, for reasons you can guess; please search for it yourself on Baidu.

Please note: If you do not use a proxy, leave the proxy address empty. Do not fill it in at random, and in particular do not put an API address here.
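The rules above (HTTP scheme, host plus numeric port, no API address) can be expressed as a short validation function. This is a sketch for illustration only; the function name and error messages are hypothetical, and it is not how the software itself checks the field.

```python
from urllib.parse import urlsplit


def check_proxy(address: str) -> str:
    """Validate a proxy address of the form http://host:port and return it normalized."""
    parts = urlsplit(address.strip())
    if parts.scheme != "http":
        raise ValueError("use an HTTP proxy port, for example http://127.0.0.1:7890")
    # parts.port raises ValueError by itself when the port is not a valid number.
    if parts.hostname is None or parts.port is None:
        raise ValueError("both a host and a numeric port are required")
    if parts.path not in ("", "/"):
        raise ValueError("this looks like an API URL, not a proxy address")
    return f"http://{parts.hostname}:{parts.port}"


print(check_proxy(" http://127.0.0.1:7890 "))
```

In Python code that uses the requests library, a validated address would typically be passed as proxies={"http": proxy, "https": proxy}.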

How to use it specifically

Double-click sp.exe to open the software. The default interface is as follows

The first option selected on the left by default is the simple novice mode, which is convenient for novice users to quickly experience and use. Most of the options have been set by default.

You can also choose the standard function mode, which is highly customizable and covers the entire process of video translation, dubbing, and subtitle embedding. The other buttons on the left either break this process into separate steps or provide simple auxiliary functions. Let's demonstrate how to use it with the simple novice mode.

How to use CUDA acceleration

If you have an Nvidia graphics card, you can configure the CUDA environment and then select "CUDA acceleration", which greatly speeds up processing. Configuration involves many steps; please check the CUDA configuration tutorial.
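Before working through the CUDA setup, it can help to confirm that the Nvidia driver is installed and can see a graphics card. The sketch below calls the nvidia-smi command-line tool that ships with the driver. The function name is hypothetical, and a positive result only shows that the driver works; it does not prove that the CUDA libraries the software needs are installed.

```python
import shutil
import subprocess


def nvidia_driver_visible() -> bool:
    """Return True if nvidia-smi is on the PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per GPU, such as "GPU 0: ...".
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("Nvidia driver detected:", nvidia_driver_visible())
```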

How to use the original video's voice for dubbing

First, you need another open source project: clone-voice: https://github.com/jianchang512/clone-voice. After installing, deploying, and configuring the model, fill in the address of the project in the software's Menu - Settings - Original Voice Cloning Api.

Then select "clone-voice" for TTS, and select "clone" for the dubbing role to use it.

How to use GPT-SoVITS dubbing

The software now supports using GPT-SoVITS for dubbing. After deploying GPT-SoVITS, start the API service, and then fill in the address in the Video Translation Software Settings menu - GPT-SOVITS.

Specifically, you can check these two articles:

Calling GPT-SoVITS in other software to synthesize text into speech https://juejin.cn/post/7341401110631350324
GPT-SoVITS project API improvement and use https://juejin.cn/post/7343138052973297702

What to do if you encounter problems

First, carefully read the project's main page https://github.com/jianchang512/pyvideotrans. Most problems are explained there.

Secondly, you can visit the documentation website https://pyvideotrans.com

Finally, if you still can't solve the problem, post an issue at https://github.com/jianchang512/pyvideotrans/issues. There is also a QQ group listed on the project's main page, https://github.com/jianchang512/pyvideotrans, which you can join.

It is recommended to follow my WeChat public account (pyvideotrans). It contains original tutorials and FAQs about the software, as well as related tips and tricks. Because my time is limited, the project tutorials are only published on this Juejin blog and the WeChat public account. GitHub and the documentation website are not updated frequently.

Search for the public account pyvideotrans in WeChat Search

Is there a charge and are there any restrictions?

The project is open source under the GPL-v3 license. It is free to use, without any built-in charges or restrictions (must comply with the laws of our country). You can use it freely. Of course, Tencent Translate, Baidu Translate, DeepL Translate, ChatGPT, and AzureGPT charge fees, but that's none of my business. They don't give me any money either.

Will the project die?

There are no projects that will not die, only long-lived and short-lived projects. Projects that rely solely on love may die earlier. Of course, if you want it to die slower and live longer, and get effective continuous maintenance and optimization during its survival, you can consider donating to help it live a few more days.

Can the source code be modified?

The source code is completely open and can be deployed locally or modified for your own use. Note, however, that it is licensed under GPL-v3: if you integrate the source code into your project, your project must also be open source to avoid violating the license.
