
This article briefly introduces how the video translation and dubbing software works, what it can do, and how to use it. The main contents are:

  1. What is this and what is it for
  2. How to download, install, and update
  3. Where to download models
  4. How to choose a translation channel
  5. What is a proxy and is it necessary
  6. How to use it specifically
  7. How to use CUDA acceleration
  8. How to use the original video's voice for dubbing
  9. How to use GPT-SoVITS dubbing
  10. What to do if you encounter problems
  11. Is it free, and are there any restrictions
  12. Will the project die
  13. Can the source code be modified

What is this and what is it for

This is an open-source video translation and dubbing tool (licensed under GPL-v3). It turns a video with audio in one language into a video dubbed in another language, with subtitles in that language embedded. For example, given an English movie that has English audio but neither English nor Chinese subtitles, the tool can produce a version with Chinese subtitles and Chinese dubbing.

Source repository: https://github.com/jianchang512/pyvideotrans

In addition to this core function, it also comes with some other tools:

  • Speech recognition to text: Recognizes the speech in a video or audio file as text and exports it as a subtitle file.
  • Audio and video separation: Separates a video into a silent video file and an audio file.
  • Text subtitle translation: Translates text or srt subtitle files into text or subtitles in other languages.
  • Video subtitle merging: Embeds subtitle files into a video.
  • Audio, video, and subtitle merging: Combines video, audio, and subtitle files into one file.
  • Text-to-speech: Synthesizes any text or srt file into an audio file.
  • Vocal and background separation: Separates the human voice and the other sounds in a video into 2 audio files.
  • Download YouTube videos: Downloads YouTube videos online.
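
Several of these tools read or write srt subtitle files. As a rough illustration of what such a file contains, here is a minimal Python sketch that builds srt text from timed cues. The function names are hypothetical and are not taken from the project's source code.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an srt timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Build srt text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        # Each cue is: a running number, a time range, then the subtitle text.
        time_range = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Cues are separated by one blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0, 1.5, "Hello"), (2, 3, "World")]))
```

Translating a subtitle file then amounts to replacing the text of each cue while keeping its number and time range unchanged.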

How does this tool work?

First, ffmpeg separates the original video into an audio file and a silent mp4. Next, an openai-whisper or faster-whisper model recognizes the speech in the audio and saves it as an srt subtitle file. The subtitles are then translated into the target language and saved as another srt file. Finally, the translated text is synthesized into a dubbing audio file.

The dubbing audio file, the translated srt file, and the silent mp4 are then merged into one video file, which completes the translation.

In practice the intermediate steps are more involved. They include extracting background music and vocals, aligning subtitles with sound and picture, voice cloning, CUDA acceleration, and so on.
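
The first and last steps of this pipeline can be sketched with plain ffmpeg commands. The Python sketch below only builds the command lines. The file names, the 16 kHz mono audio format, and the choice of soft (mov_text) subtitles are assumptions made for illustration, not necessarily what the software itself does, and the recognition, translation, and dubbing steps in between are left out.

```python
from pathlib import Path


def split_commands(video: str, workdir: str):
    """Return ffmpeg commands that split a video into audio and a silent mp4."""
    work = Path(workdir)
    audio = str(work / "audio.wav")      # hypothetical output name
    silent = str(work / "silent.mp4")    # hypothetical output name
    # -vn drops the video stream; -ac 1 -ar 16000 gives mono 16 kHz audio.
    extract_audio = ["ffmpeg", "-y", "-i", video,
                     "-vn", "-ac", "1", "-ar", "16000", audio]
    # -an drops the audio stream; -c:v copy keeps the video without re-encoding.
    strip_audio = ["ffmpeg", "-y", "-i", video, "-an", "-c:v", "copy", silent]
    return extract_audio, strip_audio


def merge_command(silent: str, dubbing: str, subtitles: str, output: str):
    """Return an ffmpeg command that muxes video, dubbing, and soft subtitles."""
    return [
        "ffmpeg", "-y",
        "-i", silent, "-i", dubbing, "-i", subtitles,
        # Take video from input 0, audio from input 1, subtitles from input 2.
        "-map", "0:v:0", "-map", "1:a:0", "-map", "2:s:0",
        "-c:v", "copy", "-c:a", "aac", "-c:s", "mov_text",
        output,
    ]


if __name__ == "__main__":
    for cmd in split_commands("movie.mp4", "work"):
        print(" ".join(cmd))
    print(" ".join(merge_command("work/silent.mp4", "work/dub.wav",
                                 "work/zh.srt", "movie_zh.mp4")))
```

Each list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.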

Can it be deployed from source code?

Yes. No pre-packaged versions are provided for macOS and Linux, so on those systems deploying from source is the only option. Please check the repository page for details: https://github.com/jianchang512/pyvideotrans

How to download, install, and update

Download from GitHub

This is an open-source project hosted on GitHub, so the preferred download address is the GitHub releases page: https://github.com/jianchang512/pyvideotrans/releases. Open it and download the release at the top of the list.

If you start from the project homepage, https://github.com/jianchang512/pyvideotrans, click "Releases" on the middle right of the page to reach the download page above.

Updating is very simple. Go to the download page and check whether the latest version is newer than the one you are currently using. If so, download it, unzip it, and overwrite the existing files.

Download and install from the documentation site

Of course, a simpler way is to directly click to download from the documentation site: https://pyvideotrans.com

After decompression, double-click sp.exe to open and use the software.

Unzip into a directory whose path contains only English letters or digits. It is best to avoid Chinese characters and spaces in the path, otherwise some strange problems may occur.
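
The path advice above can be checked mechanically. The following Python sketch uses a hypothetical helper, not part of the software, and approximates "English letters or digits" as "ASCII characters only, no spaces", which is an assumption.

```python
def is_safe_install_path(path: str) -> bool:
    """Return True if the path is ASCII-only and contains no spaces."""
    return path.isascii() and " " not in path


if __name__ == "__main__":
    for candidate in (r"D:\tools\pyvideotrans",
                      r"C:\Program Files\pyvideotrans",
                      r"D:\视频翻译"):
        print(candidate, is_safe_install_path(candidate))
```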

The list of files after decompression is as follows

Where to download models

The tiny model is bundled by default. It is the smallest and fastest model, but also the least accurate. If you need other models, download them from this page: https://github.com/jianchang512/stt/releases/tag/0.0

How to choose a translation channel

After the subtitles have been recognized, you need a translation channel if you want them in another language, for example when the original video is in English and you want Chinese subtitles embedded after processing.

Currently, the supported channels are Microsoft Translate, Google Translate, Baidu Translate, Tencent Translate, DeepL Translate, ChatGPT Translate, AzureGPT Translate, Gemini Pro Translate, DeepLx Translate, OTT Offline Translate, FreeGoogle Translate, and FreeChatGPT Translate.

FreeChatGPT Translate: This is a free ChatGPT translation interface sponsored by apiskey.top. No sk and no configuration are required. You can use it simply by selecting it. It is based on the 3.5 model.

FreeGoogle Translate: This is a reverse proxy Google Translate channel, which can be accessed and used without a proxy, but there is a limit to the number of requests. It is recommended for novice users who do not know how to configure a proxy. Other users who want to use Google Translate, please fill in the network proxy address.

DeepL Translate: Its translation quality is probably the best, even better than ChatGPT. Unfortunately, the paid version cannot be purchased in China, and the free version is difficult to call via API. DeepLx is a tool for using DeepL for free, but a local deployment is basically unusable: there are many subtitles and they are translated with multiple threads at once, so the IP is easily blocked or restricted. Consider deploying it on Tencent Cloud to reduce the error rate.

https://juejin.cn/user/4441682704623992/posts

Microsoft Translate: It is completely free and needs no proxy, but frequent use may still trigger IP restrictions.

Google Translate: If you have a proxy and know how to fill in its address, Google Translate is the recommended choice. It is free and the quality is great. Just fill in the proxy address in the text box.

Alternatively, see the small tool for using Google Translate directly without a proxy.

Tencent Translate: If you know nothing about proxies, do not bother with them. Apply for a free Tencent Translate API account instead (see the Tencent Translate API application guide). The first 5 million characters per month are free.

Baidu Translate: You can also apply for the Baidu Translate API (see the Baidu Translate API application guide). Accounts without certification get 50,000 free characters per month, and accounts with personal certification get 1 million free characters per month.

Using OTT Offline Translation: If you are willing to tinker, you can choose to deploy the free OTT offline translation. The download address is https://github.com/jianchang512/ott. After deployment, fill in the address in the software menu - Settings - OTT Offline Translation.

Using AI Translation ChatGPT / Azure / Gemini:

ChatGPT and AzureGPT require paid accounts. Free accounts cannot be used. Once you have an account, open the menu - Settings - OpenAI/chatGPT key and fill in your ChatGPT sk value. AzureGPT and Gemini are also configured in the menu - Settings.

Note that if you are using the official ChatGPT api, you do not need to fill in the "API URL". If it is a third-party api, fill in the api address provided by the third party.

ChatGPT Access Guide: Quickly obtain and configure API keys and fill them in for use in software/tools https://juejin.cn/post/7342327642852999168

The official OpenAI ChatGPT API, Gemini, and AzureGPT all require a proxy to be filled in, otherwise they cannot be accessed.


Gemini is currently free. After filling in the api key and correctly setting the proxy, it can be used.

What is a proxy and is it necessary

If you want to use Google Translate, the official ChatGPT API, Gemini, or AzureGPT, then a proxy is required. Fill in the proxy address in the proxy address box in this format: http://127.0.0.1:port number. Please note that the port must be an HTTP-type port, not a SOCKS port.

For example, one proxy client typically uses http://127.0.0.1:10809 and another uses http://127.0.0.1:7890. If you are using a proxy but do not know what to fill in, look in the lower left or upper right part of the proxy software's window for the text http followed by a 4-5 digit number, and then fill in http://127.0.0.1:port number.

If you do not understand what a proxy is at all, please Baidu it yourself. For reasons you know, it is inconvenient to say more here.

Please note: If you do not use a proxy, leave the proxy address empty. Do not fill it in at random, and in particular do not fill in an API address here.
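
The format rule above can be expressed as a small check. This Python sketch is a hypothetical helper, not part of the software. It accepts only addresses of the form http://host:port and rejects SOCKS addresses, addresses without a port, and API URLs pasted by mistake.

```python
from urllib.parse import urlsplit


def is_http_proxy_address(address: str) -> bool:
    """Return True for addresses shaped like http://127.0.0.1:7890."""
    try:
        parts = urlsplit(address.strip())
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    if parts.scheme != "http" or not parts.hostname:
        return False
    # A proxy address has a port and no path such as /v1.
    return port is not None and parts.path in ("", "/")


if __name__ == "__main__":
    for candidate in ("http://127.0.0.1:7890",
                      "socks5://127.0.0.1:1080",
                      "https://api.openai.com/v1",
                      "http://127.0.0.1"):
        print(candidate, is_http_proxy_address(candidate))
```

The check only looks at the shape of the address. It does not test whether a proxy is actually listening on that port.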

How to use it specifically

Double-click sp.exe to open the software. The default interface is as follows

The first one selected on the left by default is the simple novice mode, which is convenient for novice users to quickly experience and use. Most options have been set by default.

Of course, you can choose the standard function mode for a high degree of customization and run the entire process of video translation + dubbing + embedding subtitles. The other buttons on the left are either individual steps split out of this process or simple auxiliary functions. Let's take the simple novice mode as an example to demonstrate how to use it.

How to use CUDA acceleration

If you have an Nvidia graphics card, you can configure the CUDA environment and then select "CUDA acceleration" to get a large speedup. Configuring it involves quite a few steps, so please check the dedicated tutorial.
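
Before configuring CUDA, it can help to check whether an Nvidia driver is visible at all. The Python sketch below is a hypothetical helper, not the software's own detection logic. It only looks for the nvidia-smi driver utility on the PATH, which is an assumption: finding it shows that a driver is installed, not that the CUDA environment is fully configured.

```python
import shutil


def pick_device(which=shutil.which) -> str:
    """Return "cuda" if nvidia-smi is found on the PATH, otherwise "cpu"."""
    # `which` can be replaced in tests; by default it searches the real PATH.
    return "cuda" if which("nvidia-smi") else "cpu"


if __name__ == "__main__":
    print("Suggested device:", pick_device())
```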

How to use the original video's voice for dubbing

First, you need another open-source project: clone-voice: https://github.com/jianchang512/clone-voice. After installing, deploying, and configuring the model, fill in the address of the project in the software menu - Settings - Original Voice Cloning Api.

Then select "clone-voice" for TTS and "clone" for the dubbing role.

How to use GPT-SoVITS dubbing

Currently, the software supports using GPT-SoVITS for dubbing. After deploying GPT-SoVITS, start its api service, and then fill in the address in the software menu - Settings - GPT-SOVITS.

Specifically, you can check these 2 articles:

Call GPT-SoVITS in other software to synthesize text into speech https://juejin.cn/post/7341401110631350324

GPT-SoVITS project API improvement and use https://juejin.cn/post/7343138052973297702

What to do if you encounter problems

First, read the project homepage carefully: https://github.com/jianchang512/pyvideotrans. Most of the problems are explained.

Secondly, you can visit the documentation website: https://pyvideotrans.com

Thirdly, if you still cannot solve it, submit an Issue here: https://github.com/jianchang512/pyvideotrans/issues. There is also a QQ group listed on the project homepage, https://github.com/jianchang512/pyvideotrans, which you can join.

It is recommended to follow my WeChat official account (pyvideotrans), which contains all the original tutorials and common problems of this software, as well as related tips. Because my time is limited, the tutorials for this project are only published on my Juejin blog and WeChat official account. GitHub and the documentation website will not be updated as often.

Search for the official account in WeChat Search: pyvideotrans

Is it free, and are there any restrictions

The project is open-source under the GPL-v3 license and free to use, with no built-in paid features and no restrictions (other than complying with Chinese law). You are free to use it. Of course, Tencent Translate, Baidu Translate, DeepL Translate, ChatGPT, and AzureGPT are paid services, but that has nothing to do with me. They do not give me any money.

Will the project die

There are no projects that will not die. There are only long-lived and short-lived projects. Projects that rely solely on love may die earlier. Of course, if you want it to die slower and live longer, and receive effective continuous maintenance and optimization during its survival, you can consider donating to help it survive a few more days.

Can the source code be modified

The source code is completely open. You can deploy it locally or modify it for your own use. However, note that the source code is licensed under GPL-v3. If you integrate the source code into your project, your project must also be open-source to avoid violating the license.