This article briefly introduces the principles, features, and usage of the "Video Translation and Dubbing Software." The main topics are:

  1. What is this and what is it used for?
  2. How to download, install, and update
  3. Where to download models
  4. How to choose a translation channel
  5. What is a proxy and is it necessary?
  6. How to use it specifically
  7. How to use CUDA acceleration
  8. How to use the original video's voice tone for dubbing
  9. How to use GPT-SoVITS for dubbing
  10. What to do if you encounter problems
  11. Is it free and are there any limitations?
  12. Will the project die?
  13. Can the source code be modified?

What is this and what is it used for?

This is an open-source video translation and dubbing tool (licensed under GPL-v3). It translates a video's audio from one language into another and embeds subtitles in the target language. For example, an English movie with English audio, no English subtitles, and no Chinese subtitles can be converted into a movie with Chinese subtitles and Chinese dubbing.

Open-source address: https://github.com/jianchang512/pyvideotrans

In addition to this core function, it also includes other tools:

  • Speech-to-text: Can recognize speech in videos or audio and export it as subtitle files.
  • Audio-video separation: Can separate a video into a silent video file and an audio file.
  • Text subtitle translation: Can translate text or SRT subtitle files into other languages.
  • Video subtitle merging: Can embed subtitle files into videos.
  • Audio-video-subtitle merging: Can combine video files, audio files, and subtitle files into one file.
  • Text-to-speech: Can synthesize text or SRT files into an audio file.
  • Voice-background separation: Can separate human voices from other sounds in a video into two audio files.
  • Download YouTube videos: Can download YouTube videos online.

How does this tool work?

First, the original video is separated into an audio file and a silent MP4 using FFmpeg. Then, the OpenAI-Whisper/Faster-Whisper model is used to recognize the speech in the audio and save it as an SRT subtitle file. Next, the SRT subtitles are translated into the target language and saved as an SRT subtitle file. Finally, the translated text is synthesized into a dubbed audio file.
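The recognition step produces timed speech segments that are then written out in SRT format. As a minimal sketch (the `(start, end, text)` tuples here are illustrative stand-ins, not Whisper's exact API, though faster-whisper's segment objects carry the same information):

```python
# Minimal sketch: turn recognized speech segments into SRT text.
# The (start, end, text) tuples are illustrative placeholders.

def ts(seconds):
    """Format a time in seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments):
    """Render (start, end, text) tuples as numbered SRT blocks."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{ts(start)} --> {ts(end)}\n{text.strip()}\n")
    return "\n".join(blocks)

print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 5.0, "How are you?")]))
```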

The dubbed audio file, subtitle SRT file, and the original silent MP4 are then merged into one video file, completing the translation.

Of course, the intermediate steps are more complex, such as separating background music and voices, aligning subtitles with audio and video, voice cloning, CUDA acceleration, etc.
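The first and last steps of the pipeline above can be sketched with FFmpeg. This assumes ffmpeg is on your PATH; the commands are illustrative, not pyvideotrans's exact internals, and the recognition, translation, and dubbing steps are elided as comments.

```python
# Sketch of the pipeline described above: split with ffmpeg, then
# (recognition / translation / dubbing elided) merge the results.
# Assumes ffmpeg is on PATH; not pyvideotrans's exact commands.
import subprocess

def split_cmd(video, audio_out, silent_out):
    """ffmpeg arguments: extract the audio track and a silent MP4 in one pass."""
    return [
        "ffmpeg", "-y", "-i", video,
        "-vn", "-acodec", "pcm_s16le", audio_out,   # audio only
        "-an", "-c:v", "copy", silent_out,          # video only, stream-copied
    ]

def merge_cmd(silent_video, dubbed_audio, srt, out):
    """ffmpeg arguments: hard-embed the SRT and mux in the dubbed audio."""
    return [
        "ffmpeg", "-y", "-i", silent_video, "-i", dubbed_audio,
        "-vf", f"subtitles={srt}",                  # burn subtitles into the frames
        "-c:a", "aac", out,
    ]

# subprocess.run(split_cmd("movie.mp4", "audio.wav", "silent.mp4"), check=True)
# ... recognize audio.wav -> source SRT, translate -> target SRT, synthesize -> dub.wav ...
# subprocess.run(merge_cmd("silent.mp4", "dub.wav", "zh.srt", "result.mp4"), check=True)
```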

Can it be deployed from source?

Yes, and for macOS and Linux systems, pre-packaged versions are not provided; you must deploy from source. For details, please check the repository page: https://github.com/jianchang512/pyvideotrans

How to download, install, and update

Download from GitHub

This is an open-source project on GitHub, so the preferred download location is GitHub: https://github.com/jianchang512/pyvideotrans/releases. Open the page and select the top download option.

If you start from the main page, https://github.com/jianchang512/pyvideotrans, click the "Releases" section on the middle right of the page to reach the download page.

Updating is simple: go to the download page again, check if the latest version is newer than the one you are currently using. If so, re-download and extract it to overwrite the old files.

Download and install from the documentation site

An even simpler way is to go directly to the documentation site and click to download: https://pyvideotrans.com

Extract the archive to a directory whose path contains only English letters or digits, preferably without Chinese characters or spaces, to avoid strange issues. After extraction, double-click sp.exe to launch the software.


Where to download models

The tiny model is built-in by default. It is the smallest and fastest model but also the least accurate. If you need other models, please download them from this page: https://github.com/jianchang512/stt/releases/tag/0.0

How to choose a translation channel

After recognizing the subtitles, if you need to convert them into subtitles in another language—for example, if the original video is in English and you want to embed Chinese subtitles—you will need to use a translation channel.

Currently supported channels include: Microsoft Translator, Google Translate, Baidu Translate, Tencent Translate, DeepL Translate, ChatGPT Translate, AzureGPT Translate, Gemini Pro Translate, DeepLx Translate, OTT Offline Translate, FreeGoogle Translate, FreeChatGPT Translate.

FreeChatGPT Translate

This is a free ChatGPT translation interface sponsored by apiskey.top. No secret key (SK) or configuration is required; just select it to use. It is based on the GPT-3.5 model.

FreeGoogle Translate: This is a reverse proxy for Google Translate. It can be used without a proxy but has request limits. Recommended for beginners who cannot configure a proxy. Other users who want to use Google Translate should fill in the proxy address.

DeepL Translate: DeepL arguably offers the best translation quality, even better than ChatGPT. Unfortunately, the paid version is not available in China, and the free version is difficult to use via API. DeepLx is a tool for using DeepL for free, but a local deployment is nearly unusable here: subtitles are numerous and are translated concurrently across multiple threads, so IP blocking is common. Consider deploying DeepLx on a server such as Tencent Cloud to reduce errors. For related posts, see: https://juejin.cn/user/4441682704623992/posts

Microsoft Translator: Completely free and no proxy is required, but frequent use may still lead to IP restrictions.

Google Translate: If you have a proxy and know how to fill in the proxy address, Google Translate is the recommended first choice. It's free, effective, and reliable. Just fill in the proxy address in the text box.

Also see this method, a small tool for using Google Translate directly without a proxy.

Tencent Translate: If you know nothing about proxies, don't bother with them; apply for the free Tencent Translate API instead. Click here to view the Tencent Translate API application. The first 5 million characters per month are free.

Baidu Translate: You can also apply for Baidu Translate API. Click here to view Baidu Translate API Application. Without certification, 50,000 free characters per month; with personal certification, 1 million free characters per month.

Using OTT Offline Translate: If you are willing to tinker, you can choose to deploy the free OTT Offline Translate. Download address: https://github.com/jianchang512/ott. After deployment, fill in the address in the software menu → Settings → OTT Offline Translate.
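For illustration, here is a hedged sketch of calling a locally deployed OTT instance. It assumes OTT exposes a LibreTranslate-style `POST /translate` endpoint at the address you configure in the settings, and the port `9911` is just an example; verify both against your own deployment.

```python
# Hedged sketch of calling a locally deployed OTT instance.
# Assumes a LibreTranslate-style POST /translate endpoint; the
# base URL and port below are example values, not guaranteed defaults.
import json
import urllib.request

def build_request(base_url, text, source="en", target="zh"):
    """Build an HTTP request translating `text` from source to target."""
    payload = json.dumps({
        "q": text, "source": source, "target": target, "format": "text",
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate", data=payload,
        headers={"Content-Type": "application/json"},
    )

req = build_request("http://127.0.0.1:9911", "Hello world")
# resp = urllib.request.urlopen(req)                  # network call; uncomment to run
# print(json.loads(resp.read())["translatedText"])
```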

Using AI Translation ChatGPT / Azure / Gemini:

ChatGPT and AzureGPT require paid accounts; free accounts are not usable. After obtaining an account, go to Menu → Settings → OpenAI/ChatGPT Key and fill in your ChatGPT SK value. For AzureGPT and Gemini, fill in the settings in the menu as well.

Note: If you are using the official ChatGPT API, you do not need to fill in the "API URL." If using a third-party API, fill in the API address provided by the third party.

ChatGPT Access Guide: Quickly obtain and configure API keys and fill them into the software/tool for use: https://juejin.cn/post/7342327642852999168

OpenAI's official ChatGPT, Gemini, and AzureGPT all require a proxy to be filled in; otherwise, they cannot be accessed.
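As an illustration of how the key and API URL fit together, here is a sketch of the kind of request an OpenAI-compatible translation call makes. The prompt wording, batching scheme, and base URL are illustrative assumptions, not pyvideotrans's actual internals.

```python
# Hedged sketch: an OpenAI-compatible chat-completion request that
# batches subtitle lines for translation. Prompt wording and base URL
# are illustrative, not pyvideotrans's actual code.
import json
import urllib.request

def chat_payload(lines, target_lang="Chinese", model="gpt-3.5-turbo"):
    """Batch several subtitle lines into one chat-completion request body."""
    numbered = "\n".join(f"{i}. {t}" for i, t in enumerate(lines, 1))
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Translate each numbered line into {target_lang}. "
                        "Keep the numbering and return one line per item."},
            {"role": "user", "content": numbered},
        ],
    }

def build_request(api_url, sk, payload):
    """Attach the SK as a Bearer token; api_url is the official or third-party base."""
    return urllib.request.Request(
        f"{api_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {sk}"},
    )

req = build_request("https://api.openai.com", "sk-...", chat_payload(["Hello.", "Goodbye."]))
```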

AzureGPT keys are filled in the same settings menu.

Gemini is currently free. Fill in the API key and set the proxy correctly to use it.

What is a proxy and is it necessary?

If you want to use Google Translate, the official ChatGPT API, or Gemini/AzureGPT, then a proxy is necessary. Fill in the proxy address in the format http://127.0.0.1:port in the proxy address box. Please note that the port must be an HTTP port, not a SOCKS port.

For example, with one common proxy client you would fill in http://127.0.0.1:10809; with another, http://127.0.0.1:7890. If you are using a proxy but don't know what to fill in, look carefully at the lower left, upper right, or other parts of the client for the word "HTTP" followed by a 4-5 digit number, then fill in http://127.0.0.1:that port number.
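The address format described above can be checked with a small helper. The port numbers in the examples are just common defaults of popular proxy clients, not values this software requires.

```python
# Small helper illustrating the proxy-address format described above:
# an HTTP proxy such as http://127.0.0.1:10809 (the ports shown are
# just common examples from popular proxy clients).

def check_proxy(addr):
    """Return True if addr looks like a usable local HTTP proxy address."""
    if not addr.startswith("http://"):
        return False          # socks5:// or a bare host:port will not work
    rest = addr[len("http://"):]
    host, _, port = rest.partition(":")
    return bool(host) and port.isdigit() and 1 <= int(port) <= 65535

print(check_proxy("http://127.0.0.1:10809"))   # True
print(check_proxy("socks5://127.0.0.1:1080"))  # False: SOCKS, wrong type
print(check_proxy("http://127.0.0.1"))         # False: missing the port
```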

If you have no idea what a proxy is, then for well-known reasons it is not convenient to say more here. Please search for it yourself on Baidu.

Note: If you don't need the proxy address, you don't have to fill it in, but do not fill it in randomly, especially do not fill in an API address here.

How to use it specifically

Double-click sp.exe to open the software.

By default, the first option on the left is the simple beginner mode, which is convenient for new users to quickly experience and use. Most options are set by default.

Alternatively, you can choose the standard function mode for fully customizable video translation + dubbing + subtitle embedding. The other buttons on the left are either splits of this function or simple auxiliary tools. Let's demonstrate usage with the simple beginner mode.

How to use CUDA acceleration

If you have an NVIDIA graphics card, you can configure the CUDA environment and then select "CUDA acceleration" for a significant speed boost. How to configure it? The setup is extensive; please check this tutorial.

How to use the original video's voice tone for dubbing

First, you need another open-source project: clone-voice: https://github.com/jianchang512/clone-voice. After installing, deploying, and configuring the model, fill in the project's address in the software menu → Settings → Original Voice Clone API.

Then, select "clone-voice" for TTS and choose "clone" for the dubbing role to use it.

How to use GPT-SoVITS for dubbing

The software now supports using GPT-SoVITS for dubbing. After deploying GPT-SoVITS, start the API service, and then fill in the address in the video translation software settings menu → GPT-SOVITS.

For details, check these two articles:

Calling GPT-SoVITS in other software to synthesize speech from text: https://juejin.cn/post/7341401110631350324

GPT-SoVITS project API improvements and usage: https://juejin.cn/post/7343138052973297702

What to do if you encounter problems

First, carefully read the project main page: https://github.com/jianchang512/pyvideotrans. Most issues are explained there.

Second, visit the documentation website: https://pyvideotrans.com.

Third, if it still cannot be resolved, submit an Issue here: https://github.com/jianchang512/pyvideotrans/issues. Of course, the project main page https://github.com/jianchang512/pyvideotrans also has a QQ group that you can join.

It is recommended to follow my WeChat public account (pyvideotrans), where I post original tutorials, common issues, and related tips for this software. Due to limited energy, tutorials for this project are only posted on this Juejin blog and the WeChat public account. GitHub and the documentation site are not frequently updated.

Search for the public account in WeChat: pyvideotrans

Is it free and are there any limitations?

The project is open-source under the GPL-v3 license, free to use, with no built-in paid features or any limitations (must comply with Chinese laws). You can use it freely. Of course, Tencent Translate, Baidu Translate, DeepL Translate, ChatGPT, AzureGPT, etc., are paid services, but that has nothing to do with me, and they don't share profits with me.

Will the project die?

No project lives forever; there are only long-lived and short-lived ones. Projects that rely solely on passion may die earlier. If you hope this one lives longer and receives continuous, effective maintenance and optimization during its lifetime, consider donating to help extend its life a few more days.

Can the source code be modified?

The source code is completely open; it can be deployed locally and modified for personal use. Note, however, that it is licensed under GPL-v3: if you integrate this source code into your own project, your project must also be open-source in order to comply with the license.