
As you may know, the Microsoft Edge browser has a powerful "Read Aloud" feature. It supports dozens of languages, each with a selection of voices, and the results are excellent.

Based on this, a developer created a Python package called edge-tts. This package allows you to use Microsoft's TTS service in your programs, providing voiceovers for text or subtitles. For example, the video translation software pyVideoTrans integrates edge-tts, allowing users to directly select it as a voiceover option.

Unfortunately, the misuse of Microsoft TTS by users in China is quite prevalent, with some even using it for commercial voiceover sales. This has led Microsoft to restrict access from China. Frequent use may result in 403 errors, requiring an IP change or a stable connection to a foreign VPN to continue.

So, is it possible to set up a simple proxy service on a foreign server for personal use? This could improve stability and make the API compatible with OpenAI TTS, allowing direct use within the OpenAI SDK.

The answer is yes! I recently created a Docker image that makes it easy to pull and start this service on a server.

Once started, the service interface is fully compatible with OpenAI. Simply change the API address to http://your_server_ip:7899/v1 to seamlessly replace OpenAI TTS. Furthermore, it can be used directly within video translation software.

Here's a detailed guide on how to deploy and use it:

Step 1: Purchase and activate a US server

Step 2: Open port 7899 in the firewall

Step 3: Connect to the server via terminal

Step 4: Install Docker

Step 5: Pull the edge-tts-api image and start the API service

If you already have a server with Docker installed, you can skip to Step 5 to pull the image.

Step 1: Purchase and Activate a US Server

It's recommended to choose a server in the United States due to fewer or no restrictions. You can choose a Linux-based operating system; the following example uses Debian 12 and my personal provider, Yecaoyun. The reason for choosing it is simple: it's cheap and relatively stable, sufficient for a voiceover proxy.

If you already have a Linux server in Europe or America, you can skip this section and go directly to the next section. If not, please continue reading.

Open this link to the Yecaoyun website and select Product Services -> US AMD VPS in the top navigation bar.

image.png

Then, choose any of the top four configurations; they should all be sufficient.

image.png

I personally use the configuration priced at 29 yuan/month.

Click the "Buy Now" button to enter the configuration page. Here, select Debian 12 as the server operating system, set the server password, and leave the rest as default.

image.png

After completing the payment, wait a few minutes for the server to be created and started successfully. Next, you need to configure the firewall and open port 7899. Only by opening this port will you be able to connect to the service for voiceovers.

Step 2: Open Port 7899 in the Firewall

If you plan to use a domain name with an Nginx reverse proxy, you don't need to open the port. If you're not familiar with either, the simplest option is to open the port directly.

The firewall settings interface varies depending on the server and panel. The following example uses the Yecaoyun panel I use; other panels can be used as a reference. If you know how to open a port, you can skip this section and go directly to the next section.

First, in "My Products & Services", click on the product you just activated to enter the product information and management page.

image.png

image.png

On this page, you can find the server's IP address and password.

image.png

Find "Firewall" under "Additional Tools" and click to open it.

image.png

Then open port 7899 as shown below:

image.png

Step 3: Connect to the Server via Terminal

If you already know how to connect to the terminal, or have Xshell or another SSH client, you can skip this step and go directly to the next section.

On the product information page, find Xterm.js Console and click it. Then follow the steps shown below:

image.png

image.png

When the above image appears, press Enter a few times.

When Login: is displayed, type root and press Enter.

image.png

Next, Password: will appear. At this point, you need to paste the password you copied (if you forgot it, you can find it on the product information page).

Note: Do not use Ctrl+V or right-click to paste, as this may cause extra spaces or line breaks to be entered, resulting in a password error.

image.png

Instead, press Shift+Insert to paste the password, then press Enter. This avoids a failed login caused by stray characters when the password itself is correct.

image.png

The login is successful as shown below.

image.png

Step 4: Install Docker

If your server already has Docker installed or you know how to install it, you can skip this step.

Execute the following 5 commands in sequence. Make sure each command is executed successfully before executing the next command. These commands only apply to the Debian 12 series of servers.

After [root@xxxxxx~]#, right-click to paste the following command, and press Enter to execute after pasting.

image.png

Command 1: sudo apt update && sudo apt install -y apt-transport-https ca-certificates curl gnupg lsb-release

Command 2: curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Command 3: echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Command 4: sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin

Command 5 (start the Docker service): sudo systemctl start docker && sudo systemctl enable docker && sudo usermod -aG docker $USER

image.png

Each command can be pasted by right-clicking. Press Enter after pasting to run it.

Step 5: Pull the edge-tts-api Image and Start the API Service

Enter the following command to automatically pull the image and start the service. Once started, you can use it in video translation software or other tools that support OpenAI TTS.

docker run -p 7899:7899 jianchang512/edge-tts-api:latest

image.png

Press Ctrl+C to stop the service. If it does not stop right away, press it again.

Note that this command will run in the foreground. If you close the terminal window, the service will stop.

You can use the following command instead to start the service in the background, and you can safely close the terminal after execution.

docker run -d -p 7899:7899 jianchang512/edge-tts-api:latest

image.png

If no errors are printed, the service has most likely started. To verify, open http://your_ip:7899/v1/audio/speech in your browser. A result similar to the one below confirms that the service is running.

image.png
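
If the browser check fails, it helps to find out whether port 7899 is reachable at all. The following is a minimal sketch that uses only the Python standard library. The function name `is_port_open` and the 3-second timeout are illustrative choices, not part of the service. Run it on your local machine and replace `your_ip` with the server's IP address.

```python
import socket


def is_port_open(host: str, port: int = 7899, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False


# Example (replace the placeholder with your server's IP address):
# print(is_port_open("your_ip"))
```

A result of False usually means that either the firewall rule from Step 2 is missing or the container is not running. This check only confirms that something accepts connections on the port. It does not prove that the speech endpoint works.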

Using in Video Translation Software

Please update the software to v3.40 to use this feature. Download the update from https://pyvideotrans.com/downpackage

Open the menu and go to TTS Settings -> OpenAI TTS. Change the interface address to http://your_ip:7899/v1

The SK field can contain any value, as long as it is not empty. In the role list, enter the roles you want to use, separated by commas.

image.png
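
If you keep your preferred voices in a script or a text file, the role list value can be assembled programmatically. This is only a small sketch: the helper name `build_role_list` is hypothetical, and it assumes the field expects plain ASCII commas with no spaces.

```python
def build_role_list(voices):
    """Join voice names into a comma-separated string for the role list field."""
    cleaned = []
    for name in voices:
        name = name.strip()
        # Skip blanks and duplicates while keeping the original order
        if name and name not in cleaned:
            cleaned.append(name)
    return ",".join(cleaned)


roles = build_role_list(["zh-CN-YunxiNeural", "en-US-AriaNeural", "ja-JP-NanamiNeural"])
print(roles)
```

Paste the printed string into the role list field.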

Available Voices/Roles

Below is a list of available roles. Please note that the text language and role must match.

image.png

Chinese voices:
    zh-HK-HiuGaaiNeural
    zh-HK-HiuMaanNeural
    zh-HK-WanLungNeural
    zh-CN-XiaoxiaoNeural
    zh-CN-XiaoyiNeural
    zh-CN-YunjianNeural
    zh-CN-YunxiNeural
    zh-CN-YunxiaNeural
    zh-CN-YunyangNeural
    zh-CN-liaoning-XiaobeiNeural
    zh-TW-HsiaoChenNeural
    zh-TW-YunJheNeural
    zh-TW-HsiaoYuNeural
    zh-CN-shaanxi-XiaoniNeural

English voices:
    en-AU-NatashaNeural
    en-AU-WilliamNeural
    en-CA-ClaraNeural
    en-CA-LiamNeural
    en-HK-SamNeural
    en-HK-YanNeural
    en-IN-NeerjaExpressiveNeural
    en-IN-NeerjaNeural
    en-IN-PrabhatNeural
    en-IE-ConnorNeural
    en-IE-EmilyNeural
    en-KE-AsiliaNeural
    en-KE-ChilembaNeural
    en-NZ-MitchellNeural
    en-NZ-MollyNeural
    en-NG-AbeoNeural
    en-NG-EzinneNeural
    en-PH-JamesNeural
    en-PH-RosaNeural
    en-SG-LunaNeural
    en-SG-WayneNeural
    en-ZA-LeahNeural
    en-ZA-LukeNeural
    en-TZ-ElimuNeural
    en-TZ-ImaniNeural
    en-GB-LibbyNeural
    en-GB-MaisieNeural
    en-GB-RyanNeural
    en-GB-SoniaNeural
    en-GB-ThomasNeural
    en-US-AvaMultilingualNeural
    en-US-AndrewMultilingualNeural
    en-US-EmmaMultilingualNeural
    en-US-BrianMultilingualNeural
    en-US-AvaNeural
    en-US-AndrewNeural
    en-US-EmmaNeural
    en-US-BrianNeural
    en-US-AnaNeural
    en-US-AriaNeural
    en-US-ChristopherNeural
    en-US-EricNeural
    en-US-GuyNeural
    en-US-JennyNeural
    en-US-MichelleNeural
    en-US-RogerNeural
    en-US-SteffanNeural

Japanese voices:
    ja-JP-KeitaNeural
    ja-JP-NanamiNeural

Korean voices:
    ko-KR-HyunsuNeural
    ko-KR-InJoonNeural
    ko-KR-SunHiNeural

French voices:
    fr-BE-CharlineNeural
    fr-BE-GerardNeural
    fr-CA-ThierryNeural
    fr-CA-AntoineNeural
    fr-CA-JeanNeural
    fr-CA-SylvieNeural
    fr-FR-VivienneMultilingualNeural
    fr-FR-RemyMultilingualNeural
    fr-FR-DeniseNeural
    fr-FR-EloiseNeural
    fr-FR-HenriNeural
    fr-CH-ArianeNeural
    fr-CH-FabriceNeural

German voices:
    de-AT-IngridNeural
    de-AT-JonasNeural
    de-DE-SeraphinaMultilingualNeural
    de-DE-FlorianMultilingualNeural
    de-DE-AmalaNeural
    de-DE-ConradNeural
    de-DE-KatjaNeural
    de-DE-KillianNeural
    de-CH-JanNeural
    de-CH-LeniNeural

Spanish voices:
    es-AR-ElenaNeural
    es-AR-TomasNeural
    es-BO-MarceloNeural
    es-BO-SofiaNeural
    es-CL-CatalinaNeural
    es-CL-LorenzoNeural
    es-ES-XimenaNeural
    es-CO-GonzaloNeural
    es-CO-SalomeNeural
    es-CR-JuanNeural
    es-CR-MariaNeural
    es-CU-BelkysNeural
    es-CU-ManuelNeural
    es-DO-EmilioNeural
    es-DO-RamonaNeural
    es-EC-AndreaNeural
    es-EC-LuisNeural
    es-SV-LorenaNeural
    es-SV-RodrigoNeural
    es-GQ-JavierNeural
    es-GQ-TeresaNeural
    es-GT-AndresNeural
    es-GT-MartaNeural
    es-HN-CarlosNeural
    es-HN-KarlaNeural
    es-MX-DaliaNeural
    es-MX-JorgeNeural
    es-NI-FedericoNeural
    es-NI-YolandaNeural
    es-PA-MargaritaNeural
    es-PA-RobertoNeural
    es-PY-MarioNeural
    es-PY-TaniaNeural
    es-PE-AlexNeural
    es-PE-CamilaNeural
    es-PR-KarinaNeural
    es-PR-VictorNeural
    es-ES-AlvaroNeural
    es-ES-ElviraNeural
    es-US-AlonsoNeural
    es-US-PalomaNeural
    es-UY-MateoNeural
    es-UY-ValentinaNeural
    es-VE-PaolaNeural
    es-VE-SebastianNeural

Arabic voices:
    ar-DZ-AminaNeural
    ar-DZ-IsmaelNeural
    ar-BH-AliNeural
    ar-BH-LailaNeural
    ar-EG-SalmaNeural
    ar-EG-ShakirNeural
    ar-IQ-BasselNeural
    ar-IQ-RanaNeural
    ar-JO-SanaNeural
    ar-JO-TaimNeural
    ar-KW-FahedNeural
    ar-KW-NouraNeural
    ar-LB-LaylaNeural
    ar-LB-RamiNeural
    ar-LY-ImanNeural
    ar-LY-OmarNeural
    ar-MA-JamalNeural
    ar-MA-MounaNeural
    ar-OM-AbdullahNeural
    ar-OM-AyshaNeural
    ar-QA-AmalNeural
    ar-QA-MoazNeural
    ar-SA-HamedNeural
    ar-SA-ZariyahNeural
    ar-SY-AmanyNeural
    ar-SY-LaithNeural
    ar-TN-HediNeural
    ar-TN-ReemNeural
    ar-AE-FatimaNeural
    ar-AE-HamdanNeural
    ar-YE-MaryamNeural
    ar-YE-SalehNeural
 
 
Bengali voices:
    bn-BD-NabanitaNeural
    bn-BD-PradeepNeural
    bn-IN-BashkarNeural
    bn-IN-TanishaaNeural

Czech voices:
    cs-CZ-AntoninNeural
    cs-CZ-VlastaNeural

Dutch voices:
    nl-BE-ArnaudNeural
    nl-BE-DenaNeural
    nl-NL-ColetteNeural
    nl-NL-FennaNeural
    nl-NL-MaartenNeural

Hebrew voices:
    he-IL-AvriNeural
    he-IL-HilaNeural

Hindi voices:
    hi-IN-MadhurNeural
    hi-IN-SwaraNeural

Hungarian voices:
    hu-HU-NoemiNeural
    hu-HU-TamasNeural

Indonesian voices:
    id-ID-ArdiNeural
    id-ID-GadisNeural

Italian voices:
    it-IT-GiuseppeNeural
    it-IT-DiegoNeural
    it-IT-ElsaNeural
    it-IT-IsabellaNeural

Kazakh voices:
    kk-KZ-AigulNeural
    kk-KZ-DauletNeural
    
Malay voices:
    ms-MY-OsmanNeural
    ms-MY-YasminNeural

Polish voices:
    pl-PL-MarekNeural
    pl-PL-ZofiaNeural

Portuguese voices:
    pt-BR-ThalitaNeural
    pt-BR-AntonioNeural
    pt-BR-FranciscaNeural
    pt-PT-DuarteNeural
    pt-PT-RaquelNeural

Russian voices:
    ru-RU-DmitryNeural
    ru-RU-SvetlanaNeural

Swahili voices:
    sw-KE-RafikiNeural
    sw-KE-ZuriNeural
    sw-TZ-DaudiNeural
    sw-TZ-RehemaNeural

Thai voices:
    th-TH-NiwatNeural
    th-TH-PremwadeeNeural

Turkish voices:
    tr-TR-AhmetNeural
    tr-TR-EmelNeural

Ukrainian voices:
    uk-UA-OstapNeural
    uk-UA-PolinaNeural

Vietnamese voices:
    vi-VN-HoaiMyNeural
    vi-VN-NamMinhNeural
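
Because the text language and the role must match, a script that picks voices automatically can check the language prefix of the voice name before sending a request. The sketch below relies on the naming pattern visible in the list above (language code, then region, then name). The function names are hypothetical. Voices with "Multilingual" in their name may well accept other languages, but this sketch does not treat them specially.

```python
def voice_language(voice: str) -> str:
    """Return the lowercase language code at the start of a voice name."""
    return voice.split("-", 1)[0].lower()


def voice_matches_language(voice: str, language: str) -> bool:
    """Check whether a voice name starts with the given language code.

    `language` may be a bare code such as 'zh' or a locale such as 'zh-CN';
    only the part before the first hyphen is compared.
    """
    return voice_language(voice) == language.split("-", 1)[0].lower()


print(voice_matches_language("zh-CN-YunxiNeural", "zh"))
print(voice_matches_language("en-US-AriaNeural", "zh"))
```

The first call compares "zh" with "zh" and the second compares "en" with "zh", so a caller can reject the second pairing before any audio is requested.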

Using in OpenAI SDK

First install the openai library: pip install openai

python
from openai import OpenAI

# The API key can be any non-empty string; base_url points at your own server
client = OpenAI(api_key='12314', base_url='http://your_ip:7899/v1')

# Stream the synthesized audio and write it to test.mp3
with client.audio.speech.with_streaming_response.create(
    model='tts-1',
    voice='zh-CN-YunxiNeural',
    input='Hello, dear friends',
    speed=1.0,
) as response:
    with open('./test.mp3', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)

Directly using requests to call the API

python
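import requests

# Send a JSON body to the speech endpoint, matching what the OpenAI SDK sends
res = requests.post(
    'http://your_ip:7899/v1/audio/speech',
    json={"model": "tts-1", "voice": "zh-CN-YunxiNeural",
          "input": "Hello, dear friends", "speed": 1.0},
)
res.raise_for_status()  # fail early on errors such as 403
with open('./test.mp3', 'wb') as f:
    f.write(res.content)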
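
If you prefer not to install any third-party package, the same call can be sketched with the standard library alone. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical. The sketch assumes that the service accepts the same JSON body the OpenAI SDK sends to /v1/audio/speech and that it does not require an Authorization header. Adjust it if your deployment behaves differently.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice, speed=1.0, model="tts-1"):
    """Build (but do not send) a POST request for the /audio/speech endpoint."""
    payload = {"model": model, "voice": voice, "input": text, "speed": speed}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, out_path, timeout=60):
    """Send the request and write the returned audio bytes to out_path."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses (such as 403)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


# Example (replace your_ip with the server's IP address before running):
# req = build_speech_request("http://your_ip:7899/v1", "Hello, dear friends", "en-US-AriaNeural")
# synthesize_to_file(req, "./test.mp3")
```

Building the request separately from sending it makes it easy to inspect the URL and the payload before any network traffic happens.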