Have you ever faced this problem?

Many speech-to-text tools work great for English, but their performance is often disappointing for Eastern languages like Chinese dialects (Cantonese, Sichuanese, etc.), Vietnamese, or Filipino.

Here's some good news!

The Dataocean AI team has developed and open-sourced the Dolphin project, a speech transcription model specifically optimized for Eastern languages, enabling more accurate recognition.

To make this powerful tool accessible even to non-technical users, I've created a user-friendly interface and an all-in-one package.



Key Features: Simple and Efficient

  • Focused on Eastern Languages: Specially optimized to support various Eastern languages and dialects.
  • Easy to Use: Just upload your audio/video, select the language, and click a button.
  • Flexible Output: Generates SRT subtitle files by default, with support for TXT or JSON formats as well.

How to Use (GUI Version)

Follow these steps to get started easily:

    1. Launch the Tool
        • After running the program, it will automatically open a web interface in your browser, usually at http://127.0.0.1:5080. If it doesn't open automatically, just enter this address in your browser manually.
    2. Upload an Audio or Video File
        • Click the "Select File" button on the interface and pick the audio or video file you want to transcribe.
        • Various formats are supported: mp3, mp4, mpeg, mpga, m4a, wav, webm, aac, flac, mov, mkv, avi, etc.
    3. Select the Language
        • In the "Language" dropdown menu, pick the language spoken in your file (e.g., Mandarin Chinese, Sichuanese, Cantonese, etc.).
        • Not sure which language it is? No problem. Select "Auto Detect" and let the tool figure it out.
    4. Select the Output Format
        • It will generate an SRT subtitle file by default.
        • You can also choose to output TXT (plain text) or JSON (structured data), depending on your needs.
    5. Start Transcription
        • Click the "Start Transcription" button.
        • The tool automatically performs a series of steps in the background (roughly sketched in the code after this list):
            • Converts your file into the WAV audio format, which is suitable for processing.
            • Splits the audio into smaller chunks to improve processing speed and accuracy.
            • Uses the Dolphin model to recognize the speech in each chunk.
            • Finally, assembles the recognition results into your chosen format (e.g., SRT).
    6. Get the Results
        • Once the transcription is complete, the results are displayed directly on the interface.
        • You can copy the text or click the download button to save the results as a file, which is convenient for video editing and other uses.
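
For the curious, the background steps above look roughly like the sketch below. This is only an illustration of the idea, not the package's actual code; it assumes ffmpeg is available on your PATH, and the sample rate and chunk length shown are my assumptions, not the tool's real settings.

import subprocess

def prepare_audio(src_path: str, chunk_seconds: int = 30) -> None:
    # Step 1: convert the input (audio or video) to 16 kHz mono WAV with ffmpeg.
    # (16 kHz mono is a common input format for speech models; the package's
    # actual settings may differ.)
    subprocess.run(
        ["ffmpeg", "-y", "-i", src_path, "-ar", "16000", "-ac", "1", "input.wav"],
        check=True,
    )
    # Step 2: split the WAV into fixed-length chunks (chunk_000.wav, chunk_001.wav, ...)
    # that are then transcribed one by one and reassembled into SRT/TXT/JSON.
    subprocess.run(
        ["ffmpeg", "-y", "-i", "input.wav", "-f", "segment",
         "-segment_time", str(chunk_seconds), "-c", "copy", "chunk_%03d.wav"],
        check=True,
    )

prepare_audio("/path/to/your/video.mp4")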

For Developers: How to Use the API

If you're a developer looking to integrate this functionality into your own application, the all-in-one package also provides an API.

  • Endpoint: /v1/audio/transcriptions
  • Method: POST
  • Content-Type: multipart/form-data (Note: This is not application/json because you are uploading a file).
  • Parameters:
    • file: (required) The audio/video file itself.
    • language: (optional) The target language code (see table below). Leave blank for auto-detection.
    • response_format: (optional) The response format. Supports "srt", "json", "txt". Defaults to "srt".
  • Response:
    • Success: Returns the transcribed text in the specified format (SRT, JSON, or TXT).
    • Failure: Returns a JSON object containing error information.
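
Examples with curl and the official openai Python library follow below; if you just want a quick test with plain Python, a minimal sketch using the requests library (my own illustration, not part of the package) looks like this:

import requests

# Upload the file as multipart/form-data (not JSON) to the local transcription API.
with open("/path/to/your/audio.mp3", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:5080/v1/audio/transcriptions",
        files={"file": f},                                      # required: the audio/video file
        data={"language": "zh-CN", "response_format": "srt"},   # optional fields
        timeout=600,                                            # long files can take a while
    )

resp.raise_for_status()
print(resp.text)  # SRT text (or JSON/TXT, depending on response_format)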

Supported Language Codes

Language Code   Language Name
zh-CN           Chinese (Mandarin)
zh-TW           Chinese (Taiwan)
zh-WU           Chinese (Wu)
zh-SICHUAN      Chinese (Sichuanese)
zh-SHANXI       Chinese (Shanxi)
zh-ANHUI        Chinese (Anhui)
zh-TIANJIN      Chinese (Tianjin)
zh-NINGXIA      Chinese (Ningxia)
zh-SHAANXI      Chinese (Shaanxi)
zh-HEBEI        Chinese (Hebei)
zh-SHANDONG     Chinese (Shandong)
zh-GUANGDONG    Chinese (Guangdong)
zh-SHANGHAI     Chinese (Shanghainese)
zh-HUBEI        Chinese (Hubei)
zh-LIAONING     Chinese (Liaoning)
zh-GANSU        Chinese (Gansu)
zh-FUJIAN       Chinese (Fujian)
zh-HUNAN        Chinese (Hunan)
zh-HENAN        Chinese (Henan)
zh-YUNNAN       Chinese (Yunnan)
zh-MINNAN       Chinese (Minnan)
zh-WENZHOU      Chinese (Wenzhou)
ja-JP           Japanese
th-TH           Thai
ru-RU           Russian
ko-KR           Korean
id-ID           Indonesian
vi-VN           Vietnamese
ct-NULL         Cantonese (Unknown)
ct-HK           Cantonese (Hong Kong)
ct-GZ           Cantonese (Guangdong)
hi-IN           Hindi
ur-IN           Urdu (India)
ur-PK           Urdu
ms-MY           Malay
uz-UZ           Uzbek
ar-MA           Arabic (Morocco)
ar-GLA          Arabic
ar-SA           Arabic (Saudi Arabia)
ar-EG           Arabic (Egypt)
ar-KW           Arabic (Kuwait)
ar-LY           Arabic (Libya)
ar-JO           Arabic (Jordan)
ar-AE           Arabic (UAE)
ar-LVT          Arabic (Levantine)
fa-IR           Persian
bn-BD           Bengali
ta-SG           Tamil (Singapore)
ta-LK           Tamil (Sri Lanka)
ta-IN           Tamil (India)
ta-MY           Tamil (Malaysia)
te-IN           Telugu
ug-NULL         Uyghur
ug-CN           Uyghur
gu-IN           Gujarati
my-MM           Burmese
tl-PH           Tagalog
kk-KZ           Kazakh
or-IN           Odia
ne-NP           Nepali
mn-MN           Mongolian
km-KH           Khmer
jv-ID           Javanese
lo-LA           Lao
si-LK           Sinhala
fil-PH          Filipino
ps-AF           Pashto
pa-IN           Punjabi
kab-NULL        Kabyle
ba-NULL         Bashkir
ks-IN           Kashmiri
tg-TJ           Tajik
su-ID           Sundanese
mr-IN           Marathi
ky-KG           Kyrgyz
az-AZ           Azerbaijani

API Call Example (using curl)

curl -X POST http://127.0.0.1:5080/v1/audio/transcriptions \
  -F "file=@/path/to/your/audio.mp3" \
  -F "language=zh-CN" \
  -F "response_format=srt"

API Call Example (using the Python openai library, which can conveniently call any API compatible with the OpenAI API format)

from openai import OpenAI

# Point the client at the local service; the API key is not checked here, so any string works
client = OpenAI(base_url='http://127.0.0.1:5080/v1', api_key='any_string_will_do')

audio_file_path = "your_audio.wav" # Replace with your file path

with open(audio_file_path, 'rb') as file_handle:
    # Make the transcription request
    transcript = client.audio.transcriptions.create(
        file=(audio_file_path, file_handle), # Pass the filename and file content
        model='base', # Model name, fixed as 'base' here or adjust as needed
        language='zh-CN', # Specify the language
        response_format="srt" # Specify the response format
    )
    # Print the transcription result (SRT format text)
    print(transcript)

Example Response (SRT Format)

1
00:00:00,000 --> 00:00:02,500
Hello, this is a test audio.

2
00:00:02,500 --> 00:00:05,000
I hope the transcription result is accurate.
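
If you want to post-process the SRT result programmatically (for example, to load the segments into your own editing workflow), a small parser like the sketch below is enough for well-formed SRT like the example above. It is my own helper, not something shipped with the package.

import re

def parse_srt(srt_text: str) -> list[dict]:
    # Each SRT block is: index line, "start --> end" line, then one or more text lines,
    # with blocks separated by blank lines.
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue
        start, end = (t.strip() for t in lines[1].split("-->"))
        segments.append({"start": start, "end": end, "text": " ".join(lines[2:])})
    return segments

example = "1\n00:00:00,000 --> 00:00:02,500\nHello, this is a test audio."
print(parse_srt(example))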

Want it Faster? Enable GPU Acceleration (Optional)

  • Why use a GPU? If you have a compatible NVIDIA graphics card and the environment is properly configured, using a GPU can significantly increase transcription speed, especially for long audio files.
  • How to Enable:
      1. Prerequisites: Ensure your computer has the correct NVIDIA graphics card drivers and a CUDA 12.x environment installed.
      2. Install Support: In the all-in-one package folder, find and double-click the Install GPU Support.bat (安装GPU支持.bat) file. It will automatically complete the necessary setup.
  • Note: The default all-in-one package does not include GPU support to keep the file size smaller.
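
As far as I can tell the model runs on PyTorch (treat that as an assumption), so after running the GPU setup you can quickly check whether the graphics card is actually visible:

import torch

# Should print True and your card's name if GPU support was installed correctly.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))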

A Few Tips

    1. File Size and Duration: It's recommended to keep individual files reasonably small (e.g., under 1 GB) and no longer than about 1 hour. Very large files can take a long time to process.
    2. Audio Quality: The clearer the audio and the less background noise, the better the transcription results. Try to use high-quality audio sources.
    3. Internet Connection for First Use: The first time you transcribe a given language, the program needs an internet connection to download some required data for that language. It's a good idea to transcribe each of your commonly used languages once (even with a very short test clip); after that, you can use the tool offline. A warm-up sketch follows below.
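
A simple way to handle tip 3 is to run a short warm-up while you are still online. The sketch below is a hypothetical helper that loops over the language codes you care about and transcribes a tiny test clip through the local API; the language list and file name are placeholders you would replace.

import requests

LANGUAGES = ["zh-CN", "ct-HK", "ja-JP"]   # replace with the codes you actually use
TEST_CLIP = "short_test.wav"              # any very short audio file

for lang in LANGUAGES:
    # One successful transcription per language downloads that language's data,
    # after which the tool can be used offline.
    with open(TEST_CLIP, "rb") as f:
        resp = requests.post(
            "http://127.0.0.1:5080/v1/audio/transcriptions",
            files={"file": f},
            data={"language": lang, "response_format": "txt"},
        )
    print(lang, "->", "ok" if resp.ok else f"failed ({resp.status_code})")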