CosyVoice Open Source Address: https://github.com/FunAudioLLM/CosyVoice
CosyVoice-api Open Source Address: https://github.com/jianchang512/cosyvoice-api
Supports Chinese, English, Japanese, Korean, Cantonese, with corresponding language codes:
zh|en|jp|ko|yue
Using in Video Translation Software
- First, upgrade the software to version 2.08+.
- Ensure the CosyVoice project is deployed, that api.py from CosyVoice-api has been added to it, and that api.py launches successfully (the API service must be running before it can be used from the translation software).
- Open the video translation software, go to Settings (top left) -> CosyVoice, and enter the API address, which defaults to http://127.0.0.1:9233
- Fill in the reference audio and corresponding text.
Reference audio format:
Each line is separated into two parts by the # symbol. The first part is the path to the WAV audio file, and the second part is the corresponding text content. Multiple lines can be entered.
The optimal duration for WAV audio is 5-15 seconds. If the audio file is located in the root directory of the CosyVoice project (same directory as webui.py), you can directly enter the file name here.
If it's located in the wavs directory under the root directory, you need to enter wavs/audio_name.wav
Example reference audio entries:
1.wav#Hello dear friends
wavs/2.wav#Hello friends
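A minimal sketch of how those entries can be parsed, assuming the path#text format described above (the helper name is mine, not part of the software):

```python
def parse_reference_lines(raw: str):
    """Split each 'path#text' line into (audio_path, transcript) pairs."""
    pairs = []
    for line in raw.splitlines():
        line = line.strip()
        if not line or "#" not in line:
            continue  # skip blank or malformed lines
        path, text = line.split("#", 1)  # split only on the first '#'
        pairs.append((path, text))
    return pairs

entries = "1.wav#Hello dear friends\nwavs/2.wav#Hello friends"
# parse_reference_lines(entries) -> [('1.wav', 'Hello dear friends'), ('wavs/2.wav', 'Hello friends')]
```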
- After filling in the information, select CosyVoice as the voice-over channel and the corresponding role on the main interface. The clone role duplicates the original video's voice timbre.
For other systems, please deploy CosyVoice first. The specific deployment method is as follows:
Source Code Deployment of the Official CosyVoice Project
Deployment with conda is highly recommended; otherwise installation may fail and you may run into many issues. Some dependencies, such as pynini, cannot be installed with pip on Windows.
1. Download and install Miniconda
Miniconda is a minimal installer for conda. It is easy to install on Windows; just follow the prompts as with any normal software installation.
Download address: https://docs.anaconda.com/miniconda/
After downloading, double-click the .exe file.
Note that in the following interface you need to select the top two checkboxes; otherwise subsequent operations will be troublesome. The second checkbox adds conda commands to the system environment variables; without it, you cannot use conda commands directly.
Then click "install" and wait for it to complete before closing.
2. Download CosyVoice source code
First, create an empty directory, for example, D:/py on the D drive. The following instructions will use this as an example.
Open the CosyVoice open-source address: https://github.com/FunAudioLLM/CosyVoice
After downloading and extracting, copy all the files in the CosyVoice-main directory to D:/py.
3. Create a virtual environment and activate it
Go to the D:/py folder, type cmd in the address bar, and press Enter. This opens a black cmd window.
In the window, enter the command conda create -n cosyvoice python=3.10 and press Enter. This creates a virtual environment named "cosyvoice" with Python 3.10.
Continue by entering the command conda activate cosyvoice and press Enter. This activates the virtual environment. Only after activation can you continue with installation, startup, and other operations; otherwise errors will inevitably occur.
The activated state is indicated by "(cosyvoice)" appearing at the beginning of the command prompt.
4. Install the pynini module
This module can only be installed using the conda command on Windows, which is why using conda on Windows is recommended at the beginning.
Continue in the activated cmd window from above, enter the command conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3, and press Enter.
Note: if a confirmation prompt appears during installation, enter y and press Enter.
5. Install other dependencies using the Alibaba Cloud mirror
Open the requirements.txt file and delete the last line, WeTextProcessing==1.0.3; otherwise the installation will definitely fail, because this module depends on pynini, which cannot be installed with pip on Windows. Then add these 3 lines to requirements.txt:
Matcha-TTS
flask
waitress
Continue by entering the command
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
and press Enter. After a long wait, the installation should complete successfully.
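The requirements.txt edits in this step can also be scripted. A minimal sketch, assuming the file layout described above (the helper name is mine):

```python
def patch_requirements(text: str) -> str:
    """Drop WeTextProcessing (it depends on pynini, which pip cannot build
    on Windows and which was already installed via conda), then append the
    three extra packages the API server needs."""
    lines = [l for l in text.splitlines() if not l.startswith("WeTextProcessing")]
    lines += ["Matcha-TTS", "flask", "waitress"]
    return "\n".join(lines) + "\n"

# Usage: patch the file in place (run from the project root, D:/py).
# with open("requirements.txt", "r+", encoding="utf-8") as f:
#     patched = patch_requirements(f.read())
#     f.seek(0); f.write(patched); f.truncate()
```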
6. Download the api.py file and place it in the project
Go to this address https://github.com/jianchang512/cosyvoice-api/blob/main/api.py to download the api.py file, and place it together with webui.py.
Start the API service
Enter the command python api.py and press Enter to execute.
The API interface address is:
http://127.0.0.1:9233
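Once the service is running, a short script can confirm it responds. This is a sketch using only the standard library; the /tts parameters come from the interface list below, and the helper name is mine:

```python
from urllib import parse, request

API_BASE = "http://127.0.0.1:9233"  # default port used by api.py

def build_tts_request(text: str, role: str = "中文女"):
    """Return (url, form-encoded body) for the /tts endpoint."""
    url = f"{API_BASE}/tts"
    body = parse.urlencode({"text": text, "role": role}).encode("utf-8")
    return url, body

if __name__ == "__main__":
    # Only works while the API service is actually running.
    url, body = build_tts_request("Hello dear friends")
    with request.urlopen(request.Request(url, data=body), timeout=3600) as resp:
        with open("out.wav", "wb") as f:
            f.write(resp.read())  # successful responses are WAV audio data
```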
API Interface List
Synthesize text based on built-in roles
Interface address: /tts
Simply synthesizes text into speech without voice cloning.
Required parameters:
text: The text to be synthesized into speech.
role: Select one of '中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女' (Chinese Female, Chinese Male, Japanese Male, Cantonese Female, English Female, English Male, Korean Female).
Successful return: WAV audio data.
Example code
import requests

data = {
    "text": "Hello dear friends",
    "role": "中文女"
}
response = requests.post('http://127.0.0.1:9233/tts', data=data, timeout=3600)
with open('tts.wav', 'wb') as f:
    f.write(response.content)
Clone voice timbre in the same language
- Address: /clone_eq
The pronunciation language of the reference audio matches the language of the text to be synthesized. For example, the reference audio is in Chinese, and you need to synthesize Chinese text into speech based on that audio.
- Required parameters:
text: The text to be synthesized into speech.
reference_audio: The reference audio for voice cloning, given as a path relative to api.py. For example, if referencing 1.wav and the file is in the same folder as api.py, enter 1.wav.
reference_text: The text corresponding to the reference audio.
Successful return: WAV data.
Example code
import requests

data = {
    "text": "Hello dear friends.",
    "reference_audio": "10.wav",
    "reference_text": "I hope you are doing better than me."
}
response = requests.post('http://127.0.0.1:9233/clone_eq', data=data, timeout=3600)
with open('clone_eq.wav', 'wb') as f:
    f.write(response.content)
Clone voice timbre in different languages:
- Address: /clone
The pronunciation language of the reference audio is inconsistent with the language of the text to be synthesized. For example, you need to synthesize an English text into speech based on a reference audio with Chinese pronunciation.
- Required parameters:
text: The text to be synthesized into speech.
reference_audio: The reference audio for voice cloning, given as a path relative to api.py. For example, if referencing 1.wav and the file is in the same folder as api.py, enter 1.wav.
Successful return: WAV data.
Example code
import requests

data = {
    "text": "親友からの誕生日プレゼントを遠くから受け取り、思いがけないサプライズと深い祝福に、私の心は甘い喜びで満たされた!",
    "reference_audio": "10.wav"
}
response = requests.post('http://127.0.0.1:9233/clone', data=data, timeout=3600)
with open('clone.wav', 'wb') as f:
    f.write(response.content)
OpenAI TTS Compatibility
- Interface address: /v1/audio/speech
- Request method: POST
- Request type: Content-Type: application/json
- Request parameters:
input: The text to be synthesized.
model: Fixed to tts-1; kept for compatibility with OpenAI parameters but not actually used.
speed: Speech rate; the default is 1.0.
response_format: Return format, fixed to WAV audio data.
voice: For plain text synthesis, select one of '中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女' (Chinese Female, Chinese Male, Japanese Male, Cantonese Female, English Female, English Male, Korean Female). When cloning, fill in the path to the reference audio relative to api.py; for example, if referencing 1.wav and the file is in the same folder as api.py, enter 1.wav.
- Example code
from openai import OpenAI

client = OpenAI(api_key='12314', base_url='http://127.0.0.1:9233/v1')

with client.audio.speech.with_streaming_response.create(
    model='tts-1',
    voice='中文女',
    input='Hello dear friends',
    speed=1.0
) as response:
    with open('./test.wav', 'wb') as f:
        for chunk in response.iter_bytes():
            f.write(chunk)