ChatTTS has become popular, but its documentation is unclear, especially on tone, prosody, and speaker control. After repeated testing and troubleshooting, I've recorded what I learned below.
Open-source code for the UI: https://github.com/jianchang512/chattts-ui
Available Control Symbols in Text
You can insert control symbols into the original text to be synthesized. Currently, laughter and pauses can be controlled.
[laugh] represents laughter
[uv_break] represents a pause
Example text:
text="Hello there[uv_break] friends, I heard today is a good day, isn't[uv_break] it[laugh]?"
During synthesis, [laugh] will be replaced with laughter, and a pause will be inserted at [uv_break].
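A quick way to catch typos in control symbols before synthesis is to scan the text for bracketed tokens. The following is a minimal sketch and not part of ChatTTS: the helper name find_unknown_symbols and the regular expression are my own assumptions, and the allowed set holds only the two symbols described above.

```python
import re

# Only the two symbols described above; extend this set if you confirm others.
KNOWN_SYMBOLS = {"[laugh]", "[uv_break]"}

# Matches any bracketed token made of letters, digits, or underscores.
_SYMBOL_RE = re.compile(r"\[[A-Za-z0-9_]+\]")


def find_unknown_symbols(text):
    """Return bracketed tokens in `text` that are not in KNOWN_SYMBOLS, in order."""
    return [tok for tok in _SYMBOL_RE.findall(text) if tok not in KNOWN_SYMBOLS]


text = "Hello there[uv_break] friends, I heard today is a good day, isn't[uv_break] it[laugh]?"
print(find_unknown_symbols(text))  # []
print(find_unknown_symbols("Nice to meet you[laughs][uv_break]"))  # ['[laughs]']
```

An empty list means every bracketed token is one of the known control symbols.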
You can control the intensity of laughter and pauses by passing a prompt through the params_refine_text parameter.
laugh_(0-2) options: laugh_0 laugh_1 laugh_2 (presumably increasing laughter intensity; the documentation does not say)
break_(0-7) options: break_0 break_1 break_2 break_3 break_4 break_5 break_6 break_7 (presumably increasingly noticeable pauses; the documentation does not say)
Code example:
chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})
chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_2][break_4]'})
However, in actual testing, the differences between [break_0] and [break_7] are not obvious, and similarly for [laugh_0] to [laugh_2].
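To avoid hand-typing prompt strings, the prompt can be assembled from numeric levels. This is a sketch under assumptions: build_refine_prompt is a hypothetical helper, not a ChatTTS function. The laugh and break ranges are the ones listed above; the valid range for oral is not listed, so the helper only rejects negative values.

```python
def build_refine_prompt(oral=None, laugh=None, break_=None):
    """Build a prompt string such as '[oral_2][laugh_0][break_6]'.

    laugh (0-2) and break_ (0-7) follow the options listed above. The upper
    bound for oral is not stated there, so only non-negativity is checked.
    """
    parts = []
    if oral is not None:
        if oral < 0:
            raise ValueError("oral must be non-negative")
        parts.append(f"[oral_{oral}]")
    if laugh is not None:
        if not 0 <= laugh <= 2:
            raise ValueError("laugh must be between 0 and 2")
        parts.append(f"[laugh_{laugh}]")
    if break_ is not None:
        if not 0 <= break_ <= 7:
            raise ValueError("break_ must be between 0 and 7")
        parts.append(f"[break_{break_}]")
    return "".join(parts)


params_refine_text = {"prompt": build_refine_prompt(oral=2, laugh=0, break_=6)}
print(params_refine_text)  # {'prompt': '[oral_2][laugh_0][break_6]'}
```

The resulting dict can be passed as params_refine_text to chat.infer, as in the calls above.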
Skip the Refine Text Stage
During synthesis, the text is refined by inserting control symbols. For example, the above text might be refined to:
Hello there [uv_break] ah [uv_break] um [uv_break] friends, I heard today is a good day, isn't [uv_break] um [uv_break] it [laugh] ? [uv_break]
As you can see, the control symbols may not match your original annotations, leading to unwanted pauses, noise, or laughter in the output. To force synthesis exactly as written, set the skip_refine_text parameter to True to skip the refine text stage.
chat.infer([text], skip_refine_text=True, params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})
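Before deciding whether to skip the refine stage, it can be useful to look at what it would insert. The sketch below rests on an assumption not covered above: that chat.infer accepts a refine_text_only flag and returns the refined text for each input. Please check this against your installed ChatTTS version. Both helper names are hypothetical, and chat is passed in as an argument, so nothing here loads the model.

```python
def preview_refined_text(chat, text):
    """Return the text as rewritten by the refine stage, without synthesizing audio.

    Assumes chat.infer supports refine_text_only=True (an assumption; verify
    against your ChatTTS version).
    """
    return chat.infer([text], refine_text_only=True)


def synthesize_as_written(chat, text, **infer_kwargs):
    """Synthesize `text` exactly as annotated by skipping the refine stage.

    Extra keyword arguments (for example use_decoder or params_infer_code)
    are forwarded to chat.infer unchanged.
    """
    return chat.infer([text], skip_refine_text=True, **infer_kwargs)
```

If the preview shows extra [uv_break] or filler words you do not want, call synthesize_as_written(chat, text) instead of the default chat.infer([text]).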
Use a Fixed Speaker Voice
By default, each synthesis randomly selects a different voice, which is not user-friendly, and there is no clear documentation on voice selection.
To keep the same voice across runs, first set the random seed manually; different seeds produce different voices.
import torch
torch.manual_seed(2222)
Then, sample a random speaker:
rand_spk = chat.sample_random_speaker()
Pass it via the params_infer_code parameter:
chat.infer([text], use_decoder=True, params_infer_code={'spk_emb': rand_spk})
In testing, seeds 2222, 7869, and 6653 produced male voices, and 3333, 4099, and 5099 produced female voices. Try different seeds to find more voices.
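The seed, sample, and pass steps above can be wrapped in two small helpers. This is only a sketch: pick_seed, sample_fixed_speaker, and the TESTED_SEEDS table are hypothetical names, and the male/female labels simply repeat what I observed in testing, so they may not hold on other versions or hardware.

```python
# Seeds reported above; the labels reflect what was observed in testing.
TESTED_SEEDS = {
    "male": [2222, 7869, 6653],
    "female": [3333, 4099, 5099],
}


def pick_seed(voice, index=0):
    """Return one of the tested seeds for 'male' or 'female'."""
    if voice not in TESTED_SEEDS:
        raise ValueError(f"unknown voice type: {voice!r}")
    return TESTED_SEEDS[voice][index]


def sample_fixed_speaker(chat, seed):
    """Seed torch, then sample a speaker, so the seed determines the voice."""
    import torch  # imported here so the seed table works without torch installed

    torch.manual_seed(seed)
    return chat.sample_random_speaker()
```

For example, rand_spk = sample_fixed_speaker(chat, pick_seed("female")) gives a speaker embedding to pass as 'spk_emb' in params_infer_code.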
Speech Rate Control
Control the speech rate by setting the prompt in the params_infer_code parameter of chat.infer.
chat.infer([text], use_decoder=True, params_infer_code={'spk_emb': rand_spk, 'prompt':'[speed_5]'})
The available speed values are not documented. The default in the source code is speed_5, but in testing, speed_0 and speed_7 showed no significant difference.
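Since the speaker embedding and the speed prompt share the same params_infer_code dict, a small builder keeps the two together. This is a sketch with a hypothetical helper name, build_infer_params. Because the valid speed range is not documented, the helper only checks that the value is a non-negative integer.

```python
def build_infer_params(spk_emb=None, speed=5):
    """Build the params_infer_code dict with a speed prompt and optional speaker.

    speed defaults to 5, the default found in the source code. The valid range
    is not documented, so only non-negative integers are enforced here.
    """
    if not isinstance(speed, int) or speed < 0:
        raise ValueError("speed must be a non-negative integer")
    params = {"prompt": f"[speed_{speed}]"}
    if spk_emb is not None:
        params["spk_emb"] = spk_emb
    return params


print(build_infer_params(speed=3))  # {'prompt': '[speed_3]'}
```

The result can be passed directly, for example chat.infer([text], use_decoder=True, params_infer_code=build_infer_params(rand_spk, speed=3)).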
WebUI and All-in-One Package
Source code and downloads: https://github.com/jianchang512/chatTTS-ui
After extracting the all-in-one package, double-click app.exe.
For source code deployment, follow the repository instructions.