Why Denoise?
In many voice-related applications, the presence of noise can severely affect performance and user experience. For example:
- Speech Recognition: Noise reduces the accuracy of speech recognition, especially in low signal-to-noise ratio environments.
- Voice Cloning: Noise can degrade the naturalness and clarity of speech synthesized based on reference audio.
Voice denoising can solve these problems to some extent.
Common Denoising Methods
Currently, there are several main methods for voice denoising:
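- Spectral Subtraction: A classic method with a simple principle: estimate the noise spectrum (typically from segments without speech) and subtract it from the spectrum of the noisy signal.
- Wiener Filtering: This method works well for stationary noise, but its effectiveness is limited for non-stationary (changing) noise.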
- Deep Learning: This is currently the most advanced denoising method. It utilizes powerful deep learning models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and generative adversarial networks (GANs), to learn the complex relationship between noise and speech, achieving more accurate and natural denoising effects.
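As a rough illustration of the two classical methods above, the following sketch applies them to a single frame that has already been converted to per-bin magnitudes. The function names, the `floor` parameter, and the idea of passing plain lists are assumptions made for illustration. A real denoiser would also need an STFT, a noise estimator, and overlap-add resynthesis.

```python
import math


def spectral_subtraction(noisy_mag, noise_mag, floor=0.05):
    """Subtract an estimated noise spectrum from a noisy magnitude spectrum.

    Works in the power domain and applies a spectral floor so that no bin
    ends up with negative power.
    """
    cleaned = []
    for y, n in zip(noisy_mag, noise_mag):
        power = max(y * y - n * n, (floor * y) ** 2)
        cleaned.append(math.sqrt(power))
    return cleaned


def wiener_gain(noisy_mag, noise_mag):
    """Per-bin Wiener gain G = S / (S + N).

    The speech power S is estimated as noisy power minus noise power,
    clipped at zero (an assumption of this sketch).
    """
    gains = []
    for y, n in zip(noisy_mag, noise_mag):
        noise_power = n * n
        speech_power = max(y * y - noise_power, 0.0)
        total = speech_power + noise_power
        gains.append(speech_power / total if total > 0 else 0.0)
    return gains
```

Both functions assume the noise estimate is fixed for the frame, which is why these methods struggle when the noise changes quickly.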
ZipEnhancer Model: Deep Learning Denoising
This tool is based on the ZipEnhancer model open-sourced by Tongyi Laboratory. It provides an easy-to-use web interface and an API, so anyone can try deep learning denoising.
The project is open source on GitHub: https://github.com/jianchang512/remove-noise
The core of the ZipEnhancer model is the Transformer network structure and multi-task learning strategy. It can not only remove noise but also enhance speech quality and eliminate echo at the same time. The working principle is as follows:
- Self-Attention Mechanism: Captures important long-term relationships in speech signals and understands the contextual information of the sound.
- Multi-Head Attention Mechanism: Analyzes speech features from different angles to achieve more precise noise suppression and speech enhancement.
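The attention idea can be sketched in a few lines. The code below is plain scaled dot-product self-attention over a list of feature frames, with no learned projections (queries, keys, and values are all the input itself). It is not ZipEnhancer's actual implementation; it only illustrates how each output frame becomes a weighted mix of every frame in the sequence. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with Q = K = V = frames.

    frames: list of equal-length feature vectors, one per time step.
    Returns a list of the same shape.
    """
    dim = len(frames[0])
    output = []
    for query in frames:
        # Similarity of this frame to every frame in the sequence
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        # Weighted average of all frames, so distant context can contribute
        mixed = [
            sum(w * value[i] for w, value in zip(weights, frames))
            for i in range(dim)
        ]
        output.append(mixed)
    return output
```

Multi-head attention runs several such attentions in parallel on different learned projections of the input and concatenates the results.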
How to Use This Tool?
Windows Pre-Packaged Version:
- Download and unzip the pre-packaged version (https://github.com/jianchang512/remove-noise/releases/download/v0.1/win-remove-noise-0.1.7z).
- Double-click the `runapi.bat` file, and the browser will automatically open http://127.0.0.1:5080.
- Select an audio or video file to start denoising.
Source Code Deployment:
- Environment Preparation: Ensure that Python 3.10 to 3.12 is installed.
- Install Dependencies: Run `pip install -r requirements.txt --no-deps`.
- CUDA Acceleration (Optional): If you have an NVIDIA graphics card, you can install CUDA 12.1 to accelerate processing:
    pip uninstall -y torch torchaudio torchvision
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
- Run the Program: Run `python api.py`.
Linux System:
- You need to install the `libsndfile` library: `sudo apt-get update && sudo apt-get install libsndfile1`.
- Note: Please ensure that the version of the `datasets` library is 3.0; otherwise errors may occur. You can use the `pip list | grep datasets` command to check the version.
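The version requirements above can also be checked from Python before starting the program. This is a minimal sketch: the helper names are hypothetical, and it assumes "3.0" means any 3.0.x release of `datasets`.

```python
import sys
from importlib import metadata


def python_version_ok(version_info):
    """True if the interpreter is within the 3.10 to 3.12 range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def datasets_version_ok(version):
    """True if the version string is a 3.0 release (e.g. '3.0' or '3.0.1')."""
    return version.split(".")[:2] == ["3", "0"]


if __name__ == "__main__":
    print("Python OK:", python_version_ok(sys.version_info))
    try:
        installed = metadata.version("datasets")
        print("datasets OK:", datasets_version_ok(installed))
    except metadata.PackageNotFoundError:
        print("datasets is not installed")
```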
API Usage Method
Interface Address: http://127.0.0.1:5080/api
Request Method: POST
Request Parameters:
- `stream`: 0 returns the audio URL, 1 returns the audio data.
- `audio`: The audio or video file to be processed.
Return Result (JSON):
- Success (stream=0): {"code": 0, "data": {"url": "Audio URL"}}
- Success (stream=1): WAV audio data.
- Failure: {"code": -1, "msg": "Error Message"}
Sample Code (Python):
    import requests

    url = 'http://127.0.0.1:5080/api'
    file_path = './300.wav'

    # Get the audio URL (stream=0 returns JSON containing the URL)
    try:
        # Open the file in a with-block so the handle is always closed
        with open(file_path, 'rb') as audio:
            res = requests.post(url, data={"stream": 0}, files={"audio": audio})
        res.raise_for_status()
        print(f"Denoised audio URL: {res.json()['data']['url']}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")

    # Get the audio data (stream=1 returns the WAV bytes directly)
    try:
        with open(file_path, 'rb') as audio:
            res = requests.post(url, data={"stream": 1}, files={"audio": audio})
        res.raise_for_status()
        with open("ceshi.wav", 'wb') as f:
            f.write(res.content)
        print("Denoised audio saved as ceshi.wav")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
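The JSON format described under Return Result includes a failure case (`code` of -1), so it can be worth checking the response body before using it. The helpers below are a sketch with hypothetical names. The check in `looks_like_wav` relies on the standard RIFF/WAVE header, and it assumes that a failed `stream=1` request returns the JSON error body instead of audio, which the text above does not state explicitly.

```python
import json


def extract_audio_url(body_text):
    """Return the audio URL from a stream=0 response body.

    Raises RuntimeError with the server's message if code is not 0.
    """
    payload = json.loads(body_text)
    if payload.get("code") != 0:
        raise RuntimeError(payload.get("msg", "unknown error"))
    return payload["data"]["url"]


def looks_like_wav(data):
    """True if the bytes start with a RIFF header of type WAVE."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"
```

With `requests`, these would be called as `extract_audio_url(res.text)` for `stream=0` and `looks_like_wav(res.content)` before writing the file for `stream=1`.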