
Why Noise Reduction?

In many voice-related applications, the presence of noise can severely impact performance and user experience. For example:

  • Speech Recognition: Noise reduces the accuracy of speech recognition, especially in low signal-to-noise ratio environments.
  • Voice Cloning: Noise reduces the naturalness and clarity of synthesized speech based on reference audio.

Voice noise reduction can mitigate these problems, although it cannot eliminate them entirely.

Common Noise Reduction Methods

Currently, the main voice noise reduction techniques include:

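  1. Spectral Subtraction: A classic method built on a simple principle. It estimates the noise spectrum, typically from segments without speech, and subtracts it from the spectrum of the noisy signal.
  2. Wiener Filtering: This method works well for stationary (stable) noise but is less effective for non-stationary (fluctuating) noise.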
  3. Deep Learning: This is currently the most advanced noise reduction method. It leverages powerful deep learning models, such as Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Generative Adversarial Networks (GANs), to learn the complex relationships between noise and speech, achieving more accurate and natural noise reduction.
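To make the first method in the list concrete, here is a minimal sketch of magnitude spectral subtraction using NumPy. It is only an illustration and is not how ZipEnhancer or this tool works. The function name `spectral_subtract`, the frame length, the spectral floor, and the synthetic tone-plus-noise signal are all assumptions chosen for the example.

```python
import numpy as np


def spectral_subtract(noisy, noise_only, frame_len=256, floor=0.05):
    """Magnitude spectral subtraction on non-overlapping frames (kept simple for brevity)."""
    # Estimate the average noise magnitude spectrum from a noise-only excerpt.
    n_noise = len(noise_only) // frame_len
    noise_frames = noise_only[: n_noise * frame_len].reshape(n_noise, frame_len)
    noise_mag = np.abs(np.fft.rfft(noise_frames, axis=1)).mean(axis=0)

    # Split the noisy signal into frames and move to the frequency domain.
    n_frames = len(noisy) // frame_len
    frames = noisy[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Subtract the noise estimate; keep a small floor so magnitudes never go negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Reuse the noisy phase and return to the time domain.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)


# Toy data (assumption): a 250 Hz tone stands in for speech, white noise for the interference.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2 * np.pi * 250 * t)
noisy = speech + 0.3 * rng.standard_normal(sample_rate)
noise_reference = 0.3 * rng.standard_normal(sample_rate)

denoised = spectral_subtract(noisy, noise_reference)
err_before = np.mean((noisy - speech) ** 2)
err_after = np.mean((denoised - speech[: len(denoised)]) ** 2)
print(f"Mean squared error before: {err_before:.4f}, after: {err_after:.4f}")
```

Real implementations usually use overlapping windowed frames and track the noise estimate over time; this sketch omits both to keep the principle visible.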

ZipEnhancer Model: Deep Learning Noise Reduction

This tool is based on the ZipEnhancer model open-sourced by the Tongyi Laboratory. It provides an easy-to-use interface and API, allowing anyone to easily experience the power of deep learning noise reduction.

The project is open source on GitHub: https://github.com/jianchang512/remove-noise

The core of the ZipEnhancer model is the Transformer network structure and multi-task learning strategy. It not only removes noise but also enhances speech quality and eliminates echo. It works as follows:

  • Self-Attention Mechanism: Captures important long-term dependencies in speech signals, understanding the context of the sound.
  • Multi-Head Attention Mechanism: Analyzes speech features from different perspectives, achieving more precise noise suppression and speech enhancement.
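The two mechanisms above can be sketched generically in NumPy. This is textbook scaled dot-product attention split across several heads, not the ZipEnhancer implementation. The function names, the random weight matrices, and the tiny dimensions are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x has shape (frames, d_model). Returns the output and per-head attention weights."""
    frames, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(h):
        # (frames, d_model) -> (heads, frames, d_head)
        return h.reshape(frames, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)

    # Every frame scores every other frame, so distant context can contribute.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, frames, frames)
    weights = softmax(scores, axis=-1)

    context = weights @ v  # (heads, frames, d_head)
    merged = context.transpose(1, 0, 2).reshape(frames, d_model)
    return merged @ w_o, weights


# Hypothetical input: 6 frames of 8-dimensional speech features, 2 attention heads.
rng = np.random.default_rng(0)
frames, d_model, heads = 6, 8, 2
x = rng.standard_normal((frames, d_model))
w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))

out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, heads)
print(out.shape, attn.shape)
```

Each head produces its own frames-by-frames weight matrix, which is what lets different heads attend to different aspects of the signal.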

How to Use This Tool?

Windows Pre-packaged Version:

  1. Download and extract the pre-packaged version (https://github.com/jianchang512/remove-noise/releases/download/v0.1/win-remove-noise-0.1.7z).
  2. Double-click the runapi.bat file. The browser will automatically open http://127.0.0.1:5080.
  3. Select an audio or video file to start noise reduction.

Source Code Deployment:

  1. Environment Preparation: Ensure that a Python version from 3.10 to 3.12 is installed.
  2. Install Dependencies: Run pip install -r requirements.txt --no-deps.
  3. CUDA Acceleration (Optional): If you have an NVIDIA graphics card, you can install CUDA 12.1 to accelerate processing:
    bash
    pip uninstall -y torch torchaudio torchvision
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
  4. Run the Program: Run python api.py.
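Before running the program, it can help to confirm the environment matches the steps above. The following is a small optional check, not part of the project; the helper names `python_version_supported` and `cuda_status` are hypothetical. It relies only on the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util
import sys


def python_version_supported(version_info=sys.version_info):
    """True if the interpreter falls in the 3.10 to 3.12 range required above."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 12)


def cuda_status():
    """Report CUDA availability without failing when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    return "CUDA available" if torch.cuda.is_available() else "CPU only"


print("Python OK" if python_version_supported() else "Unsupported Python version")
print(cuda_status())
```

If the second line reports "CPU only" on a machine with an NVIDIA card, the CUDA build of PyTorch from step 3 is probably not the one installed.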

Linux System:

  • You need to install the libsndfile library: sudo apt-get update && sudo apt-get install libsndfile1.
  • Note: Please ensure that the datasets library version is 3.0, otherwise errors may occur. You can use the pip list | grep datasets command to check the version.
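The same version check can be done from Python with the standard library's `importlib.metadata`. This is a sketch only: the helper name `is_datasets_3_0` is hypothetical, and treating any 3.0.x release as acceptable is an assumption about what "version 3.0" means here.

```python
from importlib.metadata import PackageNotFoundError, version


def is_datasets_3_0(version_string):
    """True for versions in the 3.0 series, such as "3.0" or "3.0.1" (assumed acceptable)."""
    return version_string.split(".")[:2] == ["3", "0"]


try:
    installed = version("datasets")
except PackageNotFoundError:
    print("datasets is not installed")
else:
    status = "OK" if is_datasets_3_0(installed) else "expected 3.0.x"
    print(f"datasets {installed}: {status}")
```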

Interface Preview

API Usage

API Address: http://127.0.0.1:5080/api

Request Method: POST

Request Parameters:

  • stream: 0 returns the audio URL, 1 returns the audio data.
  • audio: The audio or video file to be processed.

Return Results (JSON):

  • Success (stream=0): {"code": 0, "data": {"url": "audio URL"}}
  • Success (stream=1): WAV audio data.
  • Failure: {"code": -1, "msg": "error message"}
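Because a failed request is reported inside the JSON body, a client should check the `code` field before reading `data`. Below is a minimal sketch of that check; the helper name `extract_audio_url` is hypothetical, and the sample payloads simply reuse the shapes listed above.

```python
def extract_audio_url(payload):
    """Return the audio URL from a stream=0 response, or raise if the API reported failure."""
    if payload.get("code") != 0:
        raise RuntimeError(payload.get("msg", "unknown error"))
    return payload["data"]["url"]


# Sample payloads copied from the response formats documented above.
ok = {"code": 0, "data": {"url": "audio URL"}}
failed = {"code": -1, "msg": "error message"}

print(extract_audio_url(ok))
try:
    extract_audio_url(failed)
except RuntimeError as err:
    print(f"Noise reduction failed: {err}")
```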

Example Code (Python):

python
import requests

url = 'http://127.0.0.1:5080/api'
file_path = './300.wav'


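# Get the URL of the noise-reduced audio (stream=0)
try:
    # Open the file in a context manager so the handle is always closed
    with open(file_path, 'rb') as f:
        res = requests.post(url, data={"stream": 0}, files={"audio": f})
    res.raise_for_status()
    result = res.json()
    # The API reports failure inside the JSON body, so check "code" first
    if result.get("code") == 0:
        print(f"Noise-reduced audio URL: {result['data']['url']}")
    else:
        print(f"Noise reduction failed: {result.get('msg')}")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")

# Get the noise-reduced audio data directly (stream=1)
try:
    with open(file_path, 'rb') as f:
        res = requests.post(url, data={"stream": 1}, files={"audio": f})
    res.raise_for_status()
    # The response body is the WAV audio itself
    with open("ceshi.wav", 'wb') as out:
        out.write(res.content)
    print("Noise-reduced audio saved as ceshi.wav")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")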