In audio-to-text tasks, background noise or musical accompaniment can reduce recognition accuracy. For more accurate results, remove the background accompaniment from the audio before transcription.
Two Recommended Tools for Vocal and Background Separation
First, vocal-separate: a local, offline tool for vocal and background separation based on Spleeter. A pre-packaged version is available for Windows: just extract it and double-click to run. On Mac/Linux, it has to be deployed from source. It has a Chinese interface, is very simple to use, accepts video files directly, and is relatively fast.

Second, Ultimate Vocal Remover: the desktop GUI version of UVR5. On Windows, it must be installed on the C drive to avoid issues. It has an English interface with more options, so it is slightly more complex to operate, but it is more capable and produces better results.

Installation and Usage of vocal-separate
1. On Windows, download the pre-packaged version from the releases page; on other systems, pull the source code and deploy it: https://github.com/jianchang512/vocal-separate/releases

2. After downloading, extract the files and double-click start.exe. Wait for the browser page to open automatically. If you see an error at startup about GPU acceleration being unavailable, don't worry: it is only a notice and does not affect usage.

Once it has started successfully, the tool's web page opens in your browser.

3. Drag and drop, or click to upload, the audio or video file from which you want to extract vocals. Videos are automatically converted to audio before processing.
Select the "2stems" model to separate the uploaded file into two files: vocals and other sounds.
You can also choose the 4stems or 5stems models, which further separate other sounds into files like "drums" and "bass." Generally, using 2stems is sufficient.
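
Step 3 notes that videos are converted to audio before separation. If you ever need to do that conversion by hand, for example before passing a file to a different tool, a minimal sketch with ffmpeg could look like the following. It assumes ffmpeg is installed and on your PATH. The function names, the 44100 Hz sample rate, and the stereo setting are illustrative choices, not a description of what vocal-separate does internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path, wav_path, sample_rate=44100):
    """Return an ffmpeg command that extracts the audio track as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it already exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # uncompressed 16-bit PCM audio
        "-ar", str(sample_rate),   # output sample rate in Hz (assumed value)
        "-ac", "2",                # stereo output (assumed value)
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the extracted WAV file."""
    video_path = Path(video_path)
    # Default to a .wav file next to the input video.
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_cmd(video_path, wav_path), check=True)
    return wav_path
```

Calling extract_audio("talk.mp4") would write talk.wav next to the video, which can then be uploaded like any other audio file.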

You can preview the separation results on the webpage. Click to download or go directly to the displayed output directory to find the separated files. The vocal file is named vocals.wav, and the other sounds file is named accompaniment.wav.
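
If you process many files, it can help to check the output directory from a script instead of by eye. The sketch below lists which files to expect for each model and reports any that are missing. The 2stems file names come from the description above; the 4stems and 5stems names are an assumption based on Spleeter's usual stem naming, and the helper names are hypothetical. The duration helper uses Python's standard wave module and assumes the output is plain PCM WAV.

```python
import wave
from pathlib import Path

# Stem names per model. 2stems matches the files named in this article;
# 4stems and 5stems follow Spleeter's naming and are an assumption here.
STEMS = {
    "2stems": ["vocals", "accompaniment"],
    "4stems": ["vocals", "drums", "bass", "other"],
    "5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def expected_files(output_dir, model="2stems"):
    """Return the WAV paths a given model is expected to produce."""
    return [Path(output_dir) / f"{name}.wav" for name in STEMS[model]]


def missing_files(output_dir, model="2stems"):
    """Return the expected output files that do not exist yet."""
    return [p for p in expected_files(output_dir, model) if not p.is_file()]


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

An empty list from missing_files(output_dir) means both vocals.wav and accompaniment.wav are present, and wav_duration_seconds can confirm that the vocal track is as long as the original recording.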

It's that simple.
Installation and Usage of Ultimate Vocal Remover
1. Download the installer from the v5.6 release page: https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6

On Windows, you can also download the installer directly: https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe After downloading, double-click the exe file and follow the installation steps.
2. After installation, double-click the desktop icon to launch.

3. In the main window, select the audio file to process, set the output directory, and choose the model, bitrate, and other options. Only "Select Input" and "Select Output" are required; the rest can be left at their defaults.

"Select Input": Click to choose the audio file to process.
"Select Output": Click to choose where to save the processed files.
"CHOOSE PROCESS METHODS": Select the processing method. The default is MDX-Net, which generally gives the best results, so you can leave it unchanged.

"CHOOSE MDX-NET MODEL": Choose the model for the selected method. If you select a method other than MDX-Net, additional models may need to be downloaded.


"Start Processing": The button to start the separation process after all selections are made. Click it to begin, and wait for the completion prompt.

