
In speech-to-text tasks, background noise or musical accompaniment can reduce recognition accuracy. For more precise results, remove the background music from the audio first.

1. vocal-separate: An offline tool, based on spleeter, that separates vocals from background music on your own machine. A pre-packaged Windows version is available: just unzip it and double-click to run. On Mac/Linux, you need to deploy it from source. It has a Chinese interface, is very easy to use, can process video files directly, and is relatively fast.

2. Ultimate Vocal Remover: The desktop GUI version of uvr5. On Windows, install it on the C: drive; installing it elsewhere may cause problems. Its interface is in English and it offers more options, so it is more complex to operate, but it is more powerful and produces better results.


vocal-separate Installation and Usage

1. On Windows, download the pre-packaged version from the releases page below. On other systems, clone the source code and deploy it yourself. https://github.com/jianchang512/vocal-separate/releases

2. After downloading, unzip the archive and double-click start.exe. If an error message appears saying that GPU acceleration is unavailable, don't worry: it is only a notice, and the tool still works normally.

Once the tool starts successfully, its page opens in your browser.

3. On that page, drag and drop or click to upload the audio or video file you want to separate. Uploaded videos are automatically converted to audio before processing.
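vocal-separate handles the video-to-audio conversion for you. If you ever need to do that step yourself, for example to prepare audio for another tool, one common approach is to extract the audio track with ffmpeg. The sketch below is a minimal illustration and assumes ffmpeg is installed and on PATH. The function names, the "_audio.wav" naming, and the 44100 Hz stereo settings are illustrative choices. They are not what vocal-separate does internally.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_extract_audio_cmd(video: str, sample_rate: int = 44100) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes a stereo WAV.

    Hypothetical helper: the output is written next to the input as
    <name>_audio.wav so the source file is never overwritten.
    """
    src = Path(video)
    out_wav = str(src.with_name(src.stem + "_audio.wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video,              # input video
        "-vn",                    # discard the video stream
        "-ac", "2",               # two audio channels (stereo)
        "-ar", str(sample_rate),  # output sample rate in Hz
        out_wav,
    ]


def extract_audio(video: str) -> str:
    """Run ffmpeg (must be installed and on PATH) and return the WAV path."""
    cmd = build_extract_audio_cmd(video)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return cmd[-1]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_extract_audio_cmd("lecture.mp4")))
```

Calling `extract_audio("lecture.mp4")` runs the command and returns the path of the new WAV file, which you can then upload or process.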

Select the "2stems" model to split the uploaded file into two files: one containing the vocals and one containing all other sounds.

You can also choose the 4stems or 5stems model. Besides extracting the vocals, these models split the remaining sounds into separate files such as "drums" and "bass". For speech recognition, 2stems is usually all you need.
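The model you pick determines which files come back. The spleeter README lists the stems for its pretrained models as follows: 2stems gives vocals and accompaniment, 4stems gives vocals, drums, bass and other, and 5stems adds piano. The sketch below records that mapping. The dictionary and `expected_files` are hypothetical helpers, not part of vocal-separate, and the sketch assumes vocal-separate keeps spleeter's stem names.

```python
from __future__ import annotations

# Stems produced by each spleeter pretrained model (as listed in the spleeter README).
# Hypothetical mapping for illustration; vocal-separate is assumed to keep these names.
SPLEETER_STEMS = {
    "2stems": ("vocals", "accompaniment"),
    "4stems": ("vocals", "drums", "bass", "other"),
    "5stems": ("vocals", "drums", "bass", "piano", "other"),
}


def expected_files(model: str) -> list[str]:
    """Return the WAV file names the given model should write."""
    try:
        stems = SPLEETER_STEMS[model]
    except KeyError:
        raise ValueError(
            f"unknown model {model!r}; expected one of {sorted(SPLEETER_STEMS)}"
        ) from None
    return [f"{stem}.wav" for stem in stems]


if __name__ == "__main__":
    for name in SPLEETER_STEMS:
        print(name, expected_files(name))
```

Note that only 2stems produces accompaniment.wav. For transcription, the file that matters is vocals.wav, which every model produces.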

You can play the results on the page, click to download them, or open the output directory shown with the results to find the separated files directly. The vocals are saved as vocals.wav and the other sounds as accompaniment.wav.
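Since the goal is cleaner input for speech-to-text, the next step is usually to pass vocals.wav to your transcription tool. The sketch below collects every vocals.wav under a results directory and assumes you pass it the directory shown on the page. It searches recursively, so it works whether each input gets its own subfolder (spleeter's default layout) or the files sit directly in that directory. The exact layout vocal-separate uses is an assumption, and `find_vocal_tracks` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def find_vocal_tracks(results_dir: str) -> list[Path]:
    """Return every vocals.wav under results_dir, sorted by path.

    The search is recursive, so both <results_dir>/vocals.wav and
    <results_dir>/<input_name>/vocals.wav are found.
    """
    root = Path(results_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"results directory not found: {root}")
    return sorted(root.rglob("vocals.wav"))
```

Each returned path is a vocals-only track. Pass those paths to your speech recognition step and ignore accompaniment.wav.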

It's that simple.


Ultimate Vocal Remover Installation and Usage

1. Download it from https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6

For Windows, you can also download the installer directly from the link below. After downloading, double-click the .exe file and click Next through each step to complete the installation. https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe

2. After the installation is complete, double-click the desktop icon to start.

3. In the main window, select the audio file to process, set the output directory, and choose the model, bit rate, and other options. Only "Select Input" and "Select Output" are required. Everything else is optional and can be left at its default.

"Select Input": Click it to select the audio file to be processed.

"Select Output": Click it to select where to save the processed file.

"CHOOSE PROCESS METHOD": Selects the processing method. The default is MDX-Net, which should give the best results, so keep it.

"CHOOSE MDX-NET MODEL": The model used by the method selected above. If you choose a method other than MDX-Net, you need to download its models separately.

"Start Processing": Once everything is selected, click this button to start the separation, then wait for the prompt saying it has finished.
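The settings list separates required options from optional ones. The sketch below makes that split concrete by recording a UVR run's choices, for example in your own notes or batch scripts. It is hypothetical: the field names simply mirror the GUI labels above, and it is not a UVR API or configuration format. The only default it encodes is the one mentioned in the text, MDX-Net.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UvrRunSettings:
    """Hypothetical record of the UVR options described above (not a UVR format).

    Only input_file and output_dir are required; the rest can stay at GUI defaults.
    """

    input_file: str                  # "Select Input"
    output_dir: str                  # "Select Output"
    process_method: str = "MDX-Net"  # "CHOOSE PROCESS METHOD" default
    model: str | None = None         # None = keep the model preselected in the GUI

    def problems(self) -> list[str]:
        """List anything that should be fixed before clicking "Start Processing"."""
        issues = []
        if not self.input_file:
            issues.append("no input file selected")
        if not self.output_dir:
            issues.append("no output directory selected")
        # Per the text, non-MDX-Net methods need their models downloaded separately.
        if self.process_method != "MDX-Net" and self.model is None:
            issues.append(
                f"{self.process_method} models must be downloaded separately; choose one"
            )
        return issues
```

For example, `UvrRunSettings("song.mp3", "out").problems()` returns an empty list, because the two required fields are set and everything else keeps its default.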