In audio-to-text transcription, background noise or musical accompaniment can reduce recognition accuracy. For more precise results, remove background audio or music from your recordings before transcribing them.
Two Recommended Tools for Vocal and Background Audio Separation
1. vocal-separate: A local, offline vocal and background audio separation tool based on spleeter. A pre-packaged version is available for Windows: extract the archive and double-click to run. On Mac/Linux, you need to deploy from source. It has a Chinese interface, is very easy to use, can process video files directly, and runs fast.
2. Ultimate Vocal Remover: This is the desktop GUI version of UVR5. On Windows, it needs to be installed on the C drive; otherwise, issues may arise. It has an English interface with many options, so it is somewhat more complex to operate, but it offers more powerful features and better separation results.
vocal-separate Installation and Usage
1. On Windows, first download the pre-packaged version from the releases page: https://github.com/jianchang512/vocal-separate/releases. On other systems, deploy from source.
2. After downloading, extract the files and double-click start.exe. Wait for the browser page to open automatically. If an error similar to the one below appears, don't worry; it merely indicates that GPU acceleration is unavailable and does not affect functionality.
Upon successful launch, the following browser page will open:
3. As shown in the image above, drag and drop or click to upload the audio or video file from which you want to separate vocals. Videos will automatically be converted to audio upon upload before processing.
From the models, select "2stems" to separate the uploaded file into two files: vocals and other sounds.
You can also choose "4stems" and "5stems" models, which further subdivide other sounds into files like "drums," "bass," etc., in addition to separating vocals. In most cases, using "2stems" is sufficient.
You can preview the separation results on the webpage. Click download or navigate directly to the displayed output directory to find the separated files. The vocal file will be named vocals.wav, and the other sounds file will be named accompaniment.wav.
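If you want to confirm from a script that both files were written, a minimal sketch follows. It assumes the output directory contains vocals.wav and accompaniment.wav as plain PCM WAV files (which Python's standard `wave` module can read); the function name `stem_durations` is made up for illustration.

```python
import wave
from pathlib import Path


def stem_durations(output_dir, names=("vocals.wav", "accompaniment.wav")):
    """Return {file name: duration in seconds} for the expected stem files.

    Raises FileNotFoundError if one of the expected files is missing.
    Assumes the files are uncompressed PCM WAV.
    """
    durations = {}
    for name in names:
        path = Path(output_dir) / name
        if not path.is_file():
            raise FileNotFoundError(f"expected stem not found: {path}")
        with wave.open(str(path), "rb") as wav:
            # duration = number of frames / frames per second
            durations[name] = wav.getnframes() / wav.getframerate()
    return durations
```

Both durations should match the length of the original recording; a large mismatch suggests the separation did not finish.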
It's that simple.
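Since vocal-separate is based on spleeter, the same model choice can also be scripted against spleeter directly, which may be handy on Mac/Linux. The sketch below is not vocal-separate's own code. It assumes spleeter is installed (`pip install spleeter`) and uses spleeter's default output layout, which may differ from the directory the web page shows. The helper names `expected_outputs` and `separate` are made up for illustration.

```python
from pathlib import Path

# Stem names produced by each Spleeter model.
STEMS = {
    "2stems": ["vocals", "accompaniment"],
    "4stems": ["vocals", "drums", "bass", "other"],
    "5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def expected_outputs(input_file, output_dir, model="2stems"):
    """Paths Spleeter writes by default:
    <output_dir>/<input name without extension>/<stem>.wav
    """
    folder = Path(output_dir) / Path(input_file).stem
    return [folder / f"{stem}.wav" for stem in STEMS[model]]


def separate(input_file, output_dir, model="2stems"):
    """Run the separation and return the paths of the files it should create."""
    # Imported lazily so the helpers above work without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator(f"spleeter:{model}")
    separator.separate_to_file(str(input_file), str(output_dir))
    return expected_outputs(input_file, output_dir, model)
```

For example, `separate("interview.mp3", "output")` would be expected to produce `output/interview/vocals.wav` and `output/interview/accompaniment.wav`. The first run downloads the pretrained model, so it needs network access.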
Ultimate Vocal Remover Installation and Usage
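The steps below describe selecting an audio file as input. If your source is a video, one option is to extract its audio track first. The sketch below assumes ffmpeg is installed and on your PATH. The sample rate and channel count are illustrative choices rather than UVR requirements, and the function names are made up for this example.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(video_file, audio_file=None):
    """Build an ffmpeg command that writes the video's audio track as WAV."""
    video = Path(video_file)
    # Default: same name as the video, with a .wav extension.
    audio = Path(audio_file) if audio_file else video.with_suffix(".wav")
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(video),        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM WAV
        "-ar", "44100",          # sample rate (assumed value)
        "-ac", "2",              # stereo (assumed value)
        str(audio),
    ]


def extract_audio(video_file, audio_file=None):
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_extract_command(video_file, audio_file)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Calling `extract_audio("lecture.mp4")` would write `lecture.wav` next to the video, which can then be chosen under "Select Input".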
1. First, download from here: https://github.com/Anjok07/ultimatevocalremovergui/releases/tag/v5.6
The Windows installer can also be downloaded directly via this link: https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe. After downloading, double-click the .exe file and click "Next" on each screen to complete the installation.
2. After installation, double-click the desktop icon to launch.
3. As shown below, select the audio file to process, set the output directory, choose the processing model, bitrate, and other options. Apart from "Select Input" and "Select Output," all other options are optional and can be left at their defaults.
"Select Input": Click to choose the audio file you want to process.
"Select Output": Click to choose where to save the processed files.
"CHOOSE PROCESS METHODS": Select the processing method. MDX-Net is the default and generally offers the best results; it's recommended to keep it as default.
"CHOOSE MDX-NET MODEL": This selects the model corresponding to the chosen processing method. If you're not using the 'MDX-Net' method, additional models may need to be downloaded.
"Start Processing": This is the execution button after making your selections. Click it to begin the separation operation and wait for the completion prompt.