My own easy-to-understand implementation of Jansson et al., "Singing Voice Separation with Deep U-Net Convolutional Networks", using PyTorch and librosa.
- Put audio files with the instrument-only track on the left channel and the mixed (with vocals) track on the right channel into the `data` directory (see the sketch after this list).
- Run `train.py`.
- Specify the input media in `inference.py`.
- Run `inference.py`.
- The result will be saved as `result.wav`.
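
If you need to build such training files yourself, here is a minimal sketch (not part of this repository) of packing one song into the expected stereo layout with librosa and soundfile. The file names, sample rate, and output path are assumptions; match them to whatever your data and `train.py` actually use.

```python
import os

import librosa
import numpy as np
import soundfile as sf

SR = 22050  # assumed sample rate; use whatever train.py expects

# Load the instrument-only track and the full (vocals + instruments) mix as mono.
# "song_instrumental.wav" and "song_full_mix.wav" are placeholder file names.
instrumental, _ = librosa.load("song_instrumental.wav", sr=SR, mono=True)
mixture, _ = librosa.load("song_full_mix.wav", sr=SR, mono=True)

# Trim both to the same length so the channels stay aligned.
length = min(len(instrumental), len(mixture))
instrumental, mixture = instrumental[:length], mixture[:length]

# Left channel = instrument-only track, right channel = mixed track.
stereo = np.stack([instrumental, mixture], axis=1)  # shape: (samples, 2)

os.makedirs("data", exist_ok=True)
sf.write(os.path.join("data", "song.wav"), stereo, SR)
```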