Urban Sound Classification. For a detailed explanation of the dataset and the methods used to clean and compile it, please read this paper.
Features used: MFCC, Mel Spectrogram, Chroma STFT, Chroma CQT, Chroma CENS, Spectral Contrast, Tonnetz.
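
The features listed above all have counterparts in `librosa.feature`, so a minimal extraction sketch could look like the one below. This is an illustration, not the code used in the paper: the names `FEATURES` and `extract_features` are hypothetical, every feature is computed with Librosa's default parameters (an assumption), and Librosa is imported inside the function so the mapping itself can be inspected without it installed.

```python
# Feature name -> name of the corresponding function in librosa.feature.
FEATURES = {
    "mfcc": "mfcc",
    "mel_spectrogram": "melspectrogram",
    "chroma_stft": "chroma_stft",
    "chroma_cqt": "chroma_cqt",
    "chroma_cens": "chroma_cens",
    "spectral_contrast": "spectral_contrast",
    "tonnetz": "tonnetz",
}


def extract_features(path):
    """Load one audio clip and return a dict of feature name -> 2-D array.

    Assumption: default Librosa parameters (sample rate, hop length,
    number of coefficients); the paper's settings may differ.
    """
    import librosa  # third-party; imported lazily

    y, sr = librosa.load(path)  # resamples to Librosa's default rate
    return {
        name: getattr(librosa.feature, func_name)(y=y, sr=sr)
        for name, func_name in FEATURES.items()
    }
```

Each returned array has one row per feature coefficient and one column per analysis frame, so clips of different lengths yield different numbers of columns.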
- A CNN model with the layers CONV2D ---> MAXPOOL ---> CONV2D ---> MAXPOOL ---> DENSE ---> DENSE ---> SOFTMAX. The first Conv2D layer applies 64 filters of dimension (5*1*1) to an input of shape (20*5*1). A max-pooling layer follows, then another Conv2D layer, and so on. A final softmax layer classifies among the 10 classes. We use the Adam optimizer to minimize the loss.
- An LSTM model with 2 LSTM layers, 2 time-distributed dense layers, and finally an output layer for classification: LSTM ---> LSTM ---> DENSE ---> DENSE ---> SOFTMAX. The first and second LSTM layers each contain 128 units, so each returns 128 output (yHat) values for the given input x. The outputs of the LSTM layers are passed to the dense layers and then to the output layer. Here we use the Adam optimizer so that the model converges faster.
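
To make the layer dimensions above concrete, the following sketch works through the shape and parameter arithmetic for the first Conv2D layer and for an LSTM layer. The helper names are illustrative only. It assumes a stride of 1, Keras-style parameter counting (one bias vector per LSTM gate), and a padding mode that the description does not state, so both common options are supported.

```python
def conv2d_output_shape(input_shape, kernel_size, filters,
                        padding="valid", strides=(1, 1)):
    """Output shape (height, width, channels) of a 2-D convolution."""
    h, w, _ = input_shape
    kh, kw = kernel_size
    sh, sw = strides
    if padding == "same":
        # Output is padded so that only the stride shrinks it (ceil division).
        out_h, out_w = -(-h // sh), -(-w // sw)
    elif padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        out_h, out_w = (h - kh) // sh + 1, (w - kw) // sw + 1
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    return (out_h, out_w, filters)


def conv2d_param_count(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + filters


def lstm_param_count(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)
```

With these assumptions, 64 filters of size (5*1) on a (20*5*1) input give an output of (16*5*64) under "valid" padding, or (20*5*64) under "same" padding, with 384 trainable parameters either way. A 128-unit LSTM layer fed by another 128-unit LSTM layer has 131584 parameters; the size of the first LSTM layer depends on the input feature dimension, which is not given here.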
- Time Stretch : In this technique, we slow down or speed up the sound clips by rates of 0.9 and 1.1. In this way, we generated 17464 new audio clips for our augmented dataset.
- Pitch Shift : We raise and lower the pitch of each audio sample in the dataset by factors of {-2, +2} (in semitones), which creates 17464 new samples.
- Pitch Shift along with Time Stretch : In this augmentation step, each sound clip is manipulated using both pitch shift and time stretch to generate a total of 34928 novel audio clips.
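
The three augmentation steps above can be sketched with `librosa.effects.time_stretch` and `librosa.effects.pitch_shift`. This is a hedged illustration rather than the original pipeline: the function names are hypothetical, the order in which stretch and shift are combined is an assumption, and the base count of 8732 clips used in the note below is inferred from 17464 / 2 rather than stated in this README.

```python
from itertools import product

# Parameter values taken from the augmentation descriptions above.
TIME_STRETCH_RATES = (0.9, 1.1)
PITCH_SHIFT_STEPS = (-2, 2)  # semitones


def augmentation_variants():
    """Parameter settings applied to every clip, per technique."""
    return {
        "time_stretch": [{"rate": r} for r in TIME_STRETCH_RATES],
        "pitch_shift": [{"n_steps": s} for s in PITCH_SHIFT_STEPS],
        "pitch_shift_and_time_stretch": [
            {"rate": r, "n_steps": s}
            for r, s in product(TIME_STRETCH_RATES, PITCH_SHIFT_STEPS)
        ],
    }


def augment(y, sr, rate=None, n_steps=None):
    """Apply an optional time stretch, then an optional pitch shift.

    Assumption: stretch first, shift second; the source does not say.
    """
    import librosa  # third-party; imported lazily

    if rate is not None:
        y = librosa.effects.time_stretch(y, rate=rate)
    if n_steps is not None:
        y = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    return y


def clips_generated(n_original):
    """Number of new clips each technique yields from n_original clips."""
    return {
        name: n_original * len(settings)
        for name, settings in augmentation_variants().items()
    }
```

Calling `clips_generated(8732)` reproduces the counts quoted above: 17464 for time stretch, 17464 for pitch shift, and 34928 for the combination, since the two rates and two semitone offsets give four combined settings per clip.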
The table below shows the maximum accuracy each model achieved across the different types of features used.
Model | Accuracy |
---|---|
CNN | 96.5% |
LSTM | 98.81% |
Dependencies: Librosa and ffmpeg. Install them with:
pip install librosa
pip install ffmpeg
For more information, see our paper published in IEEE.
Please cite this paper if you use the code or information from it 😃