Image classification of the Stanford Cars dataset using fastai v1. The goal is to hit 90%+ accuracy (shoot for the stars), starting with a basic fastai image classification workflow and iterating from there.
Everything was run on a Paperspace P4000 machine, apart from the EfficientNet-b7 results, which were run on a P6000.
SOTA = 96.2%, Domain Adaptive Transfer Learning with Specialist Models
- NOTEBOOK: 10_stanford_cars_EfficientNet_b7_Ranger_Mish_Trial.ipynb
- Continuing on from my EfficientNet-b3 result of 93.8%, I matched the EfficientNet paper's b7 result
- Achieved 94.79% (standard deviation of 0.094) 5-run, 40-epoch, mean test set accuracy on Stanford Cars using Mish EfficientNet-b7 + Ranger
- Matched the EfficientNet paper's EfficientNet-b7 result of 94.7% (current SOTA is 96.2%)
- Used MEfficientNet-b7, created by swapping the Swish activation function for the Mish activation function (a minimal sketch follows this list)
- Used the Ranger optimiser (a combination of RAdam and Lookahead) and trained with FlatCosAnnealScheduler (see the scheduler sketch below)
- Huge credit for this work goes to this inspirational fastai thread: thanks to @lukemelas for the PyTorch implementation and to the fastai community, especially @lessw2020 for Ranger, @digantamisra98 for Mish, and @muellerzr, @grankin and everyone else there for their valuable contributions
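As context for the "M" in MEfficientNet, here is a minimal sketch of the activation swap, assuming @lukemelas' efficientnet_pytorch package (the `_swish`/`_blocks` attribute names come from that implementation); the notebook's actual code may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from efficientnet_pytorch import EfficientNet  # @lukemelas' PyTorch implementation

class Mish(nn.Module):
    "Mish activation: x * tanh(softplus(x)) (Misra, 2019)."
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# Load a pretrained b7 and swap every Swish activation for Mish
model = EfficientNet.from_pretrained('efficientnet-b7', num_classes=196)
model._swish = Mish()             # stem + head activation
for block in model._blocks:       # per-MBConv-block activation
    block._swish = Mish()
```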
10_stanford_cars_EfficientNet_b7_Ranger_Mish_Trial.ipynb
- Achieved 94.79% 5-run, 40-epoch, mean test set accuracy on Stanford Cars using Mish EfficientNet-b7 + Ranger
- Using the Mish activation and Ranger with EfficientNet-b7. See notebook for implementation details
- Further discussion of these results can be found on the fastai forums
- Full accuracy and validation loss results for each run in the results excel file
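A hedged sketch of the Ranger + flat-then-cosine recipe from the thread, built on fastai v1's TrainingPhase/GeneralScheduler; the ranger import assumes @lessw2020's ranger.py is on the path, and `data`/`model` are assumptions (e.g. the MEfficientNet above plus the data loading sketch further down):

```python
from functools import partial
from fastai.vision import *                         # fastai v1
from fastai.callbacks import GeneralScheduler, TrainingPhase
from ranger import Ranger                           # RAdam + Lookahead (@lessw2020)

def fit_flat_cos(learn, n_epochs, lr, start_pct=0.72):
    "Hold lr flat for start_pct of the batches, then cosine-anneal it to zero."
    total = n_epochs * len(learn.data.train_dl)
    anneal_start = int(total * start_pct)
    phases = (TrainingPhase(anneal_start).schedule_hp('lr', lr),
              TrainingPhase(total - anneal_start).schedule_hp('lr', lr,
                                                              anneal=annealing_cos))
    learn.fit(n_epochs, callbacks=[GeneralScheduler(learn, phases)])

learn = Learner(data, model, opt_func=partial(Ranger), metrics=[accuracy])
fit_flat_cos(learn, 40, lr=1.5e-3)
```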
exp_stanford_cars_EfficientNet_Mish_Ranger913A.ipynb
- Ran 8 experiments testing a beta version of the new Ranger913A optimiser from @lessw2020
- Matched the previous best result obtained with "vanilla" Ranger (93.8%), but needed a higher learning rate to do so (1e-2 vs 1.5e-3)
- See the notebook for full results and plots of validation loss and accuracy
9_stanford_cars_EfficientNet_Ranger_Mish_Trial.ipynb
- Achieved 93.8% 5-run, 40-epoch, mean test set accuracy on Stanford Cars using Mish EfficientNet-b3 + Ranger
- Using the Mish activation and Ranger with EfficientNet-b3. See notebook for implementation details
- Validation loss and Accuracy (I used the test set as the validation set) are saved in mefficientnet_b3_ranger_results.xlsx
- Fastai Forums post and discussion
- Quick Medium post
6_stanford_cars_cutout.ipynb
- Used the Cutout data augmentation alongside the default fastai transforms; each square hole's side was 25% of the image side (e.g. 0.25 × 224 = 56 px). A sketch follows below
- 88.3% Accuracy achieved
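A minimal sketch of this augmentation using fastai v1's built-in cutout transform; the hole size (0.25 × 224 = 56 px) matches the description above, the other values are illustrative:

```python
from fastai.vision import get_transforms, cutout

# Default fastai transforms plus one 56 px square hole per training image
train_tfms, valid_tfms = get_transforms()
train_tfms += [cutout(n_holes=(1, 1), length=(56, 56), p=1.0)]
# then e.g. ImageDataBunch.from_df(..., ds_tfms=(train_tfms, valid_tfms), size=224)
```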
5_stanford_cars_mixup_and_dropout.ipynb
- Tuning the dropout parameters while also using the Mixup protocol (sketch below)
- 89.2% Accuracy achieved with aggressive dropout (ps = [0.35, 0.7]), accuracy more or less the same as NB4
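A hedged sketch of mixup combined with heavier head dropout in fastai v1; `data` and the epoch count are illustrative assumptions, not the notebook's exact settings:

```python
from fastai.vision import cnn_learner, models, accuracy

# ps sets the dropout of the two layers in the fastai head;
# .mixup() attaches the mixup callback (default alpha=0.4)
learn = cnn_learner(data, models.resnet152, ps=[0.35, 0.7],
                    metrics=[accuracy]).mixup()
learn.fit_one_cycle(20)
```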
4_stanford_cars_mixup.ipynb
- Tuning the model using the Mixup protocol, blending pairs of input images to provide stronger regularisation
- 89.4% Accuracy, up 1.2% from NB2
3_stanford_cars_cropped.ipynb
- Training the model on images cropped to the provided bounding boxes (a cropping sketch follows)
- 78.54% Accuracy, down 9.65% from Notebook 2
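A minimal cropping sketch, assuming (x1, y1, x2, y2) pixel bounding boxes as extracted from the devkit .mat files (see the snippet at the end of this README); the paths and coordinates are illustrative:

```python
from PIL import Image

def crop_to_bbox(src, dst, bbox):
    "bbox = (x1, y1, x2, y2) pixel coordinates from the devkit annotations."
    Image.open(src).crop(bbox).save(dst)

crop_to_bbox('car_ims/000001.jpg', 'cropped/000001.jpg', (39, 116, 569, 375))
```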
2_stanford_cars_lr_tuning.ipynb
- Tuning of the learning rate and of differential (discriminative) learning rates, again using fastai's implementation of the 1-cycle policy (sketch below)
- 88.19% Accuracy, up 3.2%
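A minimal sketch of the lr_find + discriminative learning rate recipe in fastai v1; the values are illustrative rather than the tuned ones, and `learn` is assumed to be a cnn_learner as in the data loading sketch below:

```python
learn.lr_find()
learn.recorder.plot()        # pick an lr just before the loss starts to climb
learn.fit_one_cycle(5, max_lr=1e-3)                 # head only (body frozen)
learn.unfreeze()
learn.fit_one_cycle(10, max_lr=slice(1e-5, 1e-3))   # lower lrs for early layers
```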
1_stanford_cars_basic.ipynb
- Benchmark model using basic fastai image classification workflow including the 1-cycle policy
- 84.95% Accuracy
labels_df.csv contains the label, filepath and train/test flag for each image file; a minimal loading sketch follows.
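A hedged sketch of loading labels_df.csv into the basic fastai v1 workflow; the column names ('filepath', 'label', 'is_test') are assumptions based on the description above, not necessarily the file's actual headers:

```python
import pandas as pd
from fastai.vision import (ImageDataBunch, cnn_learner, models,
                           get_transforms, imagenet_stats, accuracy)

df = pd.read_csv('labels_df.csv')             # column names are assumptions
data = ImageDataBunch.from_df('.', df[df.is_test == 0], fn_col='filepath',
                              label_col='label', ds_tfms=get_transforms(),
                              size=224, bs=64).normalize(imagenet_stats)
learn = cnn_learner(data, models.resnet152, metrics=[accuracy])
learn.fit_one_cycle(5)
```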
- 95% - WS-DAN - See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification - Hu 2019. Code might not be released until October 2019 if it is accepted for ICCV-2019.
- Previous SOTA - 93.61% (Apr-18) https://www.researchgate.net/publication/316027349_Deep_CNNs_With_Spatially_Weighted_Pooling_for_Fine-Grained_Car_Recognition
Fine-tune first on cars from Google Open Images
Use DAT (Domain Adaptive Transfer Learning with Specialist Models)
FORNAX - Great roundup of advances in 2018, some of which can be applied: https://github.com/kmkolasinski/deep-learning-notes/blob/master/seminars/2018-12-Improving-DL-with-tricks/Improving_deep_learning_models_with_bag_of_tricks.pdf
AMAZON - Bag of Tricks for Image Classification with Convolutional Neural Networks: https://arxiv.org/pdf/1812.01187.pdf
Multi-Attention CNN: https://github.com/Jianlong-Fu/Multi-Attention-CNN
- Visualise with images in t-sne: https://github.com/kheyer/ML-DL-Projects
- Great visualisation here of the transforms available in fastai
- Train only on cropped images
- Use Mixup
- Mixup + Dropout (produced good results in Mixup paper)
- AdaMixup (https://arxiv.org/abs/1809.02499v3)
- Cutout - Improved Regularization of Convolutional Neural Networks with Cutout https://arxiv.org/pdf/1708.04552.pdf
- RGB Transforms, which I tested here
- Label Smoothing (https://arxiv.org/abs/1512.00567); see the sketch after this list
- Mixup + Label smoothing (tested in Mixup paper) - maybe not, didn't produce great results
- Random Erasing (Zhong et al., 2017)
- Try increased zoom and higher-resolution images
- Use own stats (mean+std dev) from training set to normalize the images
- Use non-standard fastai image augmentations, including augmentations for this dataset can be found here: http://ee.sharif.edu/~shayan_f/fgcc/index.html
- Aggressive LR for training all layers
- Adding Weight-Decay and tuning Dropout
- Batch Norm
- @sgugger's adam experiments: https://github.com/sgugger/Adam-experiments
- AdamW with 1-cycle: https://twitter.com/radekosmulski/status/1014964816337952770?s=12
- AdamW and other DL tricks: https://twitter.com/drsxr/status/1073208269907353602?s=12
- Train with bn_freeze=True for unfrozen layers
- Shake-Shake Regularisation (mentioned in Fornax slides above)
- Knowledge Distillation (paper: https://arxiv.org/abs/1503.02531, https://forums.fast.ai/t/part-1-complete-collection-of-video-timelines/5504)
- Kaggle winner, Mixup + Knowledge Distillation (http://arxiv.org/abs/1809.04403v2)
- Try alternate resnet sizes (benchmark used resnet152)
- Try alternate archs, e.g. densenet, unet
- Try XResnet152
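To make a few of the ideas above concrete, a hedged fastai v1 sketch wiring label smoothing, weight decay and frozen batchnorm statistics together; `data` and all hyperparameter values are illustrative assumptions:

```python
from fastai.vision import cnn_learner, models, accuracy
from fastai.layers import LabelSmoothingCrossEntropy
from fastai.train import BnFreeze

# wd adds weight decay; BnFreeze freezes the running stats
# of non-trainable batchnorm layers while the body is frozen
learn = cnn_learner(data, models.resnet152, wd=1e-2,
                    callback_fns=[BnFreeze], metrics=[accuracy])
learn.loss_func = LabelSmoothingCrossEntropy()   # label smoothing (Szegedy et al.)
learn.fit_one_cycle(10)
```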
- code to extract the labels and annotations from the .mat files: Devon Yates' code on Kaggle, thanks Devon! https://www.kaggle.com/criticalmassacre/inaccurate-labels-in-stanford-cars-data-set
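A hedged sketch of the same extraction with scipy (not Devon Yates' exact code); the field names follow the Stanford Cars devkit annotation files:

```python
import pandas as pd
from scipy.io import loadmat

meta = loadmat('devkit/cars_meta.mat')
class_names = [c[0] for c in meta['class_names'][0]]     # the 196 class labels

annos = loadmat('devkit/cars_train_annos.mat')['annotations'][0]
labels_df = pd.DataFrame([{
    'fname': a['fname'][0],
    'label': class_names[int(a['class'][0][0]) - 1],     # classes are 1-indexed
    'bbox_x1': int(a['bbox_x1'][0][0]), 'bbox_y1': int(a['bbox_y1'][0][0]),
    'bbox_x2': int(a['bbox_x2'][0][0]), 'bbox_y2': int(a['bbox_y2'][0][0]),
} for a in annos])
```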
(My 90%+ goal was based on @sgugger's code implementing Adam for the Stanford Cars dataset: https://github.com/sgugger/Adam-experiments)