- PyTorch and our toolkit: kuma_utils (https://github.com/analokmaus/kuma_utils)
- Manage experiments using a config class
- Tile-based model like Iafoss's kernel
- Tile setting is 224 x 224 x 64
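A minimal sketch of Iafoss-style tiling, assuming "224 x 224 x 64" means 64 tiles of 224 x 224 pixels and that the slide background is white, so the tiles with the smallest pixel sum hold the most tissue. `extract_tiles` and its parameters are illustrative names, not taken from this repository.

```python
import numpy as np


def extract_tiles(img, tile_size=224, n_tiles=64, pad_value=255):
    """Split an H x W x C image into tiles and keep the n_tiles darkest ones."""
    h, w, c = img.shape
    # Pad with white so both sides become multiples of tile_size.
    pad_h = (tile_size - h % tile_size) % tile_size
    pad_w = (tile_size - w % tile_size) % tile_size
    img = np.pad(
        img,
        [(pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2), (0, 0)],
        constant_values=pad_value,
    )
    # Reshape into a grid of tiles: (rows, tile, cols, tile, C) -> (N, tile, tile, C).
    tiles = img.reshape(
        img.shape[0] // tile_size, tile_size, img.shape[1] // tile_size, tile_size, c
    )
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, tile_size, tile_size, c)
    # If the slide yields fewer tiles than requested, append blank white tiles.
    if len(tiles) < n_tiles:
        extra = np.full(
            (n_tiles - len(tiles), tile_size, tile_size, c), pad_value, dtype=img.dtype
        )
        tiles = np.concatenate([tiles, extra])
    # Lowest pixel sum first, i.e. darkest (most tissue) tiles first.
    order = np.argsort(tiles.reshape(len(tiles), -1).sum(axis=1))[:n_tiles]
    return tiles[order]
```

The selected tiles can then be fed to the backbone as a batch and their features pooled per slide.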
- We solve the problem as ordinal regression (CORAL loss)
- se-resnext50 was consistently the best backbone in our experiments
- public LB: ~0.87
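For readers unfamiliar with CORAL-style ordinal regression, here is a rough sketch of the loss only. It assumes 6 ordered classes (the class count is not stated above) and a model that outputs one logit per threshold. The function names are illustrative, and the shared-weight output layer described in the CORAL paper is not shown.

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 6  # assumption: six ordered grades, 0..5


def coral_targets(labels, num_classes=NUM_CLASSES):
    """Encode grade y as num_classes - 1 binary targets: target[k] = 1 if y > k."""
    thresholds = torch.arange(num_classes - 1, device=labels.device)
    return (labels.unsqueeze(1) > thresholds).float()


def coral_loss(logits, labels, num_classes=NUM_CLASSES):
    """Binary cross-entropy summed over thresholds, averaged over the batch."""
    targets = coral_targets(labels, num_classes)
    per_sample = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ).sum(dim=1)
    return per_sample.mean()


def coral_predict(logits):
    """Predicted grade = number of thresholds the sample is judged to exceed."""
    return (torch.sigmoid(logits) > 0.5).sum(dim=1)
```

Because each threshold is a separate binary task, a prediction that is off by one grade is penalized less than one that is off by several, which is the point of treating the grades as ordered.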
- The biggest challenge is how to deal with noisy labels
- We read and implemented lots of papers
- Online Uncertainty Sample Mining (OUSM) worked best
- Samples with noisy labels should produce large loss values
- Excluding the k samples with the largest loss in each mini-batch during training can prevent overfitting to noisy samples
- To learn general features, we train the first 5 epochs without OUSM
- public LB: ~0.89
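The OUSM idea described above can be sketched as follows. This is an illustration under assumptions, not the exact training code: the per-sample loss is taken to be binary cross-entropy summed over outputs, epochs are assumed to be 0-indexed, and `ousm_loss` and `use_ousm` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def ousm_loss(logits, targets, k=1, enabled=True):
    """Mean per-sample loss after dropping the k largest losses in the batch."""
    per_sample = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ).sum(dim=1)
    # Fall back to the plain mean during warm-up or when the batch is too small.
    if not enabled or k <= 0 or per_sample.numel() <= k:
        return per_sample.mean()
    # Keep the (batch_size - k) smallest losses; the k largest are treated as noisy.
    kept, _ = torch.topk(per_sample, per_sample.numel() - k, largest=False)
    return kept.mean()


def use_ousm(epoch, warmup_epochs=5):
    """Disable OUSM for the first warmup_epochs epochs (0-indexed)."""
    return epoch >= warmup_epochs
```

In a training loop this would be called as `ousm_loss(logits, targets, k=1, enabled=use_ousm(epoch))`, so the model sees every sample during warm-up and starts discarding the highest-loss samples afterwards.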
- Data augmentation has two types: slide aug and tile aug
- Slide aug: ShiftScaleRotate
- Tile aug: ShiftScaleRotate, Flip, Dropout
- public LB: ~0.90
- Ensemble noisy-label detection
- LB score is very unstable depending on the seed value
- We trained our previous SOTA setting with 10 different seeds, and detected noisy labels based on loss
- Noise flags are added to
- Noise ratio 0.10 + OUSM(k=1) worked the best
- public LB: ~0.91
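One way the seed-ensemble noise detection could look is sketched below. It assumes the per-sample losses from each seed are available as an array of shape (n_seeds, n_samples) and that they are combined by a simple mean. The aggregation rule and the name `flag_noisy_samples` are assumptions for illustration.

```python
import numpy as np


def flag_noisy_samples(losses, noise_ratio=0.10):
    """Flag the noise_ratio fraction of samples with the highest mean loss.

    losses: array-like of shape (n_seeds, n_samples), one row per seed.
    Returns a boolean array of shape (n_samples,).
    """
    losses = np.asarray(losses, dtype=float)
    # Averaging over seeds reduces the seed-to-seed variance of any single run.
    mean_loss = losses.mean(axis=0)
    n_flag = int(round(noise_ratio * mean_loss.size))
    flags = np.zeros(mean_loss.size, dtype=bool)
    if n_flag > 0:
        flags[np.argsort(mean_loss)[-n_flag:]] = True
    return flags
```

With `noise_ratio=0.10`, the 10% of samples whose loss stays high across seeds are flagged, which matches the ratio reported above.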
- fp16 training with NVIDIA apex is recommended (https://github.com/NVIDIA/apex)
- Configure for your own environment
- Make sure all requirements are met
- Train with config:
python3 train.py --config PatchBinClassification --fp16
Results (checkpoints, OOF predictions) will be in results/(config name)/