- 3D encoder and 2D/1D decoder
- 1/2 or 1/4 resolution prediction
- very simple decoder
- split the 2nd fragment into two images vertically: 4 folds (a fold-layout sketch follows this list)
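As a rough illustration of the split above, a minimal sketch of a 4-fold layout; the fragment names, region labels, and array shapes here are assumptions, not the exact code used.

```python
import numpy as np

# Hypothetical 4-fold layout: fragment 2 is cut vertically into a top and a
# bottom half, so there are four validation regions in total.
FOLDS = [
    {"valid": ["fragment_1"]},
    {"valid": ["fragment_2_top"]},     # upper half of fragment 2
    {"valid": ["fragment_2_bottom"]},  # lower half of fragment 2
    {"valid": ["fragment_3"]},
]

def split_fragment2(volume: np.ndarray, mask: np.ndarray):
    """Split fragment 2 into two images along the height axis (assumed (D, H, W) volume, (H, W) mask)."""
    h = volume.shape[-2]
    top = (volume[..., : h // 2, :], mask[: h // 2, :])
    bottom = (volume[..., h // 2 :, :], mask[h // 2 :, :])
    return top, bottom
```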
- 3D CNN encoder and 2D decoder (a toy sketch follows this list)
  - encoder: based on @samfc10 's Notebook: Vesuvius Challenge - 3D ResNet Training
    - remove the max pooling after the 1st CNN
    - use attention before reducing the D dimension
    - use resnet18 and resnet34
  - decoder
    - a single 2D CNN layer
    - upsample with nearest interpolation
    - output resolution is downsampled to 1/2, then upsampled with bilinear interpolation
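To make the shape flow above concrete, here is a minimal, self-contained sketch in the same spirit: a small 3D CNN stand-in for the 3D ResNet encoder (no max pooling after the first conv), attention over the depth axis before collapsing D, a single 2D conv as the decoder, nearest upsampling to the 1/2-resolution target, and a final bilinear upsample. The layer widths and stage layout are placeholders, not the actual resnet18/34 configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAttentionPool(nn.Module):
    """Collapse the depth (D) axis with learned attention weights instead of plain pooling."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x):                                # x: (B, C, D, H, W)
        attn = torch.softmax(self.score(x), dim=2)       # weights over D
        return (x * attn).sum(dim=2)                     # -> (B, C, H, W)

class CNN3DTo2D(nn.Module):
    """Toy stand-in for the 3D CNN encoder + single-layer 2D decoder described above."""
    def __init__(self, in_ch=1, width=64):
        super().__init__()
        # stem without the max pooling that usually follows the first conv
        self.stem = nn.Sequential(
            nn.Conv3d(in_ch, width, kernel_size=7, stride=(1, 2, 2), padding=3, bias=False),
            nn.BatchNorm3d(width), nn.ReLU(inplace=True),
        )
        self.body = nn.Sequential(  # placeholder for the 3D ResNet stages
            nn.Conv3d(width, width * 2, kernel_size=3, stride=(1, 2, 2), padding=1, bias=False),
            nn.BatchNorm3d(width * 2), nn.ReLU(inplace=True),
        )
        self.pool_d = DepthAttentionPool(width * 2)            # attention before reducing D
        self.decoder = nn.Conv2d(width * 2, 1, kernel_size=1)  # a single 2D CNN layer

    def forward(self, x):                                # x: (B, 1, D, H, W)
        h, w = x.shape[-2:]
        feat = self.body(self.stem(x))                   # (B, C, D, H/4, W/4)
        feat2d = self.pool_d(feat)                       # (B, C, H/4, W/4)
        logits = self.decoder(feat2d)
        # upsample with nearest interpolation to the 1/2-resolution target ...
        logits = F.interpolate(logits, size=(h // 2, w // 2), mode="nearest")
        # ... then to full resolution with bilinear interpolation
        return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)
```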
- 3D transformer encoder and linear decoder
  - encoder: use MViTv2-s from the official PyTorch implementation and a pre-trained model
    - modify the forward function to get the output at each scale
    - replace MaxPool3d with MaxPool2d for reproducibility
  - decoder: a single linear layer and patch expanding to upscale the low-resolution 3D features (a patch-expanding sketch follows this list)
    - patch expanding is from Swin-Unet
    - output resolution is downsampled to 1/2 or 1/4, then upsampled with bilinear interpolation
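A minimal sketch of a Swin-Unet-style patch expanding layer followed by a single linear head, which is one way the decoder described above could be assembled; the (B, H, W, C) token layout, the number of expansions, and the module names are assumptions. The MViTv2-s encoder itself would come from e.g. torchvision.models.video.mvit_v2_s with pre-trained weights; the per-scale forward modification and the MaxPool3d-to-MaxPool2d swap are omitted here.

```python
import torch
import torch.nn as nn

class PatchExpanding(nn.Module):
    """Swin-Unet-style patch expanding: double H and W, halve the channel dim."""
    def __init__(self, dim):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim, bias=False)
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x):                        # x: (B, H, W, C) token map
        b, h, w, c = x.shape
        x = self.expand(x)                       # (B, H, W, 2C)
        x = x.view(b, h, w, 2, 2, c // 2)        # split channels into a 2x2 spatial block
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, 2 * h, 2 * w, c // 2)
        return self.norm(x)

class LinearDecoder(nn.Module):
    """Patch expanding stages followed by a single linear classifier."""
    def __init__(self, dim, num_upscales=2):
        super().__init__()
        dims = [dim // (2 ** i) for i in range(num_upscales)]
        self.expands = nn.Sequential(*[PatchExpanding(d) for d in dims])
        self.head = nn.Linear(dim // (2 ** num_upscales), 1)

    def forward(self, x):                        # x: (B, H, W, C) encoder tokens
        x = self.expands(x)                      # (B, 2^n * H, 2^n * W, C / 2^n)
        return self.head(x).permute(0, 3, 1, 2)  # (B, 1, H', W') logits
```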
- amp
- torch.compile
- label smoothing
- cutout
- cutmix
- mixup (joint image/mask mixup and cutmix sketches follow this list)
- data augmentation
  - referred to @ts 's notebook: 2.5d segmentation baseline [training]
  - referred to @tattaka and @mipypf 's
- patch_size=224 and stride=112
  - stride=75 or stride=56 didn't work
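A minimal sketch of how mixup and cutmix could be applied jointly to the image volumes and their ink masks; the tensor shapes, alpha value, and fixed box size are assumptions, not the exact augmentation code used here.

```python
import torch

def mixup_batch(images, masks, alpha=0.4):
    """Mixup applied to both volumes (B, C, D, H, W) and masks (B, H, W); alpha is assumed."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0), device=images.device)
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_masks = lam * masks + (1.0 - lam) * masks[perm]
    return mixed_images, mixed_masks

def cutmix_batch(images, masks):
    """CutMix: paste a random rectangle from a shuffled batch into both images and masks."""
    b, _, _, h, w = images.shape               # assumes (B, C, D, H, W) volumes
    perm = torch.randperm(b, device=images.device)
    ch, cw = h // 2, w // 2                    # fixed half-size box for simplicity
    y = torch.randint(0, h - ch + 1, (1,)).item()
    x = torch.randint(0, w - cw + 1, (1,)).item()
    images[:, :, :, y:y + ch, x:x + cw] = images[perm, :, :, y:y + ch, x:x + cw]
    masks[:, y:y + ch, x:x + cw] = masks[perm, y:y + ch, x:x + cw]
    return images, masks
```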
- fp16 inference
- stride=75
  - better than stride=112
- ignore the edges of the output prediction
  - use only the red area of the prediction (figure below); a sliding-window sketch follows this list
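A minimal sketch of the inference scheme implied above: overlapping tiles with stride=75, fp16 autocast, and accumulating only the central part of each tile prediction while the outer border is discarded. The margin value and tensor shapes are assumptions; the actual "red area" crop may differ.

```python
import torch

@torch.no_grad()
def sliding_window_predict(model, volume, patch=224, stride=75, margin=32, device="cuda"):
    """Tile a fragment (assumed (D, H, W) tensor), run fp16 inference, and average
    only the central region of each patch prediction into the output map."""
    _, h, w = volume.shape
    prob = torch.zeros(h, w, device=device)
    count = torch.zeros(h, w, device=device)
    model.eval()
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = volume[:, y:y + patch, x:x + patch].unsqueeze(0).unsqueeze(0).to(device)
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                logits = model(tile)                     # (1, 1, patch, patch)
            p = torch.sigmoid(logits.float())[0, 0]
            # keep only the central (patch - 2*margin) square of this tile
            ys, ye = y + margin, y + patch - margin
            xs, xe = x + margin, x + patch - margin
            prob[ys:ye, xs:xe] += p[margin:-margin, margin:-margin]
            count[ys:ye, xs:xe] += 1
    return prob / count.clamp(min=1)
```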