Year | Paper | Contents |
---|---|---|
Vision | | |
2014 | VAE (Kingma and Welling) | [✓] Training on MNIST [✓] Visualizing Encoder output [✓] Visualizing Decoder output [✓] Reconstructing image |
2015 | CAM (Zhou et al.) | [✓] Applying GoogLeNet [✓] Generating 'Class Activation Map' [✓] Generating bounding box |
2016 | Gatys et al. | [✓] Experimenting on input image size [✓] Experimenting on VGGNet-19 with Batch normalization [✓] Applying VGGNet-19 |
| | YOLO (Redmon et al.) | [✓] Model architecture [✓] Visualizing ground truth on grid [✓] Visualizing model output [✓] Visualizing class probability map [ㅤ] Loss function [ㅤ] Training on VOC 2012 |
| | DCGAN (Radford et al.) | [✓] Training on CelebA at 64 × 64 [✓] Sampling [✓] Interpolating in latent space [ㅤ] Training on CelebA at 32 × 32 |
| | Noroozi et al. | [✓] Model architecture [✓] Chromatic aberration [✓] Permutation set |
| | Zhang et al. | [✓] Visualizing empirical probability distribution [ㅤ] Model architecture [ㅤ] Loss function [ㅤ] Training |
2014, 2017 | Conditional GAN (Mirza et al.), WGAN-GP (Gulrajani et al.) | [✓] Training on MNIST |
2016, 2017 | VQ-VAE (Oord et al.), PixelCNN (Oord et al.) | [✓] Training on Fashion MNIST [✓] Training on CIFAR-10 [✓] Sampling |
2017 | Pix2Pix (Isola et al.) | [✓] Experimenting on image mean and std [✓] Experimenting on nn.InstanceNorm2d() [✓] Training on Google Maps [✓] Training on Facades [ㅤ] Higher resolution input image |
| | CycleGAN (Zhu et al.) | [✓] Experimenting on random image pairing [✓] Experimenting on LSGANs [✓] Training on monet2photo [✓] Training on vangogh2photo [✓] Training on cezanne2photo [✓] Training on ukiyoe2photo [✓] Training on horse2zebra [✓] Training on summer2winter_yosemite |
2018 | PGGAN (Karras et al.) | [✓] Experimenting on image mean and std [✓] Training on CelebA-HQ at 512 × 512 [✓] Sampling |
| | DeepLabv3 (Chen et al.) | [✓] Training on VOC 2012 [✓] Predicting on VOC 2012 validation set [✓] Average mIoU [✓] Visualizing model output |
| | RotNet (Gidaris et al.) | [✓] Visualizing Attention map |
| | StarGAN (Yunjey Choi et al.) | [✓] Model architecture |
2020 | STEFANN (Roy et al.) | [✓] FANnet architecture [✓] Colornet architecture [✓] Training FANnet on Google Fonts [✓] Custom Google Fonts dataset [✓] Average SSIM [ㅤ] Training Colornet |
| | DDPM (Ho et al.) | [✓] Training on CelebA at 32 × 32 [✓] Training on CelebA at 64 × 64 [✓] Visualizing denoising process [✓] Sampling using linear interpolation [✓] Sampling using coarse-to-fine interpolation |
| | DDIM (Song et al.) | [✓] Normal sampling [✓] Sampling using spherical linear interpolation [✓] Sampling using grid interpolation [✓] Truncated normal |
| | ViT (Dosovitskiy et al.) | [✓] Training on CIFAR-10 [✓] Training on CIFAR-100 [✓] Visualizing Attention map using Attention Roll-out [✓] Visualizing position embedding similarity [✓] Interpolating position embedding [✓] CutOut [✓] CutMix [✓] Hide-and-Seek |
| | SimCLR (Chen et al.) | [✓] Normalized temperature-scaled cross entropy loss [✓] Data augmentation [✓] Pixel intensity histogram |
| | DETR (Carion et al.) | [✓] Model architecture [ㅤ] Bipartite matching & loss [ㅤ] Batch normalization freezing [ㅤ] Training on COCO 2017 |
2021 | Improved DDPM (Nichol and Dhariwal) | [✓] Cosine diffusion schedule |
| | Classifier-Guidance (Dhariwal and Nichol) | [✓] Training on CIFAR-10 [ㅤ] AdaGN [ㅤ] BigGAN Upsample/Downsample [ㅤ] Improved DDPM sampling [ㅤ] Conditional/Unconditional models [ㅤ] Super-resolution model [ㅤ] Interpolation |
| | ILVR (Choi et al.) | [✓] Sampling using single reference [✓] Sampling using various downsampling factors [✓] Sampling using various conditioning ranges |
| | SDEdit (Meng et al.) | [✓] User input stroke simulation [✓] Applying CelebA at 64 × 64 [ㅤ] Total repeats [ㅤ] VE SDEdit [ㅤ] Sampling from scribble [ㅤ] Image editing only on masked regions |
| | MAE (He et al.) | [✓] Model architecture for self-supervised pre-training [✓] Model architecture for classification [ㅤ] Self-supervised pre-training on ImageNet-1K [ㅤ] Fine-tuning on ImageNet-1K [ㅤ] Linear probing |
| | Copy-Paste (Ghiasi et al.) | [✓] COCO dataset processing [✓] Large scale jittering [✓] Copy-Paste (within mini-batch) [✓] Visualizing data [ㅤ] Gaussian filter |
| | ViViT (Arnab et al.) | [✓] 'Spatio-temporal attention' architecture [✓] 'Factorised encoder' architecture [✓] 'Factorised self-attention' architecture |
2022 | CFG (Ho et al.) | |
Language | | |
2017 | Transformer (Vaswani et al.) | [✓] Model architecture [✓] Visualizing position encoding |
2019 | BERT (Devlin et al.) | [✓] Model architecture [✓] Masked language modeling [✓] BookCorpus data processing [✓] SQuAD data processing [✓] SWAG data processing |
| | Sentence-BERT (Reimers et al.) | [✓] Classification loss [✓] Regression loss [✓] Contrastive loss [✓] STSb data processing [✓] WikiSection data processing [ㅤ] NLI data processing |
| | RoBERTa (Liu et al.) | [✓] BookCorpus data processing [✓] Masked language modeling [ㅤ] BookCorpus data processing ('SEGMENT-PAIR' + NSP) [ㅤ] BookCorpus data processing ('SENTENCE-PAIR' + NSP) [✓] BookCorpus data processing ('FULL-SENTENCES') [ㅤ] BookCorpus data processing ('DOC-SENTENCES') |
2021 | Swin Transformer (Liu et al.) | [✓] Patch partition [✓] Patch merging [✓] Relative position bias [✓] Feature map padding [✓] Self-attention in non-overlapped windows [ㅤ] Shifted Window based Self-Attention |
2024 | RoPE (Su et al.) | [✓] Rotary Positional Embedding |
Vision-Language | | |
2021 | CLIP (Radford et al.) | [✓] Training on Flickr8k + Flickr30k [✓] Zero-shot classification on ImageNet1k (mini) [✓] Linear classification on ImageNet1k (mini) |