KimRass/README.md

1. Personal Projects

1) From-scratch PyTorch Implementations of AI papers

| Year | Paper | Contents |
|:---:|:---|:---|
| **Vision** | | |
| 2014 | VAE (Kingma and Welling) | [✓] Training on MNIST<br>[✓] Visualizing Encoder output<br>[✓] Visualizing Decoder output<br>[✓] Reconstructing image |
| 2015 | CAM (Zhou et al.) | [✓] Applying GoogLeNet<br>[✓] Generating 'Class Activation Map'<br>[✓] Generating bounding box |
| 2016 | Gatys et al. | [✓] Experimenting on input image size<br>[✓] Experimenting on VGGNet-19 with Batch normalization<br>[✓] Applying VGGNet-19 |
| | YOLO (Redmon et al.) | [✓] Model architecture<br>[✓] Visualizing ground truth on grid<br>[✓] Visualizing model output<br>[✓] Visualizing class probability map<br>[ ] Loss function<br>[ ] Training on VOC 2012 |
| | DCGAN (Radford et al.) | [✓] Training on CelebA at 64 × 64<br>[✓] Sampling<br>[✓] Interpolating in latent space<br>[ ] Training on CelebA at 32 × 32 |
| | Noroozi et al. | [✓] Model architecture<br>[✓] Chromatic aberration<br>[✓] Permutation set |
| | Zhang et al. | [✓] Visualizing empirical probability distribution<br>[ ] Model architecture<br>[ ] Loss function<br>[ ] Training |
| 2014<br>2017 | Conditional GAN (Mirza et al.)<br>WGAN-GP (Gulrajani et al.) | [✓] Training on MNIST |
| 2016<br>2017 | VQ-VAE (Oord et al.)<br>PixelCNN (Oord et al.) | [✓] Training on Fashion MNIST<br>[✓] Training on CIFAR-10<br>[✓] Sampling |
| 2017 | Pix2Pix (Isola et al.) | [✓] Experimenting on image mean and std<br>[✓] Experimenting on nn.InstanceNorm2d()<br>[✓] Training on Google Maps<br>[✓] Training on Facades<br>[ ] Higher-resolution input image |
| | CycleGAN (Zhu et al.) | [✓] Experimenting on random image pairing<br>[✓] Experimenting on LSGANs<br>[✓] Training on monet2photo<br>[✓] Training on vangogh2photo<br>[✓] Training on cezanne2photo<br>[✓] Training on ukiyoe2photo<br>[✓] Training on horse2zebra<br>[✓] Training on summer2winter_yosemite |
| 2018 | PGGAN (Karras et al.) | [✓] Experimenting on image mean and std<br>[✓] Training on CelebA-HQ at 512 × 512<br>[✓] Sampling |
| | DeepLabv3 (Chen et al.) | [✓] Training on VOC 2012<br>[✓] Predicting on VOC 2012 validation set<br>[✓] Average mIoU<br>[✓] Visualizing model output |
| | RotNet (Gidaris et al.) | [✓] Visualizing Attention map |
| | StarGAN (Yunjey Choi et al.) | [✓] Model architecture |
| 2020 | STEFANN (Roy et al.) | [✓] FANnet architecture<br>[✓] Colornet architecture<br>[✓] Training FANnet on Google Fonts<br>[✓] Custom Google Fonts dataset<br>[✓] Average SSIM<br>[ ] Training Colornet |
| | DDPM (Ho et al.) | [✓] Training on CelebA at 32 × 32<br>[✓] Training on CelebA at 64 × 64<br>[✓] Visualizing denoising process<br>[✓] Sampling using linear interpolation<br>[✓] Sampling using coarse-to-fine interpolation |
| | DDIM (Song et al.) | [✓] Normal sampling<br>[✓] Sampling using spherical linear interpolation<br>[✓] Sampling using grid interpolation<br>[✓] Truncated normal |
| | ViT (Dosovitskiy et al.) | [✓] Training on CIFAR-10<br>[✓] Training on CIFAR-100<br>[✓] Visualizing Attention map using Attention Roll-out<br>[✓] Visualizing position embedding similarity<br>[✓] Interpolating position embedding<br>[✓] CutOut<br>[✓] CutMix<br>[✓] Hide-and-Seek |
| | SimCLR (Chen et al.) | [✓] Normalized temperature-scaled cross entropy loss<br>[✓] Data augmentation<br>[✓] Pixel intensity histogram |
| | DETR (Carion et al.) | [✓] Model architecture<br>[ ] Bipartite matching & loss<br>[ ] Batch normalization freezing<br>[ ] Training on COCO 2017 |
| 2021 | Improved DDPM (Nichol and Dhariwal) | [✓] Cosine diffusion schedule |
| | Classifier-Guidance (Dhariwal and Nichol) | [✓] Training on CIFAR-10<br>[ ] AdaGN<br>[ ] BigGAN Upsample/Downsample<br>[ ] Improved DDPM sampling<br>[ ] Conditional/Unconditional models<br>[ ] Super-resolution model<br>[ ] Interpolation |
| | ILVR (Choi et al.) | [✓] Sampling using single reference<br>[✓] Sampling using various downsampling factors<br>[✓] Sampling using various conditioning ranges |
| | SDEdit (Meng et al.) | [✓] User input stroke simulation<br>[✓] Applying to CelebA at 64 × 64<br>[ ] Total repeats<br>[ ] VE SDEdit<br>[ ] Sampling from scribble<br>[ ] Image editing only on masked regions |
| | MAE (He et al.) | [✓] Model architecture for self-supervised pre-training<br>[✓] Model architecture for classification<br>[ ] Self-supervised pre-training on ImageNet-1K<br>[ ] Fine-tuning on ImageNet-1K<br>[ ] Linear probing |
| | Copy-Paste (Ghiasi et al.) | [✓] COCO dataset processing<br>[✓] Large scale jittering<br>[✓] Copy-Paste (within mini-batch)<br>[✓] Visualizing data<br>[ ] Gaussian filter |
| | ViViT (Arnab et al.) | [✓] 'Spatio-temporal attention' architecture<br>[✓] 'Factorised encoder' architecture<br>[✓] 'Factorised self-attention' architecture |
| 2022 | CFG (Ho et al.) | |
| **Language** | | |
| 2017 | Transformer (Vaswani et al.) | [✓] Model architecture<br>[✓] Visualizing position encoding |
| 2019 | BERT (Devlin et al.) | [✓] Model architecture<br>[✓] Masked language modeling<br>[✓] BookCorpus data processing<br>[✓] SQuAD data processing<br>[✓] SWAG data processing |
| | Sentence-BERT (Reimers et al.) | [✓] Classification loss<br>[✓] Regression loss<br>[✓] Contrastive loss<br>[✓] STSb data processing<br>[✓] WikiSection data processing<br>[ ] NLI data processing |
| | RoBERTa (Liu et al.) | [✓] BookCorpus data processing<br>[✓] Masked language modeling<br>[ ] BookCorpus data processing ('SEGMENT-PAIR' + NSP)<br>[ ] BookCorpus data processing ('SENTENCE-PAIR' + NSP)<br>[✓] BookCorpus data processing ('FULL-SENTENCES')<br>[ ] BookCorpus data processing ('DOC-SENTENCES') |
| 2021 | Swin Transformer (Liu et al.) | [✓] Patch partition<br>[✓] Patch merging<br>[✓] Relative position bias<br>[✓] Feature map padding<br>[✓] Self-attention in non-overlapped windows<br>[ ] Shifted Window based Self-Attention |
| 2024 | RoPE (Su et al.) | [✓] Rotary Positional Embedding |
| **Vision-Language** | | |
| 2021 | CLIP (Radford et al.) | [✓] Training on Flickr8k + Flickr30k<br>[✓] Zero-shot classification on ImageNet1k (mini)<br>[✓] Linear classification on ImageNet1k (mini) |
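
As a flavor of the components listed above (and not code taken from these repositories), below is a minimal PyTorch sketch of the cosine diffusion schedule checked off under Improved DDPM (Nichol and Dhariwal). The function name and example usage are illustrative; the offset s = 0.008 and the 0.999 clipping come from the paper.

```python
import math

import torch


def cosine_beta_schedule(n_timesteps: int, s: float = 0.008) -> torch.Tensor:
    """Illustrative sketch of the Improved DDPM cosine noise schedule.

    alpha_bar(t) = cos^2(((t / T) + s) / (1 + s) * pi / 2), normalized so that
    alpha_bar(0) = 1, and beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1).
    """
    steps = torch.arange(n_timesteps + 1, dtype=torch.float64)
    f = torch.cos(((steps / n_timesteps) + s) / (1 + s) * math.pi / 2) ** 2
    alpha_bar = f / f[0]
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    # Clip the betas (the paper uses 0.999) to avoid singularities near t = T.
    return betas.clamp(max=0.999).to(torch.float32)


# Example: a 1,000-step schedule of the length typically used for DDPM training.
betas = cosine_beta_schedule(1000)  # shape: (1000,)
```

Computing alpha-bar first and deriving each beta from the ratio of consecutive alpha-bar values mirrors how the schedule is defined in the paper.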

Pinned repositories

  1. train_easyocr (Python)

    Fine-tuning 'EasyOCR' on the '공공행정문서 OCR' (public administrative document OCR) dataset provided by 'AI-Hub'.

  2. PGGAN (Python)

    PyTorch implementation of 'PGGAN' (Karras et al., 2018) from scratch and training it on CelebA-HQ at 512 × 512.

  3. ViT (Python)

    PyTorch implementation of 'ViT' (Dosovitskiy et al., 2020) and training it on CIFAR-10 and CIFAR-100.

  4. CycleGAN (Python)

    PyTorch implementation of 'CycleGAN' (Zhu et al., 2017) and training it on 6 datasets.

  5. DDPM (Python)

    PyTorch implementation of 'DDPM' (Ho et al., 2020) and training it on CelebA at 64 × 64.

  6. ILVR (Python)

    PyTorch implementation of 'ILVR' (Choi et al., 2021) from scratch and applying it to 'DDPM' on CelebA at 64 × 64.