[102] TRT-ViT: TensorRT-oriented Vision Transformer #132

dhkim0225 opened this issue Oct 18, 2022 · 0 comments
paper

The TeraFLOPs / TeraParams discussion is skipped here.

TRT-ViT

The authors empirically derive four rules and use them to pick the architecture:

  1. Transformer blocks give the best cost/performance trade-off when placed in the last stage (a widely known fact).
  2. Early stages can be shallow.
    [image]
  3. Mixing transformer and bottleneck blocks is more cost-effective than using pure transformer blocks.
  4. Looking at global context first and then local context is more effective.

[image]

That's all there is to it, haha.

As the table below shows, the (C) block was the most effective (a rough sketch of such a block follows the figure).
[image]
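A minimal sketch, under my own assumptions, of what a "global first, then local" mixed block in the spirit of rules 3–4 (roughly the (C)-style block) could look like: a transformer block for global mixing followed by a convolutional bottleneck for local refinement. The channel sizes, the token/feature-map conversion, and the attention/bottleneck details are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Standard 1x1 -> 3x3 -> 1x1 conv bottleneck with a residual connection."""

    def __init__(self, dim, expansion=0.5):
        super().__init__()
        hidden = int(dim * expansion)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False), nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        return torch.relu(x + self.conv(x))


class MixedBlock(nn.Module):
    """Global attention first (on flattened tokens), then a local conv bottleneck."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = Bottleneck(dim)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        t = self.norm(tokens)
        tokens = tokens + self.attn(t, t, t, need_weights=False)[0]  # global mixing
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.local(x)                            # local refinement


x = torch.randn(2, 64, 14, 14)
print(MixedBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```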

The detailed architecture is as follows:
[image]

Results

ImageNet

[image]

Settings (see the rough config sketch after this list)

  • Swin setting (https://github.com/microsoft/Swin-Transformer/blob/d19503d7fbed704792a5e5a3a5ee36f9357d26c1/config.py)
  • GPU: V100 * 8
  • epochs: 300
  • batch-size: 1024
  • resolution: 224x224
  • gradient clipping: max norm 1
  • Augmentation
    • RandAugment: rand-m9-mstd0.5-inc1
    • Mixup or CutMix: one of the two is chosen with probability 0.5
      • Mixup: alpha 0.8
      • Cutmix: alpha 1.0
    • random erasing: 0.25
    • stochastic depth: 0.1 (slightly modified, DeiT-style)
    • repeated augmentation and EMA were not used (they had little effect on Swin's performance)
  • optimizer
    • AdamW
    • weight decay: 0.05
    • warmup: 30 epoch
    • lr
      • 0.001
      • cosine decay
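
A rough sketch of how this recipe might be wired up with timm + PyTorch. This is not the authors' training script: the model placeholder and the choice of timm helpers are assumptions; the hyperparameter values come from the list above.

```python
import torch
import torch.nn as nn
from timm.data import Mixup, create_transform
from timm.scheduler import CosineLRScheduler

# Augmentation: RandAugment + random erasing
train_transform = create_transform(
    input_size=224, is_training=True,
    auto_augment="rand-m9-mstd0.5-inc1",   # RandAugment policy
    re_prob=0.25,                          # random erasing
)

# Mixup / CutMix: one of the two is applied, chosen with probability 0.5
mixup_fn = Mixup(
    mixup_alpha=0.8, cutmix_alpha=1.0,
    prob=1.0, switch_prob=0.5,
    num_classes=1000,
)

model = nn.Linear(3 * 224 * 224, 1000)     # placeholder for the TRT-ViT model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = CosineLRScheduler(
    optimizer, t_initial=300,              # 300 epochs, cosine decay
    warmup_t=30, warmup_lr_init=1e-6,      # 30-epoch warmup (warmup_lr_init is an assumption)
)

# Inside the training loop, gradients are clipped to max norm 1:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```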

Ablations

[image]

ADE 20K

[image]

COCO

[image]
