[102] TRT-ViT: TensorRT-oriented Vision Transformer #132

dhkim0225 opened this issue Oct 18, 2022 · 0 comments
paper

The TeraFLOPs / TeraParams discussion is skipped here.

TRT-ViT

The authors empirically derive four rules and use them to pick the architecture:

  1. Transformer blocks give the best cost/performance trade-off when placed in the last stage (a widely known fact).
  2. Early stages can be shallow.
    [image]
  3. Mixing transformer and bottleneck blocks is more cost-effective than using pure transformer blocks.
  4. Looking at global context first and then local context is more effective.

[image]

That's all there is to it, haha.

As the table below shows, the (C) block was the most effective (a rough sketch of such a block follows the figure).
[image]
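A minimal sketch, under my own assumptions, of what a "global first, then local" mixed block in the spirit of rules 3–4 (roughly the (C)-style block) could look like: a transformer block for global mixing followed by a convolutional bottleneck for local refinement. The channel sizes, the token/feature-map conversion, and the attention/bottleneck details are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class Bottleneck(nn.Module):
    """Standard 1x1 -> 3x3 -> 1x1 conv bottleneck with a residual connection."""

    def __init__(self, dim, expansion=0.5):
        super().__init__()
        hidden = int(dim * expansion)
        self.conv = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1, bias=False), nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False), nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        return torch.relu(x + self.conv(x))


class MixedBlock(nn.Module):
    """Global attention first (on flattened tokens), then a local conv bottleneck."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.local = Bottleneck(dim)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)           # (B, H*W, C)
        t = self.norm(tokens)
        tokens = tokens + self.attn(t, t, t, need_weights=False)[0]  # global mixing
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.local(x)                            # local refinement


x = torch.randn(2, 64, 14, 14)
print(MixedBlock(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```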

The detailed architecture is as follows:
[image]

Results

ImageNet

[image]

Settings (see the rough config sketch after this list)

  • Swin setting (https://github.com/microsoft/Swin-Transformer/blob/d19503d7fbed704792a5e5a3a5ee36f9357d26c1/config.py)
  • GPU: V100 * 8
  • epochs: 300
  • batch-size: 1024
  • resolution: 224x224
  • gradient clipping: max norm 1
  • Augmentation
    • RandAugment: rand-m9-mstd0.5-inc1
    • Mixup or CutMix: one of the two is chosen with probability 0.5
      • Mixup: alpha 0.8
      • Cutmix: alpha 1.0
    • random erasing: 0.25
    • stochastic depth: 0.1 (slightly modified, DeiT-style)
    • repeated augmentation and EMA were not used (they had little effect on Swin's performance)
  • optimizer
    • AdamW
    • weight decay: 0.05
    • warmup: 30 epoch
    • lr
      • 0.001
      • cosine decay
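
A rough sketch of how this recipe might be wired up with timm + PyTorch. This is not the authors' training script: the model placeholder and the choice of timm helpers are assumptions; the hyperparameter values come from the list above.

```python
import torch
import torch.nn as nn
from timm.data import Mixup, create_transform
from timm.scheduler import CosineLRScheduler

# Augmentation: RandAugment + random erasing
train_transform = create_transform(
    input_size=224, is_training=True,
    auto_augment="rand-m9-mstd0.5-inc1",   # RandAugment policy
    re_prob=0.25,                          # random erasing
)

# Mixup / CutMix: one of the two is applied, chosen with probability 0.5
mixup_fn = Mixup(
    mixup_alpha=0.8, cutmix_alpha=1.0,
    prob=1.0, switch_prob=0.5,
    num_classes=1000,
)

model = nn.Linear(3 * 224 * 224, 1000)     # placeholder for the TRT-ViT model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
scheduler = CosineLRScheduler(
    optimizer, t_initial=300,              # 300 epochs, cosine decay
    warmup_t=30, warmup_lr_init=1e-6,      # 30-epoch warmup (warmup_lr_init is an assumption)
)

# Inside the training loop, gradients are clipped to max norm 1:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```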

Ablations

[image]

ADE 20K

[image]

COCO

[image]
