The performance of adopting the Video Swin Transformer as offline-extracted video encoder? #24

3DMM-ICME2022 · 2022-07-06T05:40:45Z

Hi, thanks for your nice work! @kevinlin311tw

However, could you please report the performance of adopting the Video Swin Transformer as an offline-extracted video encoder?

In Table 2, the other methods adopt C3D or I3D while yours uses Video Swin Transformer, so it is not a fair comparison, right?

JoseponLee · 2022-07-25T20:53:38Z

Same question about the performance of adopting the Video Swin Transformer as an offline-extracted video encoder. Maybe the authors could provide the 3D features extracted offline with the Video Swin Transformer?

kevinlin311tw · 2022-09-27T21:23:10Z

In the appendix, we show a comparison where both approaches use the same SlowFast as the backbone for feature extraction. Our method achieves better performance than VALUE.

We also compare with the full version of VALUE, which uses both CLIP-ViT and SlowFast as backbones. Although our video backbone uses less pre-training data than VALUE, we achieve better caption performance.
