Hi, thanks for your great work exploring BEiT as an alternative to CLIP.
I find it very well motivated in the paper, but I'm struggling to reproduce the BEiT-3 results in my independent training codebase.
So far I can match or surpass the CLIP results, and the addition of CLIP_Image in Late Concat is beneficial.
However, so far BEiT-3 underperforms CLIP, so I'm wondering if I'm missing something.
For your BEiT experiments, what do you mean by Late Concat, Early(L1-L12), and Early(L1-L24)? I can't find a reference to these in the code, nor in the beit or torchscale repos. If you could share a code sample, it would really help to articulate your point.
Thank you for your time.
Hi, thank you for your interest in reproducing our work!
Our BEiT experiments are meant to demonstrate the effectiveness of early fusion. "Late Concat" means we use BEiT-3 to extract separate single-modal features and then concatenate them. "Early(L1-L12)" means we enable cross-modal attention only in layers 1-12 of BEiT-3, while "Early(L1-L24)" means we enable cross-modal attention in all 24 layers, which is the original BEiT-3. We implement this by manually adding a masked attention map in the BEiT-3 source code.
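For reference, here is a minimal sketch of that masking idea. This is not the actual BEiT-3 patch; the function name and token counts below are illustrative assumptions. The additive mask blocks cross-modal attention by keeping only the within-modality blocks at zero:

```python
import torch

def build_attn_mask(num_img_tokens: int, num_txt_tokens: int,
                    cross_modal: bool) -> torch.Tensor:
    """Additive attention mask for one layer, for inputs laid out as
    [image tokens; text tokens]. With cross_modal=False, image tokens
    attend only to image tokens and text tokens only to text tokens
    (block-diagonal mask); with cross_modal=True, nothing is masked."""
    n = num_img_tokens + num_txt_tokens
    if cross_modal:
        return torch.zeros(n, n)                   # all positions visible
    mask = torch.full((n, n), float("-inf"))       # block everything ...
    mask[:num_img_tokens, :num_img_tokens] = 0.0   # ... except image -> image
    mask[num_img_tokens:, num_img_tokens:] = 0.0   # ... and text -> text
    return mask

# "Early(L1-L12)" on a 24-layer model: cross-modal attention enabled in
# the first 12 layers, per-modality attention in the remaining 12.
masks = [build_attn_mask(197, 64, cross_modal=(layer < 12))
         for layer in range(24)]
```

The mask for each layer is added to the attention logits before the softmax, so "Late Concat" corresponds to using the block-diagonal mask in every layer and concatenating the resulting single-modal features at the end.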
If you can share part of your training codebase (either in this issue or by email), I can help you look for problems and fix bugs. Due to company policy, however, I cannot directly upload our training codebase.
@CoderZhangYx Thank you for the swift reply!
You've cleared up a good deal of confusion for me.
Not sure I'll be able to share code, but great to have that option.
For your experiments with CLIP, did you also unfreeze the model?