Ask detailed questions about the permutation strategy in the RAR #48

eanson023 · 2024-11-06T09:04:27Z

Hello author, thank you for your excellent work and for generously open-sourcing it. After reading the source code, I have a small question: why doesn't the permutation strategy shuffle the initial condition? After all, the order returns to the original permutation after annealing anyway. Here is the part of the source code that confuses me:

# cls_token, condition, the permute does not impact these prefix tokens.
prefix = 2
pos_embed_prefix = pos_embed[:, :prefix]
pos_embed_postfix = self.shuffle(pos_embed[:, prefix:prefix+self.image_seq_len], orders)

Is it because, thanks to the target-aware positional embedding, it doesn't matter whether the condition is shuffled?

pansanity666 · 2024-11-06T11:15:31Z

I think it is because the prefix is not part of the fine-grained image content.

cornettoyu · 2024-11-07T21:42:25Z

Thanks. As @pansanity666 said, the prefix refers to the class token and the condition token, while we only permute the image tokens (referred to as the postfix in the code).
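
For readers following along, here is a minimal sketch of the behavior described above, written with plain Python lists rather than tensors. The helper name `permute_image_positions` is hypothetical and not part of the repository, and it assumes that `self.shuffle` reorders entries by indexing with `orders`. Only the image-token positions are reordered, and the first `prefix` entries stay where they are.

```python
def permute_image_positions(pos_embed, order, prefix=2):
    """Reorder only the image-token entries of a positional-embedding list.

    Hypothetical illustration, not the repository's implementation.
    `pos_embed` is a flat list: `prefix` prefix entries followed by image entries.
    `order` is a permutation of range(number of image tokens).
    """
    num_image = len(order)
    head = pos_embed[:prefix]                        # cls_token, condition: untouched
    image_part = pos_embed[prefix:prefix + num_image]
    shuffled = [image_part[i] for i in order]        # assumed gather-by-index semantics
    tail = pos_embed[prefix + num_image:]            # anything after the image tokens
    return head + shuffled + tail


pos_embed = ["cls", "cond", "p0", "p1", "p2", "p3"]
result = permute_image_positions(pos_embed, order=[2, 0, 3, 1])
# result == ["cls", "cond", "p2", "p0", "p3", "p1"]
```

With the identity order, the function returns the list unchanged, which corresponds to the raster order reached at the end of annealing.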

Jiawei-Yang · 2024-11-18T23:07:15Z

Thanks for your great work! @cornettoyu I’m wondering about the difference between cls_token and condition_token. Shouldn't the condition_token just be some class_id tokens with a randomly augmented none_class token? Is there any particular reason to use a separate cls_token?

Thanks!

cornettoyu · 2024-11-18T23:57:43Z

Hi,

I see that the terms could be misleading. To clarify, here condition_token indicates the external condition input for guiding generation (e.g., ImageNet class), while cls_token refers to the learnable placeholder token, as used in the original ViT paper. Since we try to minimize architecture changes, we keep the cls_token as in the original ViT, but it should be fine to remove it.
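
As a rough illustration of the distinction, the sketch below lays out the token sequence by name. The function `build_token_layout` and the string labels are hypothetical, introduced only for this example. The prefix order (cls_token first, then the condition) follows the comment in the snippet quoted earlier in this thread. Dropping the cls_token would shrink the prefix from 2 entries to 1, so a hard-coded `prefix = 2` would presumably need to change as well.

```python
def build_token_layout(num_image_tokens, use_cls_token=True):
    """Return (prefix, image) lists of token names for one sequence.

    Hypothetical illustration only: names stand in for embeddings.
    """
    prefix = []
    if use_cls_token:
        prefix.append("cls_token")        # learnable placeholder, as in ViT
    prefix.append("condition_token")      # external condition, e.g. ImageNet class
    image = [f"image_{i}" for i in range(num_image_tokens)]  # only these get permuted
    return prefix, image


prefix_tokens, image_tokens = build_token_layout(4)
# len(prefix_tokens) == 2, matching `prefix = 2` in the quoted snippet
```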
