Ask detailed questions about the permutation strategy in the RAR #48
I think it is because the prefix does not belong to the fine-grained image content.
Thanks. As @pansanity666 said, the prefix refers to the class token and the condition token, while we only permute the image tokens (referred to as the postfix in the code).
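To make the prefix/postfix split concrete, here is a minimal sketch of this kind of permutation, assuming tokens are laid out with the prefix (condition token + cls token) first. The function name, `num_prefix` parameter, and tensor layout are illustrative assumptions, not the exact RAR implementation:

```python
import torch

def permute_image_tokens(tokens: torch.Tensor, num_prefix: int) -> torch.Tensor:
    """Shuffle only the image tokens (the 'postfix'), leaving the
    prefix (condition token + cls token) in its original position.

    tokens: (batch, seq_len, dim); hypothetical layout with prefix tokens first.
    """
    prefix = tokens[:, :num_prefix]           # condition + cls tokens, never permuted
    postfix = tokens[:, num_prefix:]          # image tokens, subject to permutation
    perm = torch.randperm(postfix.shape[1])   # one random ordering of image positions
    return torch.cat([prefix, postfix[:, perm]], dim=1)
```

Because the prefix slice is concatenated back unpermuted, the conditioning tokens always occupy the same leading positions regardless of the sampled ordering of the image tokens.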
Thanks for your great work! @cornettoyu I'm wondering about the difference between condition_token and cls_token. Thanks!
Hi, I see how the terms could be confusing or misleading. To clarify: here condition_token indicates the external condition input guiding generation (e.g., an ImageNet class), while cls_token refers to the learnable placeholder token used in the original ViT paper. Since we try to minimize architectural changes, we keep the cls_token as in the original ViT, but it should be fine to remove it.
Hello author, I am very grateful for your excellent work and generous open-sourcing. After reading your source code, I have a small question: why does the permutation strategy not shuffle the initial condition? In any case, you return to the original permutation after annealing. Here is some of the source code that confuses me:
Is it because, thanks to the contribution of the target-aware positional embedding, it doesn't matter that the condition is not shuffled?
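The "return to the original permutation after annealing" mentioned above can be sketched as a schedule that decays the probability of sampling a random ordering toward zero, so training ends on the default raster order. The function name and the linear decay over hypothetical start/end fractions are illustrative assumptions, not RAR's exact schedule or values:

```python
def randomness_schedule(step: int, total_steps: int,
                        anneal_start: float = 0.0,
                        anneal_end: float = 1.0) -> float:
    """Probability r of using a random permutation at a given training step,
    annealed linearly from 1 to 0 between two fractions of training
    (hypothetical values, not RAR's exact configuration)."""
    t = step / total_steps
    if t < anneal_start:
        return 1.0              # always shuffle early in training
    if t > anneal_end:
        return 0.0              # always raster order late in training
    return 1.0 - (t - anneal_start) / (anneal_end - anneal_start)
```

At each step one would draw a uniform random number and shuffle the image tokens only when it falls below `r`; once `r` reaches 0, every batch uses the original (raster) permutation.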