Question: What are some recommended better alternatives for RPB? #105
Comments
Thank you; I'm very glad you found it useful. With regard to RPB, yes, there actually are very good alternatives that bias the inputs to the attention operator instead of the attention weights, and they not only provide similar or better accuracy than RPB, they're also easier to train and (usually) cheaper. This is actually what made us not bother with RPB / attention bias: it usually defeats the purpose of kernel fusion, and further bottlenecks an already complicated backward kernel. We're going to push out a new preprint in the coming weeks that directly addresses this, and of course everything will be open sourced at that time.
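For context, here is a minimal sketch (plain PyTorch; the function name and the pre-gathered `rpb` tensor are illustrative assumptions, not NATTEN's API) of where RPB enters the computation and why it interferes with fusion: the bias is added to the attention logits, so a fused kernel has to materialize and consume the full weight matrix rather than keeping it implicit.

```python
import torch
import torch.nn.functional as F

def attention_with_rpb(q, k, v, rpb):
    # q, k, v: (batch, heads, seq_len, head_dim)
    # rpb:     (heads, seq_len, seq_len) relative position bias, assumed
    #          already gathered per query/key offset (illustrative).
    scale = q.shape[-1] ** -0.5
    logits = (q @ k.transpose(-2, -1)) * scale + rpb  # bias on the weights
    return F.softmax(logits, dim=-1) @ v
```

Input-side schemes instead modify q and k before the matmul and leave the softmax(qk)v path untouched, so a bias-free fused attention kernel still applies unchanged.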
Are you saying there are existing techniques that are better (in which case, could you name them explicitly so we could use them)? Or have you invented a new one (which, understandably, you'd like to publish at the same time as your preprint)? Also, do these techniques support inference on unseen sequence lengths (like ConvNeXT)? Thanks!
Yes, rotary embeddings, if tuned correctly, often outperform RPB, and they are easier to implement and to optimize for performance in a lot of ways.
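For anyone landing here, a minimal sketch of 1D rotary embeddings in plain PyTorch; `build_rope_cache` and `apply_rope` are illustrative names, not part of NATTEN. For 2D feature maps (the Swin/NAT setting), a common variant splits the head dimension and rotates one half by row position and the other half by column position.

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    # One rotation frequency per pair of channels; one angle per position.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)  # (seq_len, head_dim // 2)
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    # x: (batch, heads, seq_len, head_dim). Rotate interleaved channel
    # pairs by position-dependent angles so that relative offsets show
    # up in the q.k dot products.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack(
        (x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1
    ).flatten(-2)

# q and k are rotated identically; attention itself is unchanged, so any
# fused (bias-free) attention kernel can be used afterwards.
q = torch.randn(1, 4, 128, 64)
k = torch.randn(1, 4, 128, 64)
cos, sin = build_rope_cache(seq_len=128, head_dim=64)
q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
```

Since the angles come from a formula rather than a learned table of fixed size, the cache can be rebuilt for any sequence length at inference time, though quality when extrapolating far beyond training lengths varies.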
@alihassanijr Would you happen to have a link to the preprint? I'm also curious to learn more about alternatives to RPB for neighborhood attention. Thanks!
I moved this issue here since it's more related to NAT/DiNAT than to NATTEN. We'll be updating this thread soon.
Hi! Thanks again for your great work! Are there any updates on the use of rotary positional encodings? Or any code snippet you could share? Thanks in advance!
First of all, thank you for providing this library.
I want to move a 2D Swin image->image model to neighbourhood attention. So far, I have been using the relative positional embeddings as in the original Swin repo.
Both in the issues and in the documentation of the fused attention, you mention that there will most likely never be an implementation of RPB in the fused kernels, and that there are better alternatives.
... Could you maybe give me some pointers to techniques that, in your experience, work well with neighborhood attention?
Cheers
Felix