Hi,
I was going through your code to understand how you calculate the RoPE embeddings and need a clarification.
When assigning a relative position to a newly generated token, the base reference is taken as the end of the prompt input:
https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1084C10-L1084C10
When assigning a relative position to the retrieved key indices, the relative position is taken from the start of the prompt input (unlimiformer/src/unlimiformer.py, line 1123 at commit 232fc23).
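For concreteness, this is how I read the two conventions (a minimal sketch, not the repository's code; `num_generated` and `retrieved_key_index` are made-up example values):

```python
# Minimal sketch of the two position conventions described above.
# This is NOT the Unlimiformer code; all lengths and indices are made-up
# example values chosen only to illustrate the question.

num_generated = 5          # tokens generated so far (assumed example value)

# Newly generated token (the query): its position is referenced to the end
# of the prompt, i.e. it is the number of tokens generated so far.
query_position = num_generated

# Retrieved key: its position is its index within the original long prompt,
# i.e. it is referenced to the start of the prompt.
retrieved_key_index = 700  # assumed index of a retrieved token in a long prompt
key_position = retrieved_key_index

# RoPE rotates queries and keys by angles proportional to their positions,
# so the attention logit depends on the relative offset between the two.
relative_offset = query_position - key_position
print(relative_offset)     # -695 with these example values
```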
Would it not then be the case that the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with attention decaying both to the right and to the left?
Thank you,
Ashwin Ramachandran

Your reading is correct, and you are looking at the right places in the code:
In assigning a position for the query, the position we give it is "the number of generated tokens so far".
In assigning a position for a retrieved key, the position we give it is "its relative position in the initial long prompt".
These are the settings that we found to work best in our initial experiments. I agree that they may not be optimal. But I can't say whether "the current hidden state gives more attention to the tokens somewhere in the middle of the prompt and then decays both to the right and left" - it's a hypothesis that is worth checking, and possibly fixing (and writing a paper about if you manage to do that :-) )
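As a toy way to probe that hypothesis (again a sketch rather than the repository's code; it assumes the query position is the number of generated tokens, each key position is the token's index in the prompt, both indices live in the same coordinate space, and RoPE attention roughly decays with the relative offset):

```python
# Toy probe of the hypothesis above. NOT the Unlimiformer code; it assumes:
#   (1) the query position is the number of tokens generated so far,
#   (2) each key position is the token's index in the original prompt,
#   (3) both indices live in the same coordinate space, and
#   (4) RoPE attention scores roughly decay as the relative offset grows.

prompt_len = 20       # toy prompt length (assumed)
num_generated = 8     # toy number of generated tokens (assumed)

query_position = num_generated
offsets = [abs(query_position - k) for k in range(prompt_len)]
print(offsets)
# [8, 7, 6, 5, 4, 3, 2, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
# Under these assumptions the offset is smallest at prompt index 8 (the
# current generation step) and grows toward both ends, so where the
# "peak" of attention would sit depends on how many tokens have been
# generated relative to the prompt length.
```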