
Relative positions in RoPE embeddings #46

Open
AshwinRamachandran2002 opened this issue Sep 24, 2023 · 2 comments

Comments

@AshwinRamachandran2002

Hi,
I was going through your code to understand how you calculated the RoPE embeddings, and I need a clarification.

In assigning a relative position to a newly generated token, the base reference is taken as the end of the prompt input:
https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1084C10-L1084C10

In assigning relative positions to the retrieved key indices, the base reference is taken as the start of the prompt input:

scaled_key_indices = ((top_search_key_indices / self.prompt_input_ids.shape[1]) * self.actual_model_window_size).int()

Would it not then be the case that the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with attention decaying both to the right and to the left?
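For concreteness, here is a minimal sketch of the key-index rescaling with toy numbers. Only the scaling formula itself comes from the snippet above; the window size, prompt length, and retrieved indices are made up for illustration:

```python
import torch

# Toy values, chosen only to illustrate the rescaling; not taken from the repository.
actual_model_window_size = 8       # the model's attention window
prompt_length = 32                 # self.prompt_input_ids.shape[1] for a long prompt
top_search_key_indices = torch.tensor([0, 5, 16, 31])  # hypothetical retrieved key indices

# The scaling from the snippet above: positions in the long prompt are
# compressed into [0, actual_model_window_size).
scaled_key_indices = ((top_search_key_indices / prompt_length) * actual_model_window_size).int()
print(scaled_key_indices)  # tensor([0, 1, 4, 7], dtype=torch.int32)

# The newly generated token's own position is, by contrast, referenced from
# the end of the prompt input (the linked line in unlimiformer.py), so the
# query and the retrieved keys get positions computed on different bases.
```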

Thank you
Ashwin Ramachandran

@urialon
Collaborator

urialon commented Sep 25, 2023

Hi @AshwinRamachandran2002,
Thank you for your interest in our work!

Your reading is correct, and you are looking at the right places in the code:

  1. In assigning a position for the query, the position we give it is "the number of generated tokens so far".
  2. In assigning a position for a retrieved key, the position we give it is "its relative position in the initial long prompt".

These are the settings that we found to work the best in our initial experiments.
I agree that it may not be optimal.
But I can't say whether "the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with attention decaying both to the right and to the left" - it's a hypothesis that is worth checking, and possibly fixing (and writing a paper about if you manage to do that :-) )

Please let us know if you have any questions!
Uri

@AshwinRamachandran2002
Author

Thank you for your reply.
I would also like to know how you decided upon the vectorstore query:

datastore_query = torch.matmul(datastore_query, k_proj + k_proj_rotated) # (batch * beam, num_heads, 1, embed_dim)

You have approximated R(m) * W_k as W_k + Rotated(W_k).

Did you also consider dropping R(m)?
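For reference, here is a small self-contained sketch, in standard RoPE terms, of what that approximation corresponds to if "Rotated" means the usual rotate_half operation. The dimensions and tensors are toy values; none of this is taken verbatim from the repository:

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: split the last dimension into halves (x1, x2)
    # and return (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

head_dim = 8
m = 5                                    # a key position, unknown at retrieval time
theta = 10000 ** (-torch.arange(0, head_dim, 2).float() / head_dim)
angles = m * torch.cat((theta, theta))   # per-dimension angles m * theta_i

x = torch.randn(head_dim)                # stands in for a column of W_k

# Exact RoPE rotation of x at position m:
exact = x * torch.cos(angles) + rotate_half(x) * torch.sin(angles)

# The approximation asked about above: replace the m-dependent cos/sin
# factors with 1, giving x + rotate_half(x), i.e. "W_k + Rotated(W_k)".
approx = x + rotate_half(x)

# Dropping R(m) entirely would instead use x on its own:
no_rotation = x
```

Replacing the m-dependent cos/sin factors with 1 is what lets the datastore query be formed without knowing the retrieved key's position in advance; dropping R(m) altogether would also drop the rotate_half term.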
