
Relative positions in RoPE embeddings #46

Open
AshwinRamachandran2002 opened this issue Sep 24, 2023 · 2 comments

Comments

@AshwinRamachandran2002

Hi,
I was going through your code to understand how you calculated the RoPE embeddings, and I need a clarification.

In assigning a relative position to a newly generated token, the base reference is taken as the end of the prompt input:
https://github.com/abertsch72/unlimiformer/blob/232fc235706c304667f7a671cca2203d4625eaa1/src/unlimiformer.py#L1084C10-L1084C10

In assigning relative positions to the retrieved key indices, the base reference is taken as the start of the prompt input:

scaled_key_indices = ((top_search_key_indices / self.prompt_input_ids.shape[1]) * self.actual_model_window_size).int()

Would it not then be the case that the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with attention decaying both to the right and to the left?
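For concreteness, here is a minimal sketch of the key-index rescaling with toy numbers. Only the scaling formula itself comes from the snippet above; the window size, prompt length, and retrieved indices are made up for illustration:

```python
import torch

# Toy values, chosen only to illustrate the rescaling; not taken from the repository.
actual_model_window_size = 8       # the model's attention window
prompt_length = 32                 # self.prompt_input_ids.shape[1] for a long prompt
top_search_key_indices = torch.tensor([0, 5, 16, 31])  # hypothetical retrieved key indices

# The scaling from the snippet above: positions in the long prompt are
# compressed into [0, actual_model_window_size).
scaled_key_indices = ((top_search_key_indices / prompt_length) * actual_model_window_size).int()
print(scaled_key_indices)  # tensor([0, 1, 4, 7], dtype=torch.int32)

# The newly generated token's own position is, by contrast, referenced from
# the end of the prompt input (the linked line in unlimiformer.py), so the
# query and the retrieved keys get positions computed on different bases.
```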

Thank you
Ashwin Ramachandran

@urialon
Collaborator

urialon commented Sep 25, 2023

Hi @AshwinRamachandran2002,
Thank you for your interest in our work!

Your reading is correct, and you are looking at the right places in the code:

  1. In assigning a position for the query, the position we give it is "the number of generated tokens so far".
  2. In assigning a position for a retrieved key, the position we give it is "its relative position in the initial long prompt".

These are the settings that we found to work the best in our initial experiments.
I agree that it may not be optimal.
But I can't say whether "the current hidden state gives more attention to tokens somewhere in the middle of the prompt, with attention decaying both to the right and to the left" - it's a hypothesis that is worth checking, and possibly fixing (and writing a paper about if you manage to do that :-) )

Please let us know if you have any questions!
Uri

@AshwinRamachandran2002
Author

Thank you for your reply.
I would also like to know how you decided upon the vectorstore query:

datastore_query = torch.matmul(datastore_query, k_proj + k_proj_rotated) # (batch * beam, num_heads, 1, embed_dim)

You have approximated R(m) * W_k as W_k + Rotated(W_k).

Did you also consider dropping R(m)?
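For reference, here is a small self-contained sketch, in standard RoPE terms, of what that approximation corresponds to if "Rotated" means the usual rotate_half operation. The dimensions and tensors are toy values; none of this is taken verbatim from the repository:

```python
import torch

def rotate_half(x):
    # Standard RoPE helper: split the last dimension into halves (x1, x2)
    # and return (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

head_dim = 8
m = 5                                    # a key position, unknown at retrieval time
theta = 10000 ** (-torch.arange(0, head_dim, 2).float() / head_dim)
angles = m * torch.cat((theta, theta))   # per-dimension angles m * theta_i

x = torch.randn(head_dim)                # stands in for a column of W_k

# Exact RoPE rotation of x at position m:
exact = x * torch.cos(angles) + rotate_half(x) * torch.sin(angles)

# The approximation asked about above: replace the m-dependent cos/sin
# factors with 1, giving x + rotate_half(x), i.e. "W_k + Rotated(W_k)".
approx = x + rotate_half(x)

# Dropping R(m) entirely would instead use x on its own:
no_rotation = x
```

Replacing the m-dependent cos/sin factors with 1 is what lets the datastore query be formed without knowing the retrieved key's position in advance; dropping R(m) altogether would also drop the rotate_half term.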
