inv_freq or freq in RoPE implementation? (Ch05-07 GPT to Llama) #477
In both
Given that this initializes the vector of

Perhaps what is confusing is that Hugging Face's implementation also calls these inverse frequencies, while Llama's implementation calls these frequencies.

Please let me know if I'm missing something! Thank you!
Replies: 1 comment 4 replies
Hi,

As a fellow user/reader, I had the same question some time ago, see #412. The

angles = positions[:, None] * inv_freq[None, :]

Llama uses the same approach with
Thanks for the great discussion here. I think "inverse" frequency is a popular term simply because the value decreases as the index i increases.
Sure, it's also a "frequency" in the general sense, but "inverse" describes its relationship with i.
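
To make both readings concrete, here is a minimal sketch in plain Python (no PyTorch) of how such a vector is commonly built in RoPE code. It is an illustration under assumptions, not the notebook's or Llama's actual code: the helper names `compute_inv_freq` and `compute_angles`, `theta_base=10_000`, and `head_dim=8` are all hypothetical choices. Each entry is the reciprocal of a power that grows with i (hence "inverse"), and multiplying it by a token position gives a rotation angle (hence "frequency", in radians per position).

```python
# Hypothetical helper names; theta_base and head_dim are illustrative assumptions.
def compute_inv_freq(head_dim, theta_base=10_000.0):
    # One value per pair of dimensions: 1 / theta_base^(2i / head_dim).
    # The denominator grows with i, so the values shrink as i increases.
    return [1.0 / (theta_base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


def compute_angles(context_length, inv_freq):
    # Outer product of positions and inv_freq, i.e. the same values as
    # positions[:, None] * inv_freq[None, :] in array notation.
    return [[pos * f for f in inv_freq] for pos in range(context_length)]


inv_freq = compute_inv_freq(head_dim=8)
angles = compute_angles(context_length=4, inv_freq=inv_freq)

print(inv_freq)   # strictly decreasing, starting at 1.0
print(angles[1])  # at position 1 the angles equal inv_freq itself
```

Because the angle grows linearly with position at a rate set by each entry, either name is defensible: the entries act as angular frequencies, and they are computed as reciprocals.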