Use of other Encoder/Decoder Models #55
Hi @rdmerillat, We haven't tried Pegasus, but the solution you described sounds correct. Make sure to add the new class to the model-conversion dictionary as well. Please let us know how it goes! Best,
Hi @urialon, …
Hi @patrickocal, I'd say that both of them project queries, as described in the paper, with some unrelated difference in Llama due to its Rotary Position Embeddings. Which difference between them are you concerned about? Best,
Will do!
Hi @urialon, thanks for the quick reply, and thanks for sharing all your good work. By way of background, I am taking Jure Leskovec's course on Machine Learning with Graphs, and my team's project is how to integrate Knowledge Graphs with your approach (any suggestions would be welcome).

Regarding your question: I guess I was referring to your Python terminology, as per the methods below. Formally, by projected query I mean the first term in the inner product $(h_d W_q W_k^\top)\, h_e^\top$ from the paper's attention reformulation (where $h_d$ is a decoder hidden state, $h_e$ an encoder hidden state, and $W_q, W_k$ the query/key projections). My understanding is that the step where the query is projected in UnlimiformerLLaMa is in the method: …

As far as I can see, the k_proj is only applied to the key in UnlimiformerBART in the following method: …

Thus, my understanding of UnlimiformerBART is that the … Thanks!
Hi @patrickocal, The attention reformulation trick that we highlight in the paper is used mostly at inference time. At training time, we do compute the "standard" attention, where both the key and the query are projected. The function … At test time, we indeed project the query, and keep the key without projection in the datastore. Does that help? Best,
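A minimal numerical check of that reformulation, as a sketch (the tensor names and sizes here are illustrative, not the repo's code): folding both projection matrices into the query leaves the attention scores unchanged, which is why the datastore can hold raw encoder states.

```python
import torch

torch.manual_seed(0)
d_model = 16
h_d = torch.randn(1, d_model)    # a decoder hidden state (query side)
h_e = torch.randn(100, d_model)  # encoder hidden states (key side)
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)

# "Standard" attention scores, as computed at training time:
# project both sides, then take the inner product.
standard = (h_d @ W_q) @ (h_e @ W_k).T

# Reformulated scores, as used at test time: fold both projections
# into the query, so only the raw states h_e need to be stored.
reformulated = (h_d @ W_q @ W_k.T) @ h_e.T

assert torch.allclose(standard, reformulated, atol=1e-4)
```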
Thanks again for the quick reply, @urialon. I'm afraid I do have more questions :) so thanks in advance for your patience.

My understanding is that, within the `reset_memory` method of the `Unlimiformer` class (where I have deleted unrelated lines), we see that states are added whenever …
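Concretely, my mental model of that step is something like the following sketch (the function and argument names are mine, not the repo's): the long input is encoded chunk by chunk, and the raw hidden states are what get indexed.

```python
import torch

def build_datastore(encoder, input_ids, chunk_size=1024):
    """Encode a long input in chunks and collect the raw encoder
    hidden states; no key projection is applied before indexing."""
    states = []
    for start in range(0, input_ids.shape[1], chunk_size):
        chunk = input_ids[:, start:start + chunk_size]
        with torch.no_grad():
            hidden = encoder(chunk).last_hidden_state  # (1, len, d_model)
        states.append(hidden.squeeze(0))
    return torch.cat(states, dim=0)  # (total_tokens, d_model)
```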
My understanding is therefore that it is only a FAISS index of the state, … So, please correct me if I'm wrong, but, in all cases, when we conduct a … On the other hand, I can see that it is keys and queries that get passed into the …

I may be out of action on Friday, but I look forward to your response. Thanks Uri!

Patrick
This is correct: …
This is correct - that's the main insight of the Attention Reformulation section in the paper.
Whenever keys and queries are passed into the …
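Conceptually, that test-time lookup amounts to something like the following sketch (a flat inner-product FAISS index over the raw encoder states; all names and sizes are illustrative):

```python
import faiss
import numpy as np

d_model = 16
# Datastore: *unprojected* encoder hidden states, one row per input token.
encoder_states = np.random.randn(10000, d_model).astype("float32")
index = faiss.IndexFlatIP(d_model)  # exact inner-product search
index.add(encoder_states)

# At decoding time, fold W_q @ W_k.T into the query before searching.
W_q = np.random.randn(d_model, d_model).astype("float32")
W_k = np.random.randn(d_model, d_model).astype("float32")
h_d = np.random.randn(1, d_model).astype("float32")
projected_query = h_d @ W_q @ W_k.T

# Top-16 tokens to attend to, ranked by inner product with the query.
scores, token_ids = index.search(projected_query, 16)
```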
In the … I hope it helps; feel free to ask any questions!
Hello, I've been using Unlimiformer as a comparison against current standard summarization methods, and I was wondering what in particular would be needed to convert, say, a Pegasus model to Unlimiformer, as it should work with "all encoder/decoder" models. I see several lines commented out in `unlimiformer.py` (here) for `AutoModelForSeq2Seq`; however, I currently don't see a direct way this has been implemented yet. As Pegasus is BART-based, I set up a new model-converter entry:

```python
PegasusForConditionalGeneration: UnlimiformerPegasus,
```

and started a new Unlimiformer class for it: … However, I was wondering whether you or anyone else had found additional tweaking that was needed to fully convert, say, a Pegasus model.

And I guess more generally: what is the procedure you use when setting up your own new Unlimiformer-converted models? I was unable to simply glean what was necessary to ensure "consistent" performance and/or results.
Thanks!
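For concreteness, a minimal sketch of what such a subclass might look like, assuming Pegasus's attention layout matches BART's closely enough to reuse the BART hooks (`UnlimiformerPegasus` is the proposed class, not part of the repo, and the body below is illustrative):

```python
# Sketch only: the assumption that BART's hooks transfer to Pegasus
# is mine, not the repo's verbatim code.
from unlimiformer import UnlimiformerBART  # assuming the repo's module layout

class UnlimiformerPegasus(UnlimiformerBART):
    # Pegasus's attention modules mirror BART's (same q_proj/k_proj/v_proj
    # layout), so inherit the BART hooks unchanged to start. Anything
    # Pegasus-specific, e.g. its static sinusoidal position embeddings
    # (vs. BART's learned ones), would need overriding here.
    pass
```

This would go together with the `PegasusForConditionalGeneration: UnlimiformerPegasus,` entry in the model-conversion mapping shown above.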