Use MMR as the output score and sorting key #227
Thanks for the PR. Could you first go into detail on why you think the implementation as it stands is not correct? The examples you show demonstrate what you would expect, but not why you would expect it. Moreover, the diversity value you use is rather low, and the example you reference also uses a diversity of 0.7 to showcase more diversity.
@MaartenGr Thanks for your feedback. Sure, I can explain in more detail. I would expect the MRR score to be returned, because I use it to evaluate the effect of a score threshold when optimizing a target metric for an information retrieval task. This is an alternative to choosing a cutoff based on the number of results (top_n). I don't think the implementation as it stands is incorrect; rather, it probably depends on the specific use case. The MRR metric is especially useful when using MMR, where the algorithm decides which keyphrases to include in the result set based on its value (given that we extract a limited number of keyphrases). I agree that with a diversity of 0.7 I would get more diversity; I just wanted to show an example of the relative difference between scoring by relevance alone and scoring by relevance and diversity together.
Are you referring here to the Mean Reciprocal Rank? If so, why would you think this improves upon the current implementation? You mention that it is an alternative, but I do not see how it would improve upon what is already implemented.
Considering there are use cases for both, with and without MRR, I am not sure we should change what is already implemented. Instead, I might opt for adding something like this as an option for the user to choose.
I'm sorry, that was a typo: I meant MMR (Maximal Marginal Relevance), not MRR.
When using Maximal Marginal Relevance (MMR) to diversify the results, the keywords or keyphrases are ordered by their cosine similarity scores instead of their MMR scores. Although the same keywords are returned, I would expect to see more diversity in the top results.
Current output
Expected output (here the scores are the MMR scores)
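To make the difference between the two orderings concrete, here is a minimal, self-contained sketch of MMR selection that keeps the MMR score of each pick alongside its cosine similarity. It is not the library's code: the function name `mmr_with_scores`, the toy similarity values, and the choice to score the first pick as `(1 - diversity) * relevance` are all assumptions made for illustration.

```python
import numpy as np


def mmr_with_scores(doc_sim, cand_sim, top_n=3, diversity=0.5):
    """Hypothetical helper: greedy MMR that records each pick's MMR score.

    doc_sim:  (n,) cosine similarity of each candidate to the document.
    cand_sim: (n, n) cosine similarity between candidates.
    Returns a list of (index, relevance, mmr_score) in selection order.
    """
    doc_sim = np.asarray(doc_sim, dtype=float)
    cand_sim = np.asarray(cand_sim, dtype=float)

    # The first pick is the most relevant candidate. Nothing has been
    # selected yet, so the redundancy term is taken to be 0 (assumption).
    first = int(np.argmax(doc_sim))
    selected = [(first, float(doc_sim[first]),
                 float((1 - diversity) * doc_sim[first]))]
    remaining = [i for i in range(len(doc_sim)) if i != first]

    while remaining and len(selected) < top_n:
        chosen = [i for i, _, _ in selected]
        # Redundancy: highest similarity to any already selected candidate.
        redundancy = cand_sim[np.ix_(remaining, chosen)].max(axis=1)
        mmr = (1 - diversity) * doc_sim[remaining] - diversity * redundancy
        best = int(np.argmax(mmr))
        idx = remaining.pop(best)
        selected.append((idx, float(doc_sim[idx]), float(mmr[best])))
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is distinct.
doc_sim = [0.90, 0.85, 0.60]
cand_sim = [[1.00, 0.95, 0.10],
            [0.95, 1.00, 0.20],
            [0.10, 0.20, 1.00]]

ranked = mmr_with_scores(doc_sim, cand_sim, top_n=3, diversity=0.5)

# Sorting by cosine similarity gives the order 0, 1, 2 ...
by_relevance = sorted(ranked, key=lambda t: t[1], reverse=True)
# ... while sorting by MMR score keeps the selection order 0, 2, 1.
by_mmr = sorted(ranked, key=lambda t: t[2], reverse=True)
```

In this toy case both orderings contain the same three candidates, but only the MMR ordering moves the distinct candidate ahead of the near-duplicate, which is the behavior the PR description asks for in the top results.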