Training the transformer #1
Yes, I promise I will get to doing this (let's hope before Christmas!). You can currently try to do word-level extractive summarization by formatting it as a sequence-to-sequence task, but my experience is that none of these pre-trained language models deduce that the output sequence must keep the original words in their original order - or else it stops being "extractive" in the sense that I and competitive debaters are looking for (like an actual highlighter). I've tried putting tags around the labels, but the models are still too stupid to figure it out. Honestly, I would love some help from anyone in the community who might have insights on how to fix this issue.

As a side note, I do warn anyone trying to play the benchmark-chasing game that the original evaluation done in the paper is hilariously bad, because the default settings in pyrouge only look at the first 100 tokens of the summary. This seems to be done for performance reasons: when I realized this (after the paper was published), running it with no limit pretty much always eventually crashed. I think that ROUGE is now built in as an evaluator in many frameworks (such as huggingface), which should solve this problem now.

As such, I want it on the record that future authors should discount my reported benchmark numbers and link to my GitHub comment here to indicate why. I would prefer that people not try to play the benchmark-chasing game (at least with ROUGE), because summarization is an inherently subjective task. The space of potential "good" summarizations explodes when you do it at the word level on longish documents - which is what this dataset ultimately consists of. Proper evaluation would almost certainly report significantly different results than what's found in the paper. Future authors should instead re-evaluate the models I reported scores for properly, and note the error made in this paper.
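For illustration, here is a minimal sketch of the sequence-to-sequence framing and the "keep the words in their original order" constraint described at the start of this comment. The `<hl>` tags, field names, and example strings are hypothetical, not the repo's actual format:

```python
# Sketch of the seq2seq framing: the source document is the input, and the
# extract - wrapped in hypothetical <hl> tags - is the target to generate.

def is_extractive(document: str, extract: str) -> bool:
    """Return True if every word of the extract appears in the document
    in the same order (i.e. the output is a true 'highlight')."""
    doc_words = document.split()
    idx = 0
    for word in extract.split():
        # Advance through the document until the current word is found.
        while idx < len(doc_words) and doc_words[idx] != word:
            idx += 1
        if idx == len(doc_words):
            return False
        idx += 1
    return True

# Hypothetical document/extract pair for the seq2seq formulation.
document = "The plan increases funding for renewable energy and creates jobs"
extract = "increases funding for renewable energy"

seq2seq_example = {
    "input_text": document,
    "target_text": f"<hl> {extract} </hl>",  # tags around the label, as tried above
}

assert is_extractive(document, extract)
```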
And I will also make an effort to get trained models/weights posted and hosted on huggingface - but this may take some time.
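On the evaluation point above, a minimal sketch of computing ROUGE with the Hugging Face `evaluate` package, which applies no 100-token truncation by default; the example strings are placeholders:

```python
# Requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")

predictions = ["increases funding for renewable energy"]
references = ["increases funding for renewable energy and creates jobs"]

# No truncation is applied here, unlike pyrouge's default 100-token limit.
results = rouge.compute(predictions=predictions, references=references)
print(results)  # dict with rouge1, rouge2, rougeL, rougeLsum scores
```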
Hi @Hellisotherpeople ,
as you wrote in your paper, you trained several transformer models, including BERT-large and Longformer-base. You also mentioned using the simple-transformers library. Could you share a short code snippet showing how you trained the models for extractive summarization, please?
Thanks in advance!
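For reference, a minimal sketch of what a seq2seq fine-tuning run with simple-transformers can look like; the model name, hyperparameters, and data below are placeholders and not necessarily the configuration used for the paper:

```python
# Sketch of seq2seq fine-tuning with simple-transformers (placeholder config).
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel, Seq2SeqArgs

# Training data as (input_text, target_text) pairs; strings are placeholders.
train_df = pd.DataFrame(
    [["full document text ...", "extracted highlight ..."]],
    columns=["input_text", "target_text"],
)

model_args = Seq2SeqArgs()
model_args.num_train_epochs = 1
model_args.max_seq_length = 512
model_args.overwrite_output_dir = True

model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-base",
    args=model_args,
    use_cuda=False,  # set True if a GPU is available
)

model.train_model(train_df)
print(model.predict(["full document text ..."]))
```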