BookSum_Full BART Baseline script/code #66

Open

saxenarohit opened this issue Jul 12, 2024 · 4 comments

saxenarohit commented Jul 12, 2024

Hi,

Great work! Thanks for sharing the code.

I have been trying to replicate the simple BART baseline on BookSum_full, but I am unable to reproduce the results.

Could you share the code/script you used to train this model?
https://huggingface.co/abertsch/bart-base-booksum

I was able to replicate the BART baseline for all the other datasets except this one.
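
For concreteness, here is roughly the training setup I have been trying (a minimal sketch; the hyperparameters, dataset files, and column names below are my own guesses rather than values from the paper, which is why I'm asking for the exact script):

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Hypothetical loader: substitute however you load the BookSum-full splits,
# assumed here to be SCROLLS-style JSON with "input"/"output" columns.
dataset = load_dataset("json", data_files={"train": "booksum_train.json",
                                           "validation": "booksum_val.json"})

def preprocess(batch):
    # 1024 is BART's max encoder length; the label length is a guess.
    model_inputs = tokenizer(batch["input"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["output"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="bart-base-booksum",
    learning_rate=3e-5,             # guess
    per_device_train_batch_size=2,  # guess
    gradient_accumulation_steps=8,  # guess
    num_train_epochs=10,            # guess
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```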

Thanks!

saxenarohit (Author) commented Jul 15, 2024

Hi @empanada11,

I meant that I was able to reproduce the BART baseline (not Unlimiformer) on the gov_report dataset. It seems to me that the BART baseline (not Unlimiformer) checkpoint for BookSum was first fine-tuned on BookSum.

I saw from another issue (#57) that people were not able to replicate the Unlimiformer results on gov_report; could you give an update on that issue? Let me try running that as well. Also, the datasets from tau/sled don't have a test set. Does that mean the paper reports the development scores?

abertsch72 (Owner) commented

Hey @saxenarohit! That's strange; thanks for flagging. What do you get when you train BART-base? And what library are you using to evaluate ROUGE?

> the datasets from tau/sled don't have a test set. Does that mean the paper reports the development score?

We report the test set scores using test sets from the original datasets, preprocessed to match the SCROLLS dataset formatting (e.g., for GovReport). We do this instead of submitting to the leaderboard because we didn't run all the SCROLLS tasks. There are also development set scores in the appendices, though, if you'd like to work off of those!
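
As a sketch of what that looks like (the file path here is illustrative, not the actual location of our preprocessed data):

```python
from datasets import load_dataset

# Illustrative path: a GovReport test split preprocessed into SCROLLS-style
# formatting, i.e. plain "input"/"output" columns.
test = load_dataset("json", data_files={"test": "gov_report_test_scrolls_format.json"})["test"]
print(test.column_names)  # expected, under this assumption: ['id', 'input', 'output']
```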

@empanada11, sorry to hear that! Can you share what your issue is?

saxenarohit (Author) commented Jul 16, 2024

Hi @abertsch72, thanks for your response.
I am using Hugging Face's `evaluate` library and getting `rouge1: 24.42, rouge2: 5.75, rougeL: 12.98, rougeLsum: 23.00` on the test set. These numbers are quite far off.
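
For completeness, this is essentially how I'm computing the scores (the prediction/reference lists below are placeholders for my generated and gold summaries):

```python
import evaluate

# Placeholders -- in my run these are the generated and gold full-book summaries.
preds = ["generated summary ..."]
refs = ["reference summary ..."]

rouge = evaluate.load("rouge")
scores = rouge.compute(predictions=preds, references=refs, use_stemmer=True)
print({k: round(v * 100, 2) for k, v in scores.items()})  # scaled to match reported numbers
```
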
Can you please share the script/code/hyperparameters to replicate the results?

abertsch72 (Owner) commented

Hi @saxenarohit, sorry for the delay. That definitely sounds quite low. Is it possible you're generating outputs of fewer than 1024 tokens? I've been traveling, but I will dig up the BookSum code this weekend!
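
For reference, this is the kind of generation setting I mean; a sketch only, with an illustrative beam size rather than our exact decoding config:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("abertsch/bart-base-booksum")
model = AutoModelForSeq2SeqLM.from_pretrained("abertsch/bart-base-booksum")

book_text = "..."  # placeholder for the (truncated) book input

inputs = tokenizer(book_text, max_length=1024, truncation=True, return_tensors="pt")
summary_ids = model.generate(
    **inputs,
    max_length=1024,  # BookSum references are long; a small cap truncates outputs and hurts ROUGE
    num_beams=4,      # illustrative
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```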
