Steps to run the code #33
Comments
Hi @sahulsumra, thanks for your interest in our work! Have you tried following the instructions for running in the README?
Hi @abertsch72, my first problem was setting up the conda environment from "requirements.txt". I'm not sure whether you work within a conda environment, but doing so might help to isolate exactly what needs installing. Aside from some packages that were missing from your "requirements.txt" file, I managed to get inference_example.py working fine. I'm working with a decent-sized GPU on the cluster, so it's fast, and I'm very happy with that. (Thumbs up.) But I had a bunch of problems with "src/run.py". When I try to run:
I get:
I found that the verification_mode argument is new to Hugging Face. (The workaround, for now, seems to be to go to the run.py file, comment out "verification_mode", and replace it with the soon-to-be-deprecated "ignore_verifications=True".) I hope the above is useful to some.
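For anyone hitting the same error, a version-tolerant variant of the workaround above is to check which keyword the installed loader accepts instead of editing run.py by hand. The sketch below is only an illustration: verification_kwargs is a hypothetical helper, not part of the repo, and it assumes that "no_checks" is the verification_mode value that corresponds to ignore_verifications=True.

```python
import inspect


def verification_kwargs(loader):
    """Return the keyword that disables verification for this loader.

    `loader` is any callable (for example `datasets.load_dataset`).
    This helper is hypothetical and not part of the repository.
    """
    params = inspect.signature(loader).parameters
    if "verification_mode" in params:
        # Newer API: assumed equivalent of ignore_verifications=True.
        return {"verification_mode": "no_checks"}
    if "ignore_verifications" in params:
        # Older API: the boolean flag mentioned above.
        return {"ignore_verifications": True}
    # Neither keyword is accepted; pass nothing extra.
    return {}


# Stand-in loaders so the sketch runs without `datasets` installed.
def new_style_loader(path, verification_mode=None, **kwargs):
    return path


def old_style_loader(path, ignore_verifications=False, **kwargs):
    return path


print(verification_kwargs(new_style_loader))
print(verification_kwargs(old_style_loader))
```

With the real library this could be used as load_dataset(name, **verification_kwargs(load_dataset)), assuming the installed version exposes one of the two keywords as a named parameter.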
It's strange because I would expect the following lines from src/run.py
to pick up the column names from the ccdv/govreport-summarization dataset itself. Why doesn't it?
Anyway, I got it working by rewriting your deduplicate function so that it assigns an "id" column instead. Most first-time users would be using this with a standard dataset such as ccdv/govreport-summarization, so there is no need for deduping. I also needed to change the column_names assignment within the run.py file. It seems like a bit of work is needed to make this more streamlined and accessible.
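As a rough illustration of the "assign an id column" idea (not the actual change made to the repo's deduplicate function), the sketch below adds an "id" derived from the row index. The names assign_id and add_id_column are hypothetical, and the "report" and "summary" keys are assumed column names used only for the demo rows.

```python
def assign_id(example, idx):
    """Return an "id" field derived from the row index."""
    return {"id": str(idx)}


def add_id_column(rows):
    """Return copies of `rows` with an "id" key added where missing."""
    out = []
    for idx, row in enumerate(rows):
        if "id" in row:
            # Keep an existing id untouched.
            out.append(dict(row))
        else:
            out.append({**row, **assign_id(row, idx)})
    return out


# Demo rows; the column names here are assumptions.
rows = [
    {"report": "first document", "summary": "first summary"},
    {"report": "second document", "summary": "second summary"},
]
print(add_id_column(rows))
```

With a Hugging Face dataset object, the same assign_id function could be passed to Dataset.map with with_indices=True, which calls it with each example and its index.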
Ah okay, I now understand. In your original gov_report.json file, "dataset_name": "tau/sled" selects the gov_report dataset within "tau/sled". Okay, that now makes sense. I will leave the above trail for others who go down the same rabbit hole. (I am still confused about what exactly an epoch is here. Why don't I see 17.5k when I run ccdv/govreport-summarization with "num_train_epochs": 1? Instead, I see 1000/8759, 2000/8759, ...) Finally, at the end of training, I got the following error:
Any suggestions for fixing this last one? Thanks in advance!
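On the epoch question above: one plausible reading is that the progress counter shows optimizer steps rather than examples, so an epoch over roughly 17.5k examples with an effective batch size of 2 would show about half that many steps. The sketch below works through that arithmetic. The train-split size of 17,517 and the batch size of 2 are assumptions chosen to match the 8759 figure, not values confirmed in this thread, and the step formula is a common convention that may differ from the exact training loop used here.

```python
import math


def optimizer_steps_per_epoch(num_examples, per_device_batch_size,
                              num_devices=1, grad_accum_steps=1):
    """Estimate optimizer steps in one pass over the data."""
    # Batches the data loader yields per epoch (last batch may be partial).
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer step per `grad_accum_steps` batches, at least one step.
    return max(batches // grad_accum_steps, 1)


# Assumed values: 17,517 training examples, batch size 2, one device.
print(optimizer_steps_per_epoch(17_517, per_device_batch_size=2))  # 8759
```

Under these assumptions, 1000/8759 would mean 1000 steps (about 2000 examples) out of one full epoch.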
Can you explain to me how to run this code?