records: extend `importer` module to allow bulk import from Rivet #811
Comments
Thanks for this, Graeme. I put the list of INSPIRE IDs here and you'll find the 780 tarballs in the same directory. They all have a name of the form
@20DM: thanks, that's great! I'll look into modifying the
I picked a random submission (
Thanks for the feedback, Graeme!
Re 1: Ah good point, yes it should be
Re 2: Ouff, yeah that doesn't look great. Apologies for that! I didn't realise there were occasionally two abstracts - it looks like most of the Inspire IDs I've got on my list only have one in fact. I've tweaked the logic now to take the arXiv one if it's available and fall back to the Inspire one otherwise. I was already falling back to the description from the Rivet info file in the few cases (~5) where no abstract is available from Inspire. I think we definitely want to add some kind of caveat sentence to highlight that the values are digitised from the paper (or come from Rivet or whatever - happy to tweak the wording!) in order to make it clear that they weren't provided directly by the experiment. However, I wouldn't want that single sentence to suppress the abstract, which I find useful to have personally, so perhaps the duplication of the abstract in the comment is acceptable? In any case, I've replaced the tarballs with new versions using the arXiv abstract where available.
Re 3: So I actually started doing this at first, but then quickly realised that it would require rewriting several hundreds of the routines: many of them currently "abuse" the x- and y-axis integers in the identifier to group distributions - but not necessarily in the intended way. For instance, there are cases where one would need to turn existing
Hi Graeme, just to ping this - is there anything I can help with?
Thanks for making the changes to the tarballs. I haven't started looking at this yet, since I didn't see that it was particularly urgent, but I'll try to look into it within the next couple of months.
The `importer` module (CLI) was written to import records from hepdata.net to a developer's local instance. It uses a list of INSPIRE IDs given at https://www.hepdata.net/search/ids?inspire_ids=true and it downloads files using a URL pattern `url = "{0}/download/submission/ins{1}/original".format(base_url, inspire_id)` where `base_url = 'https://hepdata.net'`.
The `importer` module should be extended to get the list of INSPIRE IDs and the download files from an alternate location, for example, a simple web directory with the INSPIRE IDs contained in the name of the files. It should also be possible to create records with any user assigned as the Coordinator (rather than just `admin_user_id = 1`). The ability to import only a subset of the complete list of INSPIRE IDs would be useful.
These changes should be carefully tested locally and on the QA system before importing to the production instance. Such an extension would be a quicker way of importing the 780 records obtained from Rivet than using the normal submission web interface.
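A rough sketch of the pieces this extension would need is given below. It is not the existing `importer` code: the function names, the `FILENAME_PATTERN` regular expression and the example directory URL are all hypothetical, and the real naming scheme of the Rivet tarballs is not stated here, so the pattern is only a placeholder. The only part taken from the issue text is the hepdata.net download URL pattern.

```python
import re
from typing import Iterable, List, Optional

HEPDATA_BASE_URL = "https://hepdata.net"

# ASSUMPTION: file names contain "ins<digits>", e.g. "ins1234567.tar.gz".
# The actual naming scheme of the tarballs may differ.
FILENAME_PATTERN = re.compile(r"ins(\d+)")


def hepdata_download_url(inspire_id: str, base_url: str = HEPDATA_BASE_URL) -> str:
    """Build the download URL using the pattern quoted in the issue."""
    return "{0}/download/submission/ins{1}/original".format(base_url, inspire_id)


def alternate_download_url(directory_url: str, filename: str) -> str:
    """Hypothetical helper: URL of a file inside a simple web directory."""
    return "{0}/{1}".format(directory_url.rstrip("/"), filename)


def inspire_ids_from_filenames(filenames: Iterable[str]) -> List[str]:
    """Extract INSPIRE IDs from file names, keeping first-seen order."""
    ids: List[str] = []
    for name in filenames:
        match = FILENAME_PATTERN.search(name)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def select_subset(all_ids: Iterable[str],
                  wanted: Optional[Iterable] = None) -> List[str]:
    """Return all IDs, or only those also listed in `wanted`."""
    if wanted is None:
        return list(all_ids)
    wanted_set = {str(w) for w in wanted}
    return [i for i in all_ids if i in wanted_set]


if __name__ == "__main__":
    # Made-up listing of a web directory, for illustration only.
    listing = ["ins111.tar.gz", "README.txt", "ins222.tar.gz"]
    for inspire_id in select_subset(inspire_ids_from_filenames(listing), wanted=["222"]):
        print(alternate_download_url("https://example.org/rivet", "ins{0}.tar.gz".format(inspire_id)))
```

Keeping the source of the ID list and the URL construction as separate functions would let the same import loop run against either hepdata.net or an alternate directory; assigning the Coordinator would be an additional parameter of the record-creation step, which is not sketched here.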
See also discussion with @20DM in HEPData/hepdata_lib#229.
A list of the Rivet analyses can be seen at https://gitlab.com/hepcedar/rivet/-/issues/485 .