download #8

Open
1 of 2 tasks
eamonnmag opened this issue Jun 7, 2016 · 13 comments
@eamonnmag
Contributor

eamonnmag commented Jun 7, 2016

@ntrrgc
Contributor

ntrrgc commented Jun 7, 2016

It would have been nice if you had mentioned that before I implemented the feature... Chances are I won't have enough time before June 15th to redo it with another format whilst I'm still working on the thesis, the manuals and a part-time job.

Adding the DOI will require changes in the indexer. Do all records have a DOI?

What about normalization? During indexing I run a series of cleaning actions on the errors so that...

  • Unlabeled errors get the main label.
  • Errors written in % format get transformed to absolute values.
  • Infinity is converted to a very high (yet finite) number because JSON does not support Infinity (and likewise a very low negative number for -Infinity).
  • Non-standard scientific notation (e.g. 2.1 exp -10 instead of 2.1e-10) is parsed and converted to a standard float.
  • LaTeX-encoded numbers with ± errors (or are they ranges?) like 2.63e10 $\pm$ 2.5e9 are turned into a number value (2.63e10), and an error is added to the errors list ({symerror: 2.5e9, label: _pm}).
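
To make the list above concrete, here is a minimal Python sketch of those cleaning steps. It is not the indexer's actual code: the function names, the `FINITE_INFINITY` constant, and the `"main"` label value are all assumptions made for illustration.

```python
import math
import re

# Hypothetical stand-in for "a very high (yet finite) number";
# the value the indexer really uses is not stated in this thread.
FINITE_INFINITY = 1e300

# Matches non-standard notation such as "2.1 exp -10".
_EXP_RE = re.compile(r"^\s*([-+]?\d*\.?\d+)\s*exp\s*([-+]?\d+)\s*$", re.IGNORECASE)


def parse_number(text):
    """Parse a number, accepting 'exp' notation and clamping infinities."""
    match = _EXP_RE.match(text)
    if match:
        # Rewrite "2.1 exp -10" as "2.1e-10" so float() can read it.
        text = f"{match.group(1)}e{match.group(2)}"
    value = float(text)
    if math.isinf(value):
        # JSON has no Infinity, so keep the sign and use a finite bound.
        return math.copysign(FINITE_INFINITY, value)
    return value


def parse_value_with_pm(text):
    """Split a string like '2.63e10 $\\pm$ 2.5e9' into (value, errors)."""
    parts = text.split(r"$\pm$")
    value = parse_number(parts[0])
    errors = []
    if len(parts) == 2:
        errors.append({"symerror": parse_number(parts[1]), "label": "_pm"})
    return value, errors


def normalize_error(error, value, main_label="main"):
    """Label unlabeled errors and turn '10%' style errors into absolute values.

    The "main" default is an assumed label name.
    """
    normalized = dict(error)
    normalized.setdefault("label", main_label)
    sym = normalized.get("symerror")
    if isinstance(sym, str) and sym.strip().endswith("%"):
        percent = float(sym.strip().rstrip("%"))
        normalized["symerror"] = abs(value) * percent / 100.0
    return normalized
```

For example, `parse_value_with_pm(r"2.63e10 $\pm$ 2.5e9")` yields the value `2.63e10` together with a one-element error list carrying the `_pm` label.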

Ranges built with low and high are normalized as errors on the client side to allow plotting all kinds of variables on the x and y axes, but the normalization is stored in a separate field, so the exporter should be able to recognize those ranges.
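
As a rough illustration of the range handling described above, the following Python sketch derives an error from a low/high pair and stores it in a separate field while leaving the original keys untouched. The client itself is not written in Python, and the `*_range_error` field name and the `minus`/`plus` keys are hypothetical.

```python
def normalize_range(point, axis):
    """Return a copy of point with the axis range expressed as an error.

    The original <axis>_low / <axis>_high keys are kept, so an exporter
    can still tell that the value came from a range.
    """
    low = point.get(f"{axis}_low")
    high = point.get(f"{axis}_high")
    if low is None or high is None:
        # No range on this axis: nothing to normalize.
        return point
    value = point[axis]
    normalized = dict(point)
    # Hypothetical field name for the derived, separately stored error.
    normalized[f"{axis}_range_error"] = {
        "minus": value - low,
        "plus": high - value,
    }
    return normalized
```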

@eamonnmag
Contributor Author

It was discussed numerous times on Slack, and also in the call we had a few months ago. I should have created this issue earlier, however. If we don't know where each data point comes from, things will be a mess when it comes to citing which data records you use.

All records have a DOI. I think the normalization you do makes sense.

@ntrrgc
Contributor

ntrrgc commented Jun 8, 2016

It was discussed numerous times on slack

Sorry, but I couldn't find any mention by searching for 'download' in the Slack history from Oct 16th onwards.

If we don't know where each data point comes from, things will be a mess

I have added a table_ref index in d6a8d8e.

Adding the DOI should be easy, but will require a reindex. Which value of DOI should I use?

ins373000/publication.json
46:    "doi": "10.1007/BF01411011", 
51:    "hepdata_doi": "10.17182/hepdata.48499", 

@eamonnmag
Contributor Author

HEPData DOI. Each table has its own DOI too.


Eamonn Maguire

@ntrrgc
Contributor

ntrrgc commented Jun 8, 2016

I can't find a per-table DOI anywhere in submission.yaml, Table*.yaml, or publication.json (where I store the results from the API endpoint you enabled some time ago). Are they generated with some pattern (e.g. hepdata_doi + '/' + table_num), maybe?

@eamonnmag
Contributor Author

eamonnmag commented Jun 8, 2016

Yeah, they are indeed

e.g. 10.17182/hepdata.73395.v1/t1 from http://dx.doi.org/10.17182/hepdata.73395
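
Judging only from that example and the export shown later in this thread, the per-table DOI looks like the publication's HEPData DOI, a `.v<version>` suffix, and `/t<table number>`. A small Python sketch of that reading follows; the `table_doi` helper and its default version of 1 are assumptions, not part of HEPData.

```python
import re


def table_doi(hepdata_doi, table_num, version=1):
    """Build a per-table DOI such as 10.17182/hepdata.73395.v1/t1.

    If hepdata_doi already ends in a version suffix (".v1"), it is kept
    as is; otherwise the assumed version is appended first.
    """
    base = hepdata_doi
    if not re.search(r"\.v\d+$", base):
        base = f"{base}.v{version}"
    return f"{base}/t{table_num}"
```

With this, `table_doi("10.17182/hepdata.73395", 1)` gives `10.17182/hepdata.73395.v1/t1`, matching the example above.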

@ntrrgc
Contributor

ntrrgc commented Jun 8, 2016

What is the pattern?

@ntrrgc
Contributor

ntrrgc commented Jun 8, 2016

Never mind, I didn't see the example in the email notification.

@ntrrgc
Contributor

ntrrgc commented Jun 8, 2016

Added in 44d15d5, both for tables and publications.

@eamonnmag
Contributor Author

Nice! Thanks Juan.


@eamonnmag
Contributor Author

eamonnmag commented Jun 9, 2016

So, I've looked at the export. It's almost there.

But the table ref you're using is not unique in the export. It would be better to use the DOI referring to each table as the unique identifier.

e.g.

- x: 0.1
  x_low: 0
  x_high: 0.2
  'y': 5.96
  y_low: 5.57
  y_high: 6.35
  table_ref: 6

This can be read as referring to Table 6 in either of these publications:

- publication:
    title: 'A measurement of K*+- production in the hyperon beam experiment at CERN'
    inspire_record: 569120
    hepdata_doi: 10.17182/hepdata.43582.v1
  description: |
    Differential production cross sections for K*- as a function of PT**2.
  table_num: 6
  hepdata_doi: 10.17182/hepdata.43582.v1/t6

or

- publication:
    title: 'Xi- production by Sigma-, pi- and neutrons in the hyperon beam experiment at CERN'
    inspire_record: 448181
    hepdata_doi: 10.17182/hepdata.47644.v1
  description: No description provided.
  table_num: 6

@ntrrgc
Contributor

ntrrgc commented Jun 9, 2016

No, you're misunderstanding! It refers to the seventh (index 6) table defined in the tables array.

But seeing that someone could make that mistake, I'm better off removing it and replacing it with the table DOI, which takes a bit more effort to link to the table object but has less potential for confusion.
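
For anyone consuming the export, linking a data point back to its table by DOI could look roughly like the Python sketch below. It assumes each data point carries the table DOI under a key named `table_doi` (a hypothetical name) and each table entry has a `hepdata_doi` field; the second table's DOI in the usage example is inferred from the pattern discussed earlier, not copied from a real export.

```python
def index_tables_by_doi(tables):
    """Map each table's DOI to its table object, rejecting duplicates."""
    by_doi = {}
    for table in tables:
        doi = table["hepdata_doi"]
        if doi in by_doi:
            raise ValueError(f"duplicate table DOI: {doi}")
        by_doi[doi] = table
    return by_doi


def table_for_point(point, tables_by_doi):
    """Look up the table a data point came from via its table DOI."""
    # "table_doi" is an assumed key name for the per-point reference.
    return tables_by_doi[point["table_doi"]]


# Two tables that share table_num 6 but have distinct DOIs.
tables = [
    {"table_num": 6, "hepdata_doi": "10.17182/hepdata.43582.v1/t6",
     "publication": {"inspire_record": 569120}},
    {"table_num": 6, "hepdata_doi": "10.17182/hepdata.47644.v1/t6",
     "publication": {"inspire_record": 448181}},
]
point = {"x": 0.1, "y": 5.96, "table_doi": "10.17182/hepdata.43582.v1/t6"}

tables_by_doi = index_tables_by_doi(tables)
source = table_for_point(point, tables_by_doi)
```

The dictionary lookup costs one extra step compared with indexing into the tables array, but the key is unambiguous even when two publications both have a "Table 6".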

@ntrrgc
Contributor

ntrrgc commented Jun 9, 2016

Modified in 3215c2d.
