Investigate conversion output for ROOT #3
Comments
If by output for ROOT you mean a ROOT script (like the one here http://hepdata.cedar.ac.uk/view/ins1382590/d2/root), I don't think that would be a problem. I might work on this in parallel with the CSV format, as they will probably share some similarities in how the data is processed. By the way, how should I detect what type of data is in the table? The HEPData frontend somehow does it, so it should be possible even without it being specified explicitly. @eamonnmag - I heard you were responsible for drawing data in the frontend, can you share how you detect what kind of data it is? |
The current "ROOT" export is really just a CINT script that makes a plot. We don't want to continue this format in the new system, at least not initially. Instead we should export the data to suitable ROOT objects (depending on the data type) and write them as a binary |
I was also thinking that it would be better to export directly to a binary format instead of an interpreter script. Thankfully ROOT provides very good integration with Python via the ROOT package, so writing actual binary objects should be even easier than creating a CINT script (or its newer incarnation in ROOT 6), in addition to being faster on the client side. |
+1 |
But my question still holds, how should I find out what type of data I'm working on ( |
Pretty much all the current hepdata tables could be encoded as histograms
|
Not sure I understand the question but you can access the type of a object stored in a ROOT file to find out if it is a TH1,2, etc ( |
The question was more, "given a current table in HEPData, what type of object in ROOT should represent it". Assume that there is no pre-existing ROOT object since all the files are being imported afresh. |
The question was a little different - ROOT is the output format, the input is "almost" raw data (with some metadata describing it) - https://github.com/HEPData/hepdata-submission. So in order to construct proper ROOT objects ( |
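To make the question concrete, here is a minimal sketch of inspecting an already-parsed table before choosing a ROOT object. It assumes the hepdata-submission layout (`independent_variables` / `dependent_variables`, each with a list of `values`) has been loaded into plain Python dicts. The function name `describe_table` and the example table are hypothetical and not taken from hepdata-converter.

```python
from numbers import Real


def describe_table(table):
    """Summarise the shape of a parsed data table (hypothetical helper)."""
    indep = table.get("independent_variables", [])
    dep = table.get("dependent_variables", [])

    def is_numeric(point):
        # A point is either {"value": v} or a bin {"low": a, "high": b}.
        keys = ("value",) if "value" in point else ("low", "high")
        return all(
            isinstance(point.get(k), Real) and not isinstance(point.get(k), bool)
            for k in keys
        )

    return {
        "n_independent": len(indep),
        "n_dependent": len(dep),
        "all_numeric": all(
            is_numeric(p) for var in indep + dep for p in var.get("values", [])
        ),
    }


# Illustrative table: one binned x variable, one y variable with errors.
example = {
    "independent_variables": [
        {"header": {"name": "PT", "units": "GEV"},
         "values": [{"low": 0.0, "high": 10.0}, {"low": 10.0, "high": 20.0}]},
    ],
    "dependent_variables": [
        {"header": {"name": "SIG", "units": "PB"},
         "values": [
             {"value": 1.5, "errors": [{"symerror": 0.1, "label": "stat"}]},
             {"value": 0.7, "errors": [
                 {"asymerror": {"plus": 0.2, "minus": -0.1}, "label": "sys"}]},
         ]},
    ],
}

summary = describe_table(example)
```

The summary only reports counts and whether every value is numeric. Which ROOT class to use for a given summary is a separate decision discussed further down the thread.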
For now I'm creating a file which looks like this (image in the attachment). The class used to store data is the same as in the original HEPData ROOT output: |
That looks good. I agree that
At a later stage we can try to write one |
Hi I haven’t read all of this, but very interested. Will reply more soon. Quick feedback: Kyle
|
The problem with histograms is that often the bin widths are not given in existing HepData records (e.g. last record added) and for some observables it is not even meaningful to give bin widths. As far as I know (?), a |
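The bin-width concern can be checked mechanically. The sketch below assumes each independent-variable point is a dict with either a `value` key (point-like, no bin width) or `low`/`high` keys (a bin), and only returns edges when the bins are contiguous. `bin_edges` is a hypothetical helper, and the exact float comparison is a simplifying assumption.

```python
def bin_edges(values):
    """Return contiguous bin edges, or None if no histogram axis can be built."""
    if not values or not all("low" in v and "high" in v for v in values):
        return None  # point-like data: no bin widths were given
    bins = sorted((v["low"], v["high"]) for v in values)
    edges = [bins[0][0]]
    for low, high in bins:
        if low != edges[-1] or high <= low:
            return None  # gap, overlap, or empty bin
        edges.append(high)
    return edges


edges_ok = bin_edges([{"low": 0, "high": 1}, {"low": 1, "high": 3}])
edges_points = bin_edges([{"value": 0.5}, {"value": 1.5}])
edges_gap = bin_edges([{"low": 0, "high": 1}, {"low": 2, "high": 3}])
```

When the helper returns `None`, a graph-like object is the safer choice, since a histogram axis needs explicit, contiguous bin edges.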
The good thing about the ROOT format is that a file can contain a virtually unlimited number of objects. So I would say that, as a further enhancement, we can provide histogram objects for the data that allow it, and just write them to the directory of the table. This way the user can choose which object to use. The ROOT file from which the screenshot was taken is in the mszostak/root branch of the git repository, at hepdata-converter/hepdata_converter/testsuite/testdata/root/full.root (https://github.com/HEPData/hepdata-converter/blob/mszostak/root/hepdata_converter/testsuite/testdata/root/full.root). It will be updated after improvements are made to the code. As for multiple independent variables - I understand that for two I can (should) use So if the current title becomes the axis labels, what should the title of the graph / histogram be? The name of the table is a little ambiguous because there may be several graphs / histograms per table. As for histograms ( |
Really good progress on this. Thanks for the great work and advice from all. |
After a talk with @eamonnmag I remembered another problem I encountered: ROOT embeds file paths into its binary files, which can be a security problem (and is generally something to be avoided), especially if the files are to be generated on the server side. @GraemeWatt - is there any way to strip such metadata from ROOT files? [EDIT] Additionally I fixed the issue with the axis naming and x axis error bars: |
Great, we can always refine things like titles later on. Another reason why Depending on the number of independent variables, we should write either a For writing histograms, we need to invent a standard naming format and write one histogram for the central value (e.g. I don't know how to get rid of the file paths embedded in the binary |
So in case of two independent variables ( The same question applies to As for alphanumeric values I don't suppose we support them... or do we? Someone more knowledgeable in the matter can comment on this. |
The YAML format was designed to be very flexible to support the diversity of data types already in the existing HepData system and that might be provided in future, i.e. any number of independent and dependent variables (which can possibly be non-numeric). For the ROOT export we need to be more selective and we should not aim to provide a ROOT object for all possible data types. We should check the data type and export to a suitable ROOT object only if it is possible. If not, then we don't write any ROOT object (or only write |
Great, so to sum up:
Anything I forgot? |
Sounds good. For a |
I updated the ROOT output (new version in master, as well as on PyPI (0.1.15)). Now histograms for all errors are created. @GraemeWatt is this exactly what you wanted, or is something still lacking? We can discuss naming conventions now - the one used at the moment (concatenated names of the axes) is pretty self-explanatory, but a little long; is it acceptable? Also some sanitization was necessary (removal of EDIT: sample ROOT file (used in tests) with this new histogram output is available here: https://github.com/HEPData/hepdata-converter/blob/master/hepdata_converter/testsuite/testdata/root/full.root |
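As an illustration of the kind of sanitization mentioned above, here is a minimal sketch that turns an axis label into a name that is safe to use for a ROOT object. The exact characters removed by hepdata-converter are not stated in this thread, so the rule below (keep only ASCII letters, digits and underscores) is an assumption, and `sanitize_name` is a hypothetical helper. Characters such as `/` are worth avoiding in any case because ROOT uses them as directory separators.

```python
import re


def sanitize_name(raw, fallback="unnamed"):
    """Collapse every run of non [0-9A-Za-z_] characters into one underscore."""
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", raw).strip("_")
    # An empty result would be an unusable object name, so use a placeholder.
    return cleaned or fallback


label_name = sanitize_name("d(sigma)/dPT [pb/GeV]")
empty_name = sanitize_name("///")
```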
Great, thanks a lot! But please also write a Yes, I think you need to change the names of the histograms. There is no need to reproduce long axes names in the histogram names. The histogram names should be short and easy to implement in user code. I made some suggestions for concise standard histogram names in a comment above, e.g. @cranmer, could you please check that @michal-szostak's implementation satisfies your requirements for ROOT histogram output and provide feedback for improvements to be made? |
What about indicating the independent variable? I think it should also be specified. Format like: |
No, we should only write one ROOT object regardless of the number of independent (x) variables, e.g. for two independent variables, we write one @lukasheinrich will now help with testing the ROOT output and work on related extensions (ROOT input, HistFactory input/output, etc.). |
But what if there are more independent variables than a ROOT object can contain (in this case more than 3)? Should an error be thrown, or how should this case be handled? |
We discussed this already above: just don't write any ROOT objects if there are too many independent variables. The majority of current HepData tables have only one or two independent variables. We should not aim to find a ROOT representation for all possible data formats of the YAML representation. |
Yes, but that still leaves the problem with TGraph2DErrors, which only accepts symmetric errors. So following this reasoning, data with asymmetric errors and 2 independent variables should also be skipped, right? |
Exactly, we only write ROOT objects if possible, so skip this case. |
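The "write only if possible" rule for graph-like objects can be sketched as below. The only mapping confirmed in this thread is that `TGraph2DErrors` requires symmetric errors. Using `TGraphAsymmErrors` for one independent variable is an assumption made for illustration (that class can also hold symmetric errors), and both function names are hypothetical. Histogram classes are not covered here.

```python
def errors_are_symmetric(dependent_values):
    """True if no point carries an 'asymerror' entry."""
    return not any(
        "asymerror" in err
        for point in dependent_values
        for err in point.get("errors", [])
    )


def choose_graph_class(n_independent, dependent_values):
    """Return a ROOT graph class name, or None when the table should be skipped."""
    if n_independent == 1:
        return "TGraphAsymmErrors"  # assumed choice; also fits symmetric errors
    if n_independent == 2 and errors_are_symmetric(dependent_values):
        return "TGraph2DErrors"  # accepts symmetric errors only
    return None  # too many x variables, or asymmetric errors in 2D: skip


sym = [{"value": 1.0, "errors": [{"symerror": 0.1}]}]
asym = [{"value": 1.0, "errors": [{"asymerror": {"plus": 0.2, "minus": -0.1}}]}]

choices = [
    choose_graph_class(1, asym),
    choose_graph_class(2, sym),
    choose_graph_class(2, asym),
    choose_graph_class(4, sym),
]
```

Returning `None` rather than raising keeps the converter quiet for tables that have no suitable ROOT representation, in line with the decision above to skip them.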
Alright, shall we extend the object naming convention to normal histograms and graphs? |
Yes, we should have a consistent naming scheme for all ROOT objects, e.g. |
Ok, one last thing - what about histograms for single asymmetric error? I would suggest something like: |
I think we should count starting from 1 for compatibility with the existing YODA output, and |
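A short sketch of a naming helper that counts from 1 and emits separate names for the two sides of an asymmetric error. The concrete pattern (`<prefix>_y<N>`, `_e<M>`, `plus` / `minus` suffixes) and the function name are placeholders chosen for illustration. They are not necessarily the names agreed in this thread, whose examples did not survive in the text above.

```python
def histogram_names(prefix, dep_index, errors):
    """Build object names for one dependent variable.

    dep_index is 0-based internally; the emitted names count from 1.
    """
    base = f"{prefix}_y{dep_index + 1}"
    names = [base]  # central values
    for i, err in enumerate(errors, start=1):
        if "asymerror" in err:
            # One object per side of an asymmetric error.
            names += [f"{base}_e{i}plus", f"{base}_e{i}minus"]
        else:
            names.append(f"{base}_e{i}")
    return names


names = histogram_names(
    "Hist1D",
    0,
    [{"symerror": 0.1}, {"asymerror": {"plus": 0.2, "minus": -0.1}}],
)
```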
I pushed a new version to master, and a new PyPI package is also available (version 0.1.16). All of the above comments have been addressed. @GraemeWatt, can you check whether I missed something? (example file: https://github.com/HEPData/hepdata-converter/blob/master/hepdata_converter/testsuite/testdata/root/full.root) |
Will try to check out the implementation as requested. Kyle
|