Commit 250b4b5

Add RNTuple update blog post (root-project#962)
1 parent ebff622 commit 250b4b5

3 files changed

+188
-0
lines changed

assets/images/rntuple_blog_adl.png

188 KB
65.7 KB
@@ -0,0 +1,188 @@
---
title: "RNTuple: Where are we now and what's next?"
layout: archive
author: Florine de Geus
---
Hello, this is Florine from the ROOT team! Over the past year, I’ve been working
as a technical student funded by ATLAS to evaluate and help further develop
RNTuple. As you may already be aware, RNTuple [[1]](#rntuple-evolution) is
currently being developed as the successor to TTree, and is projected to be
used in Run 4. I imagine you might be wondering why there is a need for a
completely new (TTree-incompatible) system, and what this looks like. In this
blog post, I will try to answer these questions, and give you an
overview of the current status of RNTuple, what we’re still working on before
its first production release (and what we will work on beyond it), and
finally how you can already try it out!
## Why do we need RNTuple?

At this point, ROOT has been around for more than a quarter of a century – and
TTree for just as long. As you might imagine, the computing landscape today
looks vastly different compared to 25 years ago. Just to set the scene: when
ROOT was first released, there was no C++ standard yet and parallel (let alone
distributed) computing really wasn’t a thing. On the hardware side, modern
storage technologies such as SSDs and object stores were still unheard of, and
let’s not forget the evolution of networking technologies!
Naturally, TTree wasn’t designed and implemented with these things in mind. Of
course, over the years a lot of effort has been put into improving the
performance and stability of TTree to make it as compatible with modern
computing practices as possible. However, there are limits to what is possible
in this regard, especially given that backward and forward compatibility are
two major requirements for ROOT’s I/O system. As a result, with the
High-Luminosity LHC on the horizon, during which 90%
of the total amount of LHC data is expected to be produced [[2]](#atlas-roadmap),
we need to think about more optimized ways to store physics
data. The challenge here is that this data is unique in the sense that events
(in computer science terms, “entries” or “rows”) are statistically independent
of each other. At the same time, one event typically contains many (complex) data
structures, of which we often only need a small subset at a time, and we found
that standard technologies are not well-tuned for this type of data
storage [[3]](#hep-data-formats). That is why we decided to combine the years
of experience with TTree with various industry best practices and invest in the
next generation of high-energy physics data storage. Enter RNTuple!
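
To make the "many structures per event, but only a small subset needed at a time" access pattern a bit more concrete, here is a minimal, self-contained sketch in plain C++ (no ROOT required). It is not RNTuple's on-disk format or API: `ToyEvent`, `RowStore`, `ColumnStore` and the helper functions are hypothetical names made up for this illustration. It only contrasts a row-wise layout with a columnar one, the general approach that both TTree and RNTuple take, to show why reading one field out of many touches far less data when each field is stored contiguously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy event with three fields; real events have far more.
struct ToyEvent {
   float pt;
   float eta;
   int nTracks;
};

// Row-wise layout: all fields of one event sit next to each other.
using RowStore = std::vector<ToyEvent>;

// Columnar layout: one contiguous array per field.
struct ColumnStore {
   std::vector<float> pt;
   std::vector<float> eta;
   std::vector<int> nTracks;
};

// Split row-wise events into one column per field.
ColumnStore ToColumns(const RowStore &rows)
{
   ColumnStore cols;
   for (const ToyEvent &ev : rows) {
      cols.pt.push_back(ev.pt);
      cols.eta.push_back(ev.eta);
      cols.nTracks.push_back(ev.nTracks);
   }
   assert(cols.pt.size() == rows.size());
   return cols;
}

// An "analysis" that needs a single field only touches that one column.
double SumPt(const ColumnStore &cols)
{
   double sum = 0.0;
   for (float pt : cols.pt)
      sum += pt;
   return sum;
}

// Bytes that have to be scanned to read `pt` for every event:
// whole events in the row-wise case, a single column in the columnar case.
std::size_t BytesScannedRowWise(const RowStore &rows)
{
   return rows.size() * sizeof(ToyEvent);
}

std::size_t BytesScannedColumnar(const ColumnStore &cols)
{
   return cols.pt.size() * sizeof(float);
}
```

With this toy event, reading only `pt` from the columnar layout scans `sizeof(float)` bytes per event instead of `sizeof(ToyEvent)`, and the gap widens as events gain more fields.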

## Where are we now?

For the past four years, a lot of effort has been put into making RNTuple the
best it can be. We are working closely with the experiments to make sure that
RNTuple can support their data models across all relevant stages in the
production pipeline. Simultaneously, we want to make sure that it is as
optimized as possible. This means making the data stored in RNTuple
as compact as possible, while at the same time finding ways to make
reading RNTuples into memory and writing them back out as fast as possible.
To give you an idea of where we currently are, the plot below shows the
average on-disk event size for ATLAS’s DAOD_PHYS data model [[4]](#atlas-edm),
comparing TTree and RNTuple. With RNTuple, we could potentially
save 20-35% of storage space, and in turn reduce the consumed network bandwidth
when reading the data from a remote location. When we’re talking about exabytes
of event data, this is quite significant: for every exabyte stored, that
amounts to 200-350 petabytes!

<center>
<img
src="{{'/assets/images/rntuple_blog_size_mc.png' | relative_url}}"
alt="DAOD_PHYS storage efficiency"
style="width: 80%" width="1600" height="749" />
</center>

Besides storage efficiency, we’re also seeing very promising results when it
comes to read throughput. The two plots below show the number of events
processed per second for two different types of tasks, comparing ATLAS
DAOD_PHYSLITE data sets stored in TTree and RNTuple (stored on an SSD). As you
can see, RNTuple is considerably faster than TTree, and similar observations
have been made for other data sets [[1]](#rntuple-evolution), [[5]](#rntuple-status).

<center>
<img
src="{{'/assets/images/rntuple_blog_adl.png' | relative_url}}"
alt="PHYSLITE ADL benchmarks"
style="width: 80%" width="1600" height="749" />
</center>

Beyond performance, we have also been working hard on RNTuple’s interface and
supported features. This includes compatibility with RDataFrame, being able to
read and write C++ STL types as well as user-defined types, and various other
features to support existing experiment frameworks.

## Can I try it out?

Yes! To be able to read and write RNTuples, the first thing you’ll need is a ROOT
installation with the
[ROOT 7 experimental features enabled](https://root.cern/install/build_from_source/#enabling-experimental-features-aka-root7).
This is the case for the default LXPLUS installation, which runs ROOT’s (at the
time of writing) latest release, [6.30.02](https://root.cern/releases/release-63002/)!
If you are running ROOT in a different way, you can easily check if ROOT 7 is
enabled for your installation by running `root-config --has-root7` in your terminal.
If this returns `yes`, you’re all set! If you get a `no`, you will need to use a different
installation of ROOT that has it enabled. Check out the [ROOT installation page](https://root.cern/install)
to get one. We strongly recommend using the most recent release in order to get
the latest and greatest from RNTuple.

Now, on to the fun part: using RNTuple! Of course, you could write a new RNTuple
completely from scratch, using fields and data that you come up with. This is
done using the [`RNTupleWriter`](https://root.cern/doc/master/classROOT_1_1Experimental_1_1RNTupleWriter.html)
interface. Reading an RNTuple is then naturally done through the
[`RNTupleReader`](https://root.cern/doc/master/classROOT_1_1Experimental_1_1RNTupleReader.html).
To get an idea of what this looks like in practice, check out, for example,
[this tutorial](https://root.cern/doc/master/ntpl001__staff_8C.html).

Of course, it would be more interesting to try out RNTuple with real data, for
example with data from an analysis ntuple that is currently stored as a TTree.
Well, good news! RNTuple also comes with an [`RNTupleImporter`](https://root.cern/doc/master/classROOT_1_1Experimental_1_1RNTupleImporter.html)
class that allows you to automatically convert your TTrees to RNTuples. This
can be as simple as executing the following two lines in the ROOT prompt. The
input file containing the source TTree is read remotely, meaning you can
directly copy-paste these lines into your ROOT prompt. Of course, it’s entirely
possible to use your own existing TTrees.

```cpp
root [0] auto importer = ROOT::Experimental::RNTupleImporter::Create(
   "http://root.cern/files/HiggsTauTauReduced/GluGluToHToTauTau.root",
   "Events",
   "my_rntuple.root")
root [1] importer->Import()
```

This will convert your TTree (called `Events` here) into an RNTuple also called
`Events` and write it to `my_rntuple.root`. Easy enough, but maybe you want more
control over this newly created RNTuple. For example, you might want to change
its name, or set the compression settings to something other than the default.
This (and more) can all be tweaked! Check out
[the reference](https://root.cern/doc/master/classROOT_1_1Experimental_1_1RNTupleImporter.html)
or [this tutorial](https://root.cern/doc/master/ntpl008__import_8C.html) to see
which options are available.

Now, I already mentioned that we have been working on RNTuple compatibility with
RDataFrame. Currently, by changing just one line, you will be able to use
your existing analysis code with data stored in RNTuple:

```cpp
// Change this:
ROOT::RDataFrame df("Events", "http://root.cern/files/HiggsTauTauReduced/GluGluToHToTauTau.root");

// To this to use the RNTuple you just imported into "my_rntuple.root":
ROOT::RDataFrame df = ROOT::RDF::Experimental::FromRNTuple("Events", "my_rntuple.root");

// Use your existing analysis as-is!
```

> 💡 _The automatic detection of RNTuples in RDataFrame is currently available in
> ROOT’s `master` branch and will be available in ROOT 6.32.00!_

## Next steps for RNTuple

So, what’s next? Performance is always one of our main concerns. We are
currently working on parallelizing the writing of RNTuples. In addition, we are
working on what we like to call “interface ergonomics”, i.e. the way developers
will interact with RNTuple. Be aware that this means that the RNTuple
interfaces might still change a little in the coming months! Next to all of
this, we are preparing for larger-scale performance testing to see in which
areas we could further improve. Another area of work for the near future will
be _data set combinatorics_ – that is, finding smart(er)
ways of accessing and combining existing RNTuple data. And of course, we
will continue to work with the experiments to make sure the transition to
RNTuple will be as smooth as possible.

To wrap up, things are looking good for RNTuple, and while there is still
plenty of work to be done, we’re excited and eager to make RNTuple as good as it
can be! If you want to know more about the evolution and performance of
RNTuple, be sure to check out the references below, as well as
[our other publications](https://root.cern/about/publications/#io).
If you are eager to dive deeper into the specifics of the RNTuple binary format, you can
[read the specification here](https://github.com/root-project/root/blob/368dd4999733efe3eb9eb171bfc1b57560259443/tree/ntuple/v7/doc/specifications.md#L1).
Finally, reach out to us on the [forum](https://root-forum.cern.ch/) if you have
any questions or if you would like to
[contribute to RNTuple or ROOT](https://root.cern/contribute/) in general!

## References

<a name="rntuple-evolution">[1]</a> J. Blomer, P. Canal, A. Naumann, and D. Piparo, “Evolution of the ROOT Tree I/O,” EPJ Web Conf., vol. 245, 2020, doi: [10.1051/epjconf/202024502030](https://doi.org/10.1051/epjconf/202024502030).

<a name="atlas-roadmap">[2]</a> ATLAS Collaboration, “ATLAS Software and Computing HL-LHC Roadmap,” CERN, Geneva, CERN-LHCC-2022-005, LHCC-G-182, 2022. Accessed: May 02, 2023. [Online]. Available: <http://cds.cern.ch/record/2802918>.

<a name="hep-data-formats">[3]</a> J. Blomer, “A quantitative review of data formats for HEP analyses,” J. Phys. Conf. Ser., vol. 1085, p. 032020, Sep. 2018, doi: [10.1088/1742-6596/1085/3/032020](https://doi.org/10.1088/1742-6596/1085/3/032020).

<a name="atlas-edm">[4]</a> J. Elmsheuser et al., “Evolution of the ATLAS analysis model for Run-3 and prospects for HL-LHC,” EPJ Web Conf., vol. 245, 2020, doi: [10.1051/epjconf/202024506014](https://doi.org/10.1051/epjconf/202024506014).

<a name="rntuple-status">[5]</a> J. Lopez-Gomez and J. Blomer, “RNTuple performance: Status and Outlook.” arXiv, Apr. 07, 2022. doi: [10.48550/arXiv.2204.09043](https://doi.org/10.48550/arXiv.2204.09043).
