Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nanoAOD_ttbar latency #280

Closed
Moelf opened this issue Oct 10, 2023 · 26 comments · Fixed by #282
Closed

nanoAOD_ttbar latency #280

Moelf opened this issue Oct 10, 2023 · 26 comments · Fixed by #282

Comments

@Moelf
Copy link
Member

Moelf commented Oct 10, 2023

          Thanks for looking at this. You can find a file with 10 events in https://cernbox.cern.ch/s/Vy6Yhz47vDaYiWa

Originally posted by @grasph in #273 (comment)

@Moelf
Copy link
Member Author

Moelf commented Oct 10, 2023

ah ha, @grasph, looks like this is automatically "fixed" on Julia 1.10:

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.3 (2023-08-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using UnROOT
[ Info: Precompiling UnROOT [3cd96dde-e98d-4713-81e9-a4a1b0235ce9]

julia> f = @time ROOTFile("./nanoaod_ttbar.root");
  0.257723 seconds (304.96 k allocations: 20.962 MiB, 4.06% gc time, 97.23% compilation time: 21% of which was recompilation)

julia> tree = @time LazyTree("./nanoaod_ttbar.root", "Events");
 24.559936 seconds (11.57 M allocations: 1.527 GiB, 0.44% gc time, 97.57% compilation time: <1% of which was recompilation)
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0-beta3 (2023-10-03)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> f = @time ROOTFile("./nanoaod_ttbar.root");
  0.144006 seconds (235.61 k allocations: 17.266 MiB, 98.26% compilation time)

julia> tree = @time LazyTree("./nanoaod_ttbar.root", "Events");
  1.704714 seconds (11.09 M allocations: 1.500 GiB, 5.74% gc time, 54.80% compilation time)

@Moelf
Copy link
Member Author

Moelf commented Oct 10, 2023

It might be a fun exercise to find out what can we do for v1.9, but notice unlike #279 (comment). This is NOT a case of inference failure causing boxing, because the total number of allocation and amount are almost identical:

11.09 M allocations: 1.500 GiB
11.57 M allocations: 1.527 GiB,

@oschulz
Copy link
Member

oschulz commented Oct 11, 2023

What magic is 1.10 doing here, I wonder?

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

Frames said this on Slack:

I know large numbers of keyword arguments (as in thousands) were causing a lot of problems in for circuit modelling, because circuit models are often super parameterized.
And I believe that kicked off at least a round of work to improve compiler handling of that maybe 9-11 months ago.
So it could be that work, which did make it into 1.10.

@oschulz
Copy link
Member

oschulz commented Oct 11, 2023

Thanks @Moelf

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

Some more comments:

It is still 17 times slower than RDataFrame and it does not scale well with the number of branches.

17x seems large but I don't think it's problematic when it's 0.1s vs. 1.4s. Considering ROOT takes so long to compile I'm not surprised. If anything, we can try to add more precompilation. I don't know how much wider does a root file get, NanoAOD has 1.5k branches. I would argue if user deals with 5k columns or whatever, at that point they ought know they should select columns.

Nothing infinitely scales to all cases. The current design has 2 main objective:

  • support fast iteration once constructed, instead of asking user to construct something else again
  • customized display()

The second point is coupled to the first point, because naively you may want to just return Dict or some light weight DataFrame, but then your display() is all messed up, we need control over both types to customize display().


Besides the remaining latency with 1.10.X, asking to move to the latest beta version gives a bad impression to the user on Julia, while the issue is that we are pushing Julia to its limits by using long tuples.

I'm saying just wait. It's already Julia 1.10-beta3, so not that far from release.

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

Finally, I'm happy to have a sink keyword interface or something, taking inspiration from CSV.jl or Arrow.jl.

But what those 2 packages are NOT good at and don't care at all, is the user experience out of the box (the display, the iteration capability). We probably shouldn't 'do that.

We also should think harder about what function names / apis are actually gonna be called, I think LazyTree is not perfect considering RNTuple will come eventually. You want to convey 2 things:

  • QuickRead(file, treename) is meant for column access, for-loop is going to be slow
  • TypedTree(tree) is meant for row-iteration.

Alternatively, we can have:

  • UnROOT.read_root(file, treename, sink)
  • and default sink to DictTree and have a TypedTree as option

@oschulz
Copy link
Member

oschulz commented Oct 11, 2023

NanoAOD has 1.5k branches

Just curious, why would anyone make that many branches? Are they used to emulate what should be arrays, or so? I'm not saying we shouldn't be able to handle it well, I'm just surprised at the sheer number.

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

they include systematics, there are O(10) different types of objects you reconstruct, each has a few dozens of systematics. And then there are all the L1 trigger and HLT triggers. Many branches are already jagged.
https://cms-nanoaod-integration.web.cern.ch/integration/master/mc94X_doc.html

@oschulz
Copy link
Member

oschulz commented Oct 11, 2023

And we can't really do hierarchies - I get it.

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

@grasph I honestly think you also just want to tell them to select, like, it's absurd to just open all the branches unless you're first day on the job.

I mean first of all tutorials should be given with slimmed files, and second, we don't have CMS nanoAOD systematics CP tools anyway, so it's pointless to read all branches

@grasph
Copy link
Member

grasph commented Oct 11, 2023

Related issue: #272

These improvements with 1.10.x are great. Thanks, Jerry, for the investigations.

I think we should still have a low-latency option especially to address the data-frame like analysis, for which fast row access is not required. It can be a parameter to LazyTree constructor to switch to this option.

I made some tests showing that with some pre-compilations (for common column types) and untyped columns, we can reach a 0.3s setup time. Measurements (done with Julia 1.9.1) within the comments of the code below.

using UnROOT

file = ROOTFile("nanoaod.root");

tree = file["Events"];
brnames = keys(tree);

function f1(file, t, brs)
    res = Vector{LazyBranch}(undef, length(brs))
    for (i,x) in enumerate(brs)
        res[i] = LazyBranch(file, t[x])
    end
    res
end

function f4(file, t, brs)
    res = Vector{Pair{Symbol, LazyBranch}}(undef, length(brs))
    for (i,x) in enumerate(brs)
        res[i] = (Symbol(x) => LazyBranch(file, t[x]))
    end
    res
end

#this list of branches cover all branch types found in the nanoaod.root test file
brwarmup = ["run","event","BeamSpot_type","BeamSpot_sigmaZ","nboostedTau","boostedTau_idAntiEle2018","boostedTau_jetIdx","boostedTau_charge","boostedTau_chargedIso","Electron_convVeto","TrigObj_id","L1Reco_step"]

# Timing provided in comments were measured with only one group
# of code between two #----... lines  included. UnROOT v0.10.15

#----------------------------------------------------------------------
# 0.78s
@time f1(file, tree, brwarmup)
# 0.27s
@time brs = f1(file, tree, brnames);
#----------------------------------------------------------------------

#----------------------------------------------------------------------
##1.05 s
#@time brs = f1(file, tree, brnames);
##0.27 s
#@time brs = f1(file, tree, brnames);
#----------------------------------------------------------------------

#----------------------------------------------------------------------
#using DataFrames
## 1.04s
#@time "dict warmup" wu = Dict(f4(file, tree, brwarmup));
##0.95 seconds (1.09 M allocations: 73.972 MiB, 2.18% gc time, 99.83% compilation time)
#@time "warmup df wrapping" df = DataFrame(wu, copycols=false);
## 0.27s
#@time "dict" brs = Dict(f4(file, tree, brnames));
## 0.0015s (0.9 seconds without the dataframe-wrapping warmup)
#@time "df wrapping" df = DataFrame(brs, copycols=false);
## 0.15s
#@time "sum" sum(df.nMuon)
#----------------------------------------------------------------------

#----------------------------------------------------------------------
#using DataFrames
## 1.9s
#@time "DataFrame warmup" DataFrame(f4(file, tree, brwarmup), copycols=false);
## 0.28s
#@time "DataFrame" df = DataFrame(f4(file, tree, brnames), copycols=false);
## 0.15s
#@time "sum" sum(df.nMuon)
#----------------------------------------------------------------------

#----------------------------------------------------------------------
##1.34 s
#@time "Dict warmup" brs = Dict(f4(file, tree, brnames));
##0.29s
#@time "Dict" brs = Dict(f4(file, tree, brnames));
##0.21s
#@time "sum" sum(brs[:nMuon])
#----------------------------------------------------------------------

@Moelf
Copy link
Member Author

Moelf commented Oct 11, 2023

Thanks a lot! Can this wait? We can discuss with @tamasgal at juliahep workshop maybe

@grasph
Copy link
Member

grasph commented Oct 12, 2023

It would be good to have a solution before the workshop and we can discuss about improvements there.

The main contributors to the number of branches of the CMS nanoad are actually the trigger line flags: it has one branch per line and trigger many lines.

Although the setup overhead is large compared to the time to read all branches, the main point is not about reading all of them, but to have access to all the branches from the LazyTree object.

The issue with the display comes also from the compilation time: it took me 279s to call show(::LazyTree) with UnROOT v.0.10.17 and Julia 1.9.1 and 148s with UnROOT v.0.10.17 and Julia 1.10.0beta3. So 1.10.x does NOT solve the issue for the display and when the LazyTree is instantiated interactively without an ending ';'.

The issue is twofold:

  1. DataFrame (or any user choice table) use case
  2. LazyTree for a large number of branches.

Current solution for (1) is to pass a LazyTree to the constructor of the Table of choice. It's inefficient if LazyTree instantiation takes time. This can be solved by providing a method to fill any container following the Tables interface ala CSV.read or a method like the f4() method of my previous comment. It is simple to implement and quite independent from the existing code.

(2) can be addressed by providing two options for LazyTree: with typed (current NamedTuple option) and untyped columns. The code is more involved, but I have already a prototype.

@tamasgal
Copy link
Member

I definitely tend to towards the first approach. Something like a generic interface which can be turned into LazyTree or DataFrame similar to what you mentioned with CSV.read.

I have not spent enough time on this problem yet to be honest. I use UnROOT on a daily basis but only a with a handful (but fairly large and complex) branches and for those of course it works totally fine.

@Moelf
Copy link
Member Author

Moelf commented Oct 12, 2023

ok what about this temporary stopgap solution:

we add a new keyword argument so something like:

LazyTree(file, treename; sink = ...)

if !isnothing(sink), then after forming the Dict{Symbol, LazyBranch}, we call the sink constructor. The (mental) interface here is that the sink should be compatible with ColumnTable, so the sink() must be able to take in our Dict.


This has two advantages:

  1. non-breaking (we can do a breaking change for UnROOT1.0, up to discussion)
  2. similar to CSV.read

@grasph
Copy link
Member

grasph commented Oct 12, 2023

You would expect the LazyTree() function to returns a LazyTree instance, so I think it's cleaner to have a new function e.g., getTree(), that will take a sink type. If the sink type is LazyTree() then it will do the same as LazyTree().

You mentioned breaking changes, all the proposals I've made so far preserve backward compatibility.

@grasph
Copy link
Member

grasph commented Oct 12, 2023

(f4() is also a good option)

@Moelf
Copy link
Member Author

Moelf commented Oct 12, 2023

I agree. But I don't know what we should name the getTree() function yet, which is why I'm trying to defer naming this until 1.0.

I think effectively this is a getTreeOrRNTuple(), but we should give it a good name so it's not too confusing that it does something different than plain getindex()

@grasph
Copy link
Member

grasph commented Oct 12, 2023

We can pick up a name and write in the documentation that it can change. An alternative name can be readTree(), although the reading is deferred.

@Moelf
Copy link
Member Author

Moelf commented Oct 12, 2023

again, readTree doesn't capture the fact you read RNTuple use the same function.

If we pick up a name and make it public API, that's potentially one more thing to break on 1.0

@grasph
Copy link
Member

grasph commented Oct 12, 2023

same as for LazyTree ;)

@grasph
Copy link
Member

grasph commented Oct 12, 2023

Anyway, let's just add a sink option to LazyTree as you proposed. I've updated the code in my fork sink-option branch. Let me know if I should make a PR.
See changes here (only one file modified).

@grasph
Copy link
Member

grasph commented Oct 12, 2023

There is still an issue with the master (b59a63). Below the LazyTree call during with sink-opt patch on top of v0.10.15 and master for my nano AOD file.

Table type master v0.10.15
LazyTree 120 s 21 s
DataFrames.DataFrame 100 s 2.3s

@Moelf
Copy link
Member Author

Moelf commented Oct 12, 2023

same as for LazyTree ;)

But we already have LazyTree, so adding a sink options is not breaking, we will just recommend a new interface at 1.0.

But if you add getTree() and remove it at 1.0, that's an additional breaking.

Can you show more complete code for what you're doing with Data frame? I can investigate soon

@grasph
Copy link
Member

grasph commented Oct 12, 2023

The issue came from the merge. I merged with origin/master instead of origin/main. Now it's fixed. I'll do the PR.

PS: by investigating, I've found out why the TDirectory cache, now removed, was not working properly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants