From 5f6b5546e491b6048f4728ffbc3aeb275e33e80b Mon Sep 17 00:00:00 2001
From: Matthew Feickert
Date: Sat, 26 Oct 2024 02:24:37 +0200
Subject: [PATCH] fix: Apply final revisions (#10)

* Add self notes.
* Apply final revisions.
* Add open data docs link: https://atlas-physlite-content-opendata.web.cern.ch/
* Add backup slide.
---
 talk.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 64 insertions(+), 12 deletions(-)

diff --git a/talk.md b/talk.md
index 361115a..a24d668 100644
--- a/talk.md
+++ b/talk.md
@@ -17,6 +17,17 @@ October 21st, 2024
 ---
 # Challenges for Future Analysis
+
+
 .kol-1-2[

@@ -38,16 +49,26 @@ October 21st, 2024

 .caption[(Jana Schaarschmidt, [CHEP 2023](https://indico.jlab.org/event/459/contributions/11586/))]
-[.center.bold[PHYSLITE]](https://atlas-physlite-content.web.cern.ch/)
+[.center.bold[PHYSLITE]](https://atlas-physlite-content-opendata.web.cern.ch/)
 * Common file format for .bold[Run 4 Analysis Model]
-* Contains already-calibrated objects for fast analysis
 * Monolithic: Intended to serve ~80% of physics analysis in Run 4
+* Contains already-calibrated objects for fast analysis
 * Will be able to use directly without need for ntuples
 ]
 ---
 # Pythonic Ecosystem for ATLAS Analysis
+
+
 .kol-1-3[
 .large[
 Providing the elements of a .bold[columnar analysis pipeline]
@@ -72,6 +93,15 @@ Providing the elements of a .bold[columnar analysis pipeline]
 ---
 # Composing structure of an ATLAS AGC
+
+
 .kol-1-5[

@@ -86,16 +116,28 @@ End user analysis ideally uses .bold[smaller and calibrated PHYSLITE]
 .kol-4-5[

-.center.large[Components of an ATLAS AGC demonstrator pipeline]
+.center.large[Components of an ATLAS [Analysis Grand Challenge (AGC)](https://agc.readthedocs.io/) demonstrator pipeline .smaller[(c.f. [The 200Gbps Challenge](https://indico.cern.ch/event/1338689/contributions/6009824/) (Alexander Held, Monday plenary))]]

 ]
 ---
 # Challenges: Reading all PHYSLITE files
+
+
 .kol-1-2[
 .large[
-* Raw [PHYSLITE](https://atlas-physlite-content.web.cern.ch/) is not easily loadable by columnar analysis tools outside of ROOT
+* Raw [PHYSLITE](https://atlas-physlite-content-opendata.web.cern.ch/) is not easily loadable by columnar analysis tools outside of ROOT
     - Challenges for correctly handling `ElementLinks` and custom objects .smaller[(e.g. triggers)]
 * Awkward Array supports [`behaviors`](https://awkward-array.org/doc/2.6/reference/ak.behavior.html), which allow for efficiently reinterpreting data on the fly
 * ATLAS members have contributed to open ecosystem development to support PHYSLITE in both [Uproot](https://uproot.readthedocs.io/en/stable/) and [Coffea](https://coffeateam.github.io/coffea/api/coffea.nanoevents.PHYSLITESchema.html#coffea.nanoevents.PHYSLITESchema)
@@ -237,13 +279,13 @@ from atlascp import EgammaTools
 # Columnar CP tool backend performance tests
 .huge[
-* During (ongoing) refactor added preliminary integrated benchmark to measure .bold[time spent in tool per event] (not i/o) and compare to xAOD model
+* During (ongoing) refactor added preliminary integrated benchmark to measure .bold[time spent in tool per event] (not I/O) and compare to xAOD model
 * While direct comparison not possible, tests are as close as possible
     - Only involves `C++` CP tool code (no Python involved)
     - Uses same version of CP tool
-    - xAOD includes event store access
-* Show .bold[substantial speedups] for migrated tools: .bold[columnar is 2-4x faster] than xAOD interface
-    - Time for i/o and connecting columns not included in the performance comparisons (not optimized in the tests, so removed from benchmark)
+    - xAOD includes event store access (per-event overhead, paid per-batch in columnar)
+* Show .bold[substantial speedups] for migrated tools: .bold[columnar is 2-4x faster] than xAOD interface (EDM access dependent)
+    - Time for I/O and connecting columns not included in the performance comparisons (not optimized in the tests, so removed from benchmark)
 ]