diff --git a/assets/style.css b/assets/style.css
index 56cc855..15b9d47 100644
--- a/assets/style.css
+++ b/assets/style.css
@@ -434,15 +434,28 @@ td {
display: inline-block;
}
+/* Hack to get middle logo on */
+.middle-logo {
+ content: '';
+ background: url("../assets/logos/logo_IRIS-HEP.png") no-repeat;
+ background-size: 100% 100%;
+ width: 196px;
+ height: 108px;
+ position: absolute;
+ bottom: 1.3em;
+ left: 25em;
+ display: inline-block;
+}
+
+/* Even worse hack to get another logo on */
.title-slide:before {
content: '';
- /* background: url("https://iris-hep.org/assets/logos/pyhf-logo.png") no-repeat; */
background: url("../assets/logos/logo_ATLAS.png") no-repeat;
background-size: 100% 100%;
- width: 243px;
- height: 128px;
+ width: 242px;
+ height: 127px;
position: absolute;
- bottom: 0.8em;
+ bottom: 1.8em;
right: 3.3em;
display: inline-block;
}
diff --git a/figures/atlas-pipeline.png b/figures/atlas-pipeline.png
index feaf352..82c2152 100644
Binary files a/figures/atlas-pipeline.png and b/figures/atlas-pipeline.png differ
diff --git a/figures/columnar-athena.png b/figures/columnar-athena.png
new file mode 100644
index 0000000..479e83c
Binary files /dev/null and b/figures/columnar-athena.png differ
diff --git a/figures/logos/coffea.jpg b/figures/logos/coffea.jpg
deleted file mode 100644
index dd7e5e7..0000000
Binary files a/figures/logos/coffea.jpg and /dev/null differ
diff --git a/figures/logos/coffea.png b/figures/logos/coffea.png
new file mode 100644
index 0000000..35f4a9f
Binary files /dev/null and b/figures/logos/coffea.png differ
diff --git a/figures/logos/coffea_logo.png b/figures/logos/coffea_logo.png
new file mode 100644
index 0000000..ba6625c
Binary files /dev/null and b/figures/logos/coffea_logo.png differ
diff --git a/figures/logos/nanobind.jpg b/figures/logos/nanobind.jpg
new file mode 100644
index 0000000..b7763e9
Binary files /dev/null and b/figures/logos/nanobind.jpg differ
diff --git a/figures/notebook-view.png b/figures/notebook-view.png
new file mode 100644
index 0000000..436b400
Binary files /dev/null and b/figures/notebook-view.png differ
diff --git a/figures/pyhep-tool-view.png b/figures/pyhep-tool-view.png
index 4196ed2..7fa0ff7 100644
Binary files a/figures/pyhep-tool-view.png and b/figures/pyhep-tool-view.png differ
diff --git a/talk.md b/talk.md
index cd3766b..361115a 100644
--- a/talk.md
+++ b/talk.md
@@ -2,17 +2,18 @@ class: middle, center, title-slide
count: false
# Building a Columnar Analysis Demonstrator
for ATLAS PHYSLITE Open Data
using the Python Ecosystem
-.large.blue[Matthew Feickert]
-on behalf of ATLAS Computing Activity
+KyungEon Choi, .blue[Matthew Feickert], Nikolai Hartmann, Lukas Heinrich, Alexander Held, Evangelos Kourlitis,
+Nils Krumnack, Giordon Stark, Matthias Vigl, Gordon Watts on behalf of the .bold[ATLAS Computing Activity]
.large[(University of Wisconsin-Madison)]
[matthew.feickert@cern.ch](mailto:matthew.feickert@cern.ch)
-
[International Conference on Computing in High Energy and Nuclear Physics (CHEP) 2024](https://indico.cern.ch/event/1338689/contributions/6015915/)
-
+
October 21st, 2024
+.middle-logo[]
+
---
# Challenges for Future Analysis
@@ -26,7 +27,7 @@ October 21st, 2024
.caption[([ATLAS Software and Computing HL-LHC Roadmap](https://cds.cern.ch/record/2802918), 2022)]
.large[
* Won't be able to store everything on disk
-* Move towards "trade CPU for disk" model
+* Move towards "trade disk for CPU" model
]
]
.kol-1-2[
@@ -37,95 +38,11 @@ October 21st, 2024
- - - -
- -.center.large[Different expressions/representations for same analysis result goals] -.caption[(Nick Smith, [2019 Joint HSF/OSG/WLCG Workshop](https://indico.cern.ch/event/759388/contributions/3306852/))] -] - ---- -# An Analysis Grand Challenge - -.large.center[ -HL-LHC era data scale requires rethinking interacting with data during analysis -] - -.kol-2-5[ -.large[ -* .bold[Analysis Grand Challenge] (AGC) community exercise organized by [IRIS-HEP](https://iris-hep.org/) includes the stages of a projected typical HL-LHC analysis -* Demonstrator of development of the required cyberinfrastructure - - [The 200Gbps Challenge: Imagining HL-LHC analysis facilities](https://indico.cern.ch/event/1338689/contributions/6009824/) (Alexander Held, Monday plenary) -* Opportunity for ATLAS to demonstrate columnar analysis views and areas for improvement -] -] -.kol-3-5[ -- - - -
- -.center.large[[High level view of operations in an HL-LHC analysis](https://iris-hep.org/grand-challenges.html#analysis-grand-challenge)] -] - ---- -# Pythonic Analysis Ecosystem for HEP - -.kol-2-5[ -- - - -
- -.center.huge[Broader "Scientific Python" ecosystem is designed to be interoperable and support [multiple domain levels](https://www.nature.com/articles/s41586-020-2649-2)] -] - -.kol-1-5[ -- -
-] - -.kol-2-5[ -- - - -
- -.center.huge[Interoperable domain hierarchy design continued in ["PyHEP" ecosystem](https://indico.cern.ch/event/1140031/)] +* Will be able to use directly without need for ntuples ] --- @@ -133,8 +50,8 @@ HL-LHC era data scale requires rethinking interacting with data during analysis .kol-1-3[ .large[ -Providing the elements of an analysis pipeline -- - - -
- -.center.large[Scalable platform for interactive (or noninteractive) analysis] -] - ---- -# Structure of an ATLAS AGC +# Composing structure of an ATLAS AGC .kol-1-5[- - - -
-.center[([ATLAS News, 2024-07-01](https://atlas.cern/Updates/News/Open-Data-Research))] - -- - - -
-] - --- # Challenges: Reading all PHYSLITE files .kol-1-2[ .large[ -* Raw PHYSLITE is not easily loadable by columnar analysis tools outside of ROOT -* Awkward Array supports [`behaviors`](https://awkward-array.org/doc/2.6/reference/ak.behavior.html), which allow efficiently reinterpreting data on the fly - - e.g. correctly handling `ElementLinks` -* ATLAS AMG members have contributed to open ecosystem development to support PHYSLITE in both Uproot and [Coffea](https://coffeateam.github.io/coffea/api/coffea.nanoevents.PHYSLITESchema.html#coffea.nanoevents.PHYSLITESchema) +* Raw [PHYSLITE](https://atlas-physlite-content.web.cern.ch/) is not easily loadable by columnar analysis tools outside of ROOT + - Challenges for correctly handling `ElementLinks` and custom objects .smaller[(e.g. triggers)] +* Awkward Array supports [`behaviors`](https://awkward-array.org/doc/2.6/reference/ak.behavior.html), which allow for efficiently reinterpreting data on the fly +* ATLAS members have contributed to open ecosystem development to support PHYSLITE in both [Uproot](https://uproot.readthedocs.io/en/stable/) and [Coffea](https://coffeateam.github.io/coffea/api/coffea.nanoevents.PHYSLITESchema.html#coffea.nanoevents.PHYSLITESchema) * Continuing to support fixes to both the PyHEP ecosystem tools as well as reporting issues to PHYSLITE - - Includes work by [ATLAS IRIS-HEP Fellow Sam Kelson](https://indico.cern.ch/event/1449314/contributions/6101290/) + - Work by [ATLAS IRIS-HEP Fellow Sam Kelson](https://indico.cern.ch/event/1449314/contributions/6101290/) ] ] .kol-1-2[ @@ -248,6 +111,10 @@ End user analysis ideally uses .bold[smaller and calibrated PHYSLITE] + +* More on ATLAS Open Data at CHEP 2024: + - [The First Release of ATLAS Open Data for Research](https://indico.cern.ch/event/1338689/contributions/6013332/)
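The bullets above describe reinterpreting PHYSLITE columns on the fly, including correctly handling `ElementLinks`. The sketch below shows only the underlying index arithmetic in plain Python. It assumes each link has already been reduced to an event-local integer index into a target collection. The flat-content-plus-offsets layout mirrors how Awkward Array stores jagged lists, but `resolve_links`, `Electrons`, and `track_pt` are hypothetical names. This is not how Uproot or Coffea's `PHYSLITESchema` actually implement link handling.

```python
from typing import List, Sequence


def split_by_offsets(content: Sequence, offsets: Sequence[int]) -> List[list]:
    """View flat content as per-event lists, the way a list-offset (jagged) array does."""
    return [list(content[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]


def resolve_links(
    link_index: Sequence[int],
    link_offsets: Sequence[int],
    target: Sequence,
    target_offsets: Sequence[int],
) -> List[list]:
    """Replace event-local link indices with the objects they point to.

    Local index i in event e refers to flat position target_offsets[e] + i of the
    target column, so links are resolved with index arithmetic on flat columns.
    """
    global_index = []
    for event in range(len(link_offsets) - 1):
        base = target_offsets[event]
        for j in range(link_offsets[event], link_offsets[event + 1]):
            global_index.append(base + link_index[j])
    # Gather from the flat target column, then restore the link column's jagged structure.
    gathered = [target[g] for g in global_index]
    return split_by_offsets(gathered, link_offsets)


class Electrons:
    """Hypothetical behavior-like wrapper: linked track pT is computed only when accessed."""

    def __init__(self, track_link, link_offsets, track_pt, track_offsets):
        self._track_link = track_link
        self._link_offsets = link_offsets
        self._track_pt = track_pt
        self._track_offsets = track_offsets

    @property
    def track_pt(self) -> List[list]:
        return resolve_links(self._track_link, self._link_offsets,
                             self._track_pt, self._track_offsets)


if __name__ == "__main__":
    # Three events with tracks [10, 20, 30], [], [5, 15], stored flat with offsets.
    track_pt = [10.0, 20.0, 30.0, 5.0, 15.0]
    track_offsets = [0, 3, 3, 5]
    # Electrons link to tracks 2 and 0 in event 0, none in event 1, track 1 in event 2.
    electrons = Electrons([2, 0, 1], [0, 2, 2, 3], track_pt, track_offsets)
    print(electrons.track_pt)  # [[30.0, 10.0], [], [15.0]]
```

In real columnar code the gather would be a single vectorized indexing operation, and an Awkward Array behavior could expose it as a property so the lookup only happens when the field is used.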
@@ -335,12 +203,88 @@ Ongoing integration work into ATLAS Athena
.caption[Selected $m_{ee}$ under on-the-fly computed systematic variations of electron reconstruction efficiency and corrections
(Matthias Vigl, [ACAT 2024](https://indico.cern.ch/event/1330797/contributions/5796636/))]
]
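The caption above refers to systematic variations computed on the fly. Below is a minimal stdlib-only sketch of that idea. It assumes per-electron efficiency scale factors `sf` with a symmetric $\pm 1\sigma$ uncertainty `sf_unc`; these columns, the function names, and the numbers are hypothetical and are not the CP tool interface. For massless electrons, it computes $m_{ee}$ via $m^2 = 2 p_{T,1} p_{T,2} (\cosh\Delta\eta - \cos\Delta\phi)$, together with nominal, up, and down event weights from the same in-memory columns.

```python
import math
from typing import Dict, List, Sequence


def massless_pair_mass(pt1, eta1, phi1, pt2, eta2, phi2) -> float:
    """Invariant mass of two massless objects: m^2 = 2 pt1 pt2 (cosh(d_eta) - cos(d_phi))."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))


def mee_with_sf_variations(
    pt: Sequence[float],
    eta: Sequence[float],
    phi: Sequence[float],
    sf: Sequence[float],
    sf_unc: Sequence[float],
    offsets: Sequence[int],
) -> Dict[str, List[float]]:
    """Select events with >= 2 electrons, compute m_ee from the first two
    (assumed pT-ordered), and per-event weights for the nominal scale factor and
    its +/-1 sigma variations, all from the same in-memory columns."""
    out: Dict[str, List[float]] = {"mee": [], "nominal": [], "up": [], "down": []}
    for event in range(len(offsets) - 1):
        start, stop = offsets[event], offsets[event + 1]
        if stop - start < 2:
            continue  # selection: at least two electrons
        i, j = start, start + 1
        out["mee"].append(massless_pair_mass(pt[i], eta[i], phi[i], pt[j], eta[j], phi[j]))
        for name, shift in (("nominal", 0.0), ("up", 1.0), ("down", -1.0)):
            weight = 1.0
            for k in (i, j):
                weight *= sf[k] + shift * sf_unc[k]  # shifted per-electron scale factor
            out[name].append(weight)
    return out


if __name__ == "__main__":
    # Event 0: back-to-back electrons with pT = 45 GeV; event 1: only one electron.
    result = mee_with_sf_variations(
        pt=[45.0, 45.0, 30.0],
        eta=[0.0, 0.0, 1.0],
        phi=[0.0, math.pi, 0.0],
        sf=[0.9, 1.0, 0.95],
        sf_unc=[0.1, 0.05, 0.02],
        offsets=[0, 2, 3],
    )
    print(result)
```

The sketch treats the shift as fully correlated between the two electrons, which is a simplifying choice. In the example, the first event passes the selection with $m_{ee} = 90$ GeV and the second is dropped.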
+---
+# Iteratively moving columnar tools forward
+
+.kol-2-3[
+.large[
+* [v1 prototype](https://gitlab.cern.ch/gstark/pycolumnarprototype/-/blob/57ad135c84c4b874f057021f71afaf487cef6a13/Zee_demo.ipynb) established foundations of what was possible with new tooling
+ - Pythonic interfaces to CP tools could be written without heroic levels of work
+ - Prototype tools were promising, but more work was needed to achieve the necessary performance
+ - No "zero action" option: needed to create a standalone prototype to determine if the work was reasonable
+* [v2 prototype](https://gitlab.cern.ch/atlas-asg/columnar-athena) takes a step forward in scope
+ - Moves development into ATLAS Athena and .bold[migrates ATLAS CP tools to a columnar backend] without breaking existing workflows
+ - Adds thread-safety
+ - Adds [infrastructure support](https://gitlab.cern.ch/atlas/atlasexternals/-/merge_requests/1149) for development of columnar analysis tools
+ - Allows for full-scale integration and performance tests
+]
+]
+.kol-1-3[
+
+ + + +
++ + + +
+] + +--- +# Columnar CP tool backend performance tests + +.huge[ +* During the (ongoing) refactor, added a preliminary integrated benchmark to measure .bold[time spent in tool per event] (not i/o) and compare to the xAOD model +* While a direct comparison is not possible, tests are as close as possible + - Only involves `C++` CP tool code (no Python involved) + - Uses the same version of the CP tool + - xAOD includes event store access +* Shows .bold[substantial speedups] for migrated tools: .bold[columnar is 2-4x faster] than the xAOD interface + - Time for i/o and connecting columns is not included in the performance comparisons (not optimized in the tests, so removed from the benchmark) +] + + + +--- +# Challenges: Tooling design decisions + +.large[ +* ATLAS CP tools were created 10-15 years ago to .bold[run in an analysis framework] + - Battle-tested, extremely well understood, excellent physics performance, strong desire to be maintained + - Rewrite cost is currently too high across the collaboration to move to the [`correctionlib`](https://cms-nanoaod.github.io/correctionlib/) paradigm + - Legacy code decisions highlight columnar prototype design decisions and opportunities during tool migration + - Columnar .bold[cracks open "black box"] implementations of tools for the new analysis model +* Raises the question: "What would it take to get to .bold[`python -m pip install atlascp`]?"
+ - Ambitious idea, but not as far-fetched as you might think: [`pip install ROOT`](https://indico.cern.ch/event/1338689/contributions/6010410/) (Vincenzo Padulano, Monday Track 6) +* Columnar prototype explores these possibilities + - .bold[Adopting a columnar backend] makes the columnar paradigm possible + - .bold[Ongoing `nanobind` integration] bridges `C++`/Python with performance + - .bold[Pythonic API design] for high-level analysis thinking +* Steps beyond: modularization to a level that allows packaging with [`scikit-build-core`](https://scikit-build-core.readthedocs.io/) + - Allows it to be "just another" tool in the PyHEP ecosystem +] + --- # ATLAS Open Data AGC Implementations .kol-1-2[ .large[ -* Tooling ecosystem is proving approachable and performant +* Tooling ecosystem is proving .bold[approachable and performant] for Pythonic columnar analysis of PHYSLITE * Enabling mentored university students to implement versions of the AGC by themselves in a Jupyter notebook * ATLAS IRIS-HEP Fellow Denys Klekots's [AGC project using .bold[ATLAS open data]](https://indico.cern.ch/event/1455396/contributions/6126406/) ([implementation on GitHub](https://github.com/iris-hep/agc-physlite)) * Simplified version of [IRIS-HEP AGC top reconstruction challenge](https://agc.readthedocs.io/) using 2025+2016 Run 2 Monte Carlo from the 2024 .bold[ATLAS open data] release @@ -368,11 +312,11 @@ Ongoing integration work into ATLAS Athena # Summary of ATLAS Columnar AGC Efforts .huge[ -* Development of a columnar ATLAS AGC implementation with full systematics is still ongoing -* Columnar analysis tool efforts inside of ATLAS have been promising with CP tools showing performance increases -* ATLAS Open Data proving to be useful for research and community communication -* Technical advancements from AMG research are being incorporated into ATLAS wide tooling +* Columnar analysis tool efforts inside of ATLAS have been promising, with CP tools showing performance increases and a bespoke UI +* 
Development of a columnar ATLAS AGC demonstrator with full systematics is ongoing, supported by advancements in the v2 prototype +* Technical advancements are being incorporated into ATLAS-wide tooling * Contributions upstream to PyHEP community tools +* ATLAS Open Data is proving useful for research and community communication * Advancements in tooling are enabling researchers across career stages ] @@ -392,11 +336,151 @@ class: end-slide, center .huge[Backup] +--- +# Columnar Analysis + +.center.large.bold[ +"columnar analysis" == "array programming for data analysis" +] + +.kol-1-2[ +.large[ +* Higher-level APIs for physicists and improved user experience + - People using columnar analysis on ntuples already seem to be loving it + - Enable the same UX but without ntupling (save disk) +* Potential for higher performance + - Enable on-the-fly combined performance (CP) tool corrections on PHYSLITE +* Broader scientific data analysis ecosystem integration + - Extend and scale ATLAS tools with a large and performant ecosystem +] +] +.kol-1-2[ + + + +
+ +.center.large[Different expressions/representations for same analysis result goals] +.caption[(Nick Smith, [2019 Joint HSF/OSG/WLCG Workshop](https://indico.cern.ch/event/759388/contributions/3306852/))] +] + +--- +# An Analysis Grand Challenge + +.large.center[ +HL-LHC era data scale requires rethinking interacting with data during analysis +] + +.kol-2-5[ +.large[ +* .bold[Analysis Grand Challenge] (AGC) community exercise organized by [IRIS-HEP](https://iris-hep.org/) includes the stages of a projected typical HL-LHC analysis +* Demonstrator of development of the required cyberinfrastructure + - [The 200Gbps Challenge: Imagining HL-LHC analysis facilities](https://indico.cern.ch/event/1338689/contributions/6009824/) (Alexander Held, Monday plenary) +* Opportunity for ATLAS to demonstrate columnar analysis views and areas for improvement +] +] +.kol-3-5[ ++ + + +
+ +.center.large[[High level view of operations in an HL-LHC analysis](https://iris-hep.org/grand-challenges.html#analysis-grand-challenge)] +] + +--- +# Pythonic Analysis Ecosystem for HEP + +.kol-2-5[ ++ + + +
+ +.center.huge[Broader "Scientific Python" ecosystem is designed to be interoperable and support [multiple domain levels](https://www.nature.com/articles/s41586-020-2649-2)] +] + +.kol-1-5[ ++ +
+] + +.kol-2-5[ ++ + + +
+ +.center.huge[Interoperable domain hierarchy design continued in ["PyHEP" ecosystem](https://indico.cern.ch/event/1140031/)] +] + +--- +# Prototyping on US ATLAS Analysis Facilities + +.kol-1-3[ +.large[ +* [University of Chicago Analysis Facility](https://af.uchicago.edu/) .bold[provides testing bed] for analysis platform +* Provides support for: + - [.bold[JupyterLab]](https://jupyterlab.readthedocs.io/) as a common interface + - Highly efficient data delivery with [.bold[XCache]](https://slateci.io/XCache/) + - Conversion to columnar formats with [.bold[ServiceX]](https://iris-hep.org/projects/servicex.html) +* Excellent integration exercise between analysis and operations +] +] +.kol-2-3[ ++ + + +
+ +.center.large[Scalable platform for interactive (or noninteractive) analysis] +] + +--- +# ATLAS Open Data + +.kol-1-2[ +.large[ +* .bold[First] release of [ATLAS Run 2 2015 and 2016 open data](https://atlas.cern/Updates/News/Open-Data-Research) in July 2024 +* Using ATLAS open data for AGC + - Open access data allows for use in testing community projects and problems + - Released as PHYSLITE (HL-LHC data format) + - Allows for new students to be able to learn analysis and make contributions quickly +* More on ATLAS Open Data at CHEP 2024: + - [The First Release of ATLAS Open Data for Research](https://indico.cern.ch/event/1338689/contributions/6013332/) (Zach Marshall, Monday plenary) + - [Open Data at ATLAS: Bringing TeV collisions to the World](https://indico.cern.ch/event/1338689/contributions/6011129/) (Giovanni Guerrieri, Monday Track 8) +] +] +.kol-1-2[ + ++ + + +
+.center[([ATLAS News, 2024-07-01](https://atlas.cern/Updates/News/Open-Data-Research))] + ++ + + +
+] + --- # References * [ATLAS Software and Computing HL-LHC Roadmap](https://cds.cern.ch/record/2802918), ATLAS Collaboration, 2022 +* [ATLAS PHYSLITE Content Documentation](https://atlas-physlite-content.web.cern.ch/), ATLAS Collaboration, Accessed 2024 * [Using Legacy ATLAS C++ Calibration Tools in Modern Columnar Analysis Environments](https://indico.cern.ch/event/1330797/contributions/5796636/), Matthias Vigl, [ACAT 2024](https://indico.cern.ch/event/1330797/) * [How the Scientific Python ecosystem helps answering fundamental questions of the Universe](https://cfp.scipy.org/2024/talk/KCXVVR/), Vangelis Kourlitis, Matthew Feickert, and Gordon Watts, [SciPy 2024](https://www.scipy2024.scipy.org/) * [The Columnar Analysis Grand Challenge Demonstrator](https://indico.cern.ch/event/1268248/contributions/5326293/), Gordon Watts, [ATLAS S&C Plenary Afternoon: Demonstrators](https://indico.cern.ch/event/1268248/), 2023-10-04 [ATLAS Internal] * [ATLAS AGC Demonstrator](https://indico.cern.ch/event/1328739/contributions/5605607/), Gordon Watts, [ATLAS AMG+ADC Joint Session](https://indico.cern.ch/event/1328739/), 2023-03-30 [ATLAS Internal] +* [Tour of the CP Columnar Prototype and CP Algorithm Conversion](https://indico.cern.ch/event/1463263/contributions/6161076/), Nils Krumnack, 2024-10-07 [ATLAS Internal]