class: middle, center, title-slide
count: false
.huge.blue[Matthew Feickert]
.huge[(University of Wisconsin-Madison)]
[email protected]
September 18th, 2023
.kol-2-3[ .huge[
- Privileged opportunity to work among multiple scientific communities
- Care about .bold[reusable] open science to be able to push particle physics forward at the .bold[community scale]
- The challenges of the next decade provide wonderful research environments that will require interdisciplinary knowledge exchange to fully address
- Today we'll share .bold[high level] views of deeply .bold[technical problems] ] ] .kol-1-3[ .center.width-65[]
.kol-1-2.center[
.caption[LHC] ] .kol-1-2.center[ .caption[ATLAS] ] .kol-1-1[ .kol-1-2.center[ ] .kol-1-2.center[ .kol-1-2.center[ ] .kol-1-2.center[ ] ] ].large[
* Increase in luminosity of roughly an order of magnitude: $3$-$4$ $\mathrm{ab}^{-1}$ (factor of 20-25 from Run-2 delivered)
* Boon for measurements constrained by statistical uncertainties and searches for rare processes ]
.center.large[Challenge to be able to .bold[record, store, and analyze] the data]
.kol-1-2[
] .kol-1-2[]
.center.large[Projected .bold[required compute usage] for HL-LHC (want R&D below budget line)]
.center[ATLAS and CMS software and computing reviews]
.kol-1-2[
.huge[
* LHC experiments as stakeholders
* LHC operations as partners ] ] .kol-1-2[ .caption[Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP)] ].kol-1-1[ .kol-1-2[ .huge[ Designed around focus areas ] .large[
- Intellectual Hub
- Analysis Systems
- Data Organization, Management, and Access (DOMA)
- Innovative Algorithms
- Translational Research for AI
- Scalable Systems Laboratory (SSL)
- OSG Services for LHC (OSG-LHC) ] ] .kol-1-2[
.caption[IRIS-HEP Institute Structure] ] ]
.large[ community engagement with .bold[training, education, and outreach] and .bold[institute grand challenges] ]
.huge[
- Deployable analysis pipelines that reduce physicist time-to-insight
- Tools integrate into the broader scientific Python computing ecosystem
- Analysis reuse as deployment feature ]
.huge[
- Integrating machine learning training and inference into analysis workflows
- cf. Machine Learning for Columnar High Energy Physics Analysis, Elliott Kauffman, CHEP 2023 ]
.center.large[ In his PyCon 2017 keynote, Jake VanderPlas gave us the iconic "PyData ecosystem" image ]
.center.large[ In his 2022 PyHEP topical meeting update, Jim Pivarski gave us a view for the PyHEP ecosystem ]
.center.large["import XYZ" matches in GitHub repos for users who fork [CMSSW](https://github.com/cms-sw/cmssw) by file]
.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]
.center.large["import XYZ" matches in GitHub repos for users who fork [CMSSW](https://github.com/cms-sw/cmssw) by library/tool]
.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]
.center.large["pip install XYZ" download rate for macOS/Windows (no batch jobs) in aggregate]
.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]
.center.large["pip install XYZ" download rate for macOS/Windows (no batch jobs) by package] .caption[Aided by interoperable design]
.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]
.kol-1-1[ .kol-1-3[
] .kol-1-3[ ] .kol-1-3[ ] ] .kol-1-3[ .center.huge[[dask-awkward](https://github.com/dask-contrib/dask-awkward)].center[Native Dask collection for partitioned Awkward arrays for analysis at scale] ] .kol-1-3[ .center.huge[scikit-build-core]
.center[Next generation of build tools for scientific packaging] ] .kol-1-3[ .center.huge[NumFOCUS]
.center[Organizing and supporting scientific open source] ]
.footnote[Taking a slide from Lukas Heinrich]
.kol-1-2[
] .kol-1-2.huge[.bold[New directions in science are launched by new tools much more often than by new concepts.]
— Freeman Dyson ]
- As we'll see later, having access to the gradient while performing minimization is highly beneficial!
- Can imagine multiple ways of arriving at gradients for computational functions
- But want them to be both .bold[exact] and .bold[flexible]
.center.width-25[]
.kol-6-8[
.bold.center[Symbolic]
.center.width-100[]
]
.kol-2-8.huge[
- Exact: .blue[Yes]
- Flexible: .red[No] ]
- As we'll see later, having access to the gradient while performing minimization is highly beneficial!
- Can imagine multiple ways of arriving at gradients for computational functions
- But want them to be both .bold[exact] and .bold[flexible]
.center.width-25[]
.kol-6-8[
.bold.center[Numeric]
.center.width-70[]
]
.kol-2-8.huge[
- Exact: .red[No]
- Flexible: .blue[Yes] ]
- As we'll see later, having access to the gradient while performing minimization is highly beneficial!
- Can imagine multiple ways of arriving at gradients for computational functions
- But want them to be both .bold[exact] and .bold[flexible]
.center.width-25[]
.kol-6-8[
.bold.center[Automatic]
.center.width-80[]
]
.kol-2-8.huge[
- Exact: .blue[Yes]
- Flexible: .blue[Yes] ]
.kol-3-5[
- Automatic differentiation (autodiff) provides gradients of numerical functions to machine precision
- Build computational graph of the calculation
- Nodes represent operations, edges represent flow of gradients
- Apply the chain rule to operations
- Can traverse the graph in forward or reverse modes depending on the relative dimensions of input and output for efficient computation
] .kol-2-5.center[ .width-100[] ]
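A minimal sketch of autodiff in practice (using JAX here purely for illustration): the framework traces the elementary operations into a graph and applies the chain rule for us.

```python
import jax
import jax.numpy as jnp


def f(x):
    # Composition of elementary operations traced into a computational graph
    return jnp.sin(x) * jnp.exp(-x**2)


# Reverse-mode gradient: exact to floating point precision, no finite differences
dfdx = jax.grad(f)
print(dfdx(1.5))

# Analytic cross-check: f'(x) = [cos(x) - 2 x sin(x)] exp(-x^2)
x = 1.5
print((jnp.cos(x) - 2 * x * jnp.sin(x)) * jnp.exp(-(x**2)))
```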
.grid[ .kol-1-2.large[
- Allows writing fully differentiable programs that are efficient and accurate
- Resulting system can be optimized end-to-end using efficient gradient-based optimization algorithms
- Exploit advances in deep learning
- Enables .italic[efficient] computation of gradients and Jacobians
- Large benefit to statistical inference
- Replace non-differentiable operations with differentiable analogues
- Binning, sorting, cuts
]
.kol-1-2[
.center.width-100[] .center[Snowmass 2021 LOI] ] ]
class: focus-slide, center
.huge.bold.center[Application of automatic differentiation in pyhf
]
.kol-1-1[
.kol-1-3.center[
.width-100[]
Search for new physics
]
.kol-1-3.center[
.width-100[]
Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]
Provide constraints on models through setting best limits ] ]
- All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
- Model complexity can be huge for complicated searches
- Problem: Time to fit can be .bold[many hours]
- .blue[Goal:] Empower analysts with fast fits and expressive models
- A flexible probability density function (p.d.f.) template to build statistical models in high energy physics
- Developed in 2011 during work that led to the Higgs discovery [CERN-OPEN-2012-016]
- Widely used by ATLAS for .bold[measurements of known physics] (Standard Model) and .bold[searches for new physics] (beyond the Standard Model)
.kol-2-5.center[ .width-90[] .bold[Standard Model] ] .kol-3-5.center[ .width-100[] .bold[Beyond the Standard Model] ]
.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: auxiliary data, $\textcolor{#0495fc}{\vec{\eta}}$: unconstrained parameters, $\textcolor{#9c2cfc}{\vec{\chi}}$: constrained parameters]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
  - encode systematic uncertainties (e.g. normalization, shape)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: auxiliary data, $\textcolor{#0495fc}{\vec{\eta}}$: unconstrained parameters, $\textcolor{#9c2cfc}{\vec{\chi}}$: constrained parameters]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
  - encode systematic uncertainties (e.g. normalization, shape)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
Mathematical grammar for a simultaneous fit with:
- .blue[multiple "channels"] (analysis regions, (stacks of) histograms) that can have multiple bins
- with systematic uncertainties that modify the event rate $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$
- coupled to a set of .red[constraint terms]
.center.width-40[]
.center[Example: .bold[Each bin] is separate (1-bin) channel, each .bold[histogram] (color)
is a sample and they share a .bold[normalization systematic] uncertainty]
.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: auxiliary data, $\textcolor{#0495fc}{\vec{\eta}}$: unconstrained parameters, $\textcolor{#9c2cfc}{\vec{\chi}}$: constrained parameters]
.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined]
.center[.bold[Until 2018] the only implementation of HistFactory was in ROOT
]
.kol-1-2.large[
- First non-ROOT implementation of the HistFactory p.d.f. template
- pure-Python library as second implementation of HistFactory
$ python -m pip install pyhf
- No dependence on ROOT!
.center.width-100[] ] .kol-1-2.large[
- Open source tool for all of HEP
- IRIS-HEP supported Scikit-HEP project
- Used in ATLAS SUSY, Exotics, and Top groups in 25 published analyses (inference and published models)
- Used by Belle II (DOI: 10.1103/PhysRevLett.127.181802) and MicroBooNE (upcoming results)
- Used in analyses and for reinterpretation by the phenomenology community: SModelS (DOI: 10.1016/j.cpc.2021.107909) and MadAnalysis 5 (arXiv:2206.14870)
- Maybe your experiment too! ]
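A minimal sketch of the pure-Python workflow with pyhf's simplemodels API (the yields here are made up for illustration):

```python
import pyhf

# Two-bin counting experiment: signal + background with uncorrelated
# background uncertainties (illustrative numbers only)
model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[5.0, 12.0]
)
observations = [53.0, 65.0] + model.config.auxdata

# Maximum likelihood fit and observed CLs for a signal strength of mu = 1
bestfit_pars = pyhf.infer.mle.fit(observations, model)
cls_obs = pyhf.infer.hypotest(1.0, observations, model, test_stat="qtilde")
print(bestfit_pars, cls_obs)
```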
.grid[ .kol-2-3[
- All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
- Using deep learning frameworks as computational backends allows for .bold[exploitation of automatic differentiation (autodiff) and GPU acceleration]
- With huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (physicists are not)
.kol-1-2.center[ .width-80[] ] .kol-1-2[
- Hardware acceleration giving .bold[order of magnitude speedup] in interpolation for systematics!
- but does suffer some overhead
- Noticeable impact for large and complex models
.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]
With tensor library backends we gain access to exact (higher order) derivatives — accuracy is only limited by floating point precision
.grid[ .kol-1-2[ .large[Exploit .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speedup fit!]
.large[Gain this through the frameworks creating computational directed acyclic graphs and then applying the chain rule (to the operations)]
]
.kol-1-2[
.center.width-80[]
]
]
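A minimal sketch of getting the exact gradient of a pyhf likelihood, assuming the JAX backend (illustrative single-bin model, not from any analysis):

```python
import jax
import pyhf

# Switch pyhf's tensor backend to JAX so the likelihood is differentiable
pyhf.set_backend("jax")

model = pyhf.simplemodels.uncorrelated_background(
    signal=[5.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
data = pyhf.tensorlib.astensor([53.0] + model.config.auxdata)


def twice_nll(pars):
    # -2 ln L(pars | data); model.logpdf returns a length-1 tensor
    return -2 * model.logpdf(pars, data)[0]


init = pyhf.tensorlib.astensor(model.config.suggested_init())
print(jax.value_and_grad(twice_nll)(init))
```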
.footnote[Example adapted from Lukas Heinrich's PyHEP 2020 tutorial]
.kol-1-2.center[
] .kol-1-2.center[ ].bold.center[Having access to the gradients can make the fit orders of magnitude faster than finite difference]
.footnote[Example adapted from Lukas Heinrich's PyHEP 2020 tutorial]
.kol-1-2.center[
] .kol-1-2.center[ ].bold.center[Having access to the gradients can make the fit orders of magnitude faster than finite difference]
class: focus-slide, center
.huge.bold.center[Familiar (toy) example: Optimizing selection "cut" for an analysis]
- Counting experiment for presence of signal process
- Place discriminating selection cut on observable $x$ to maximize significance
- Significance: $\sqrt{2 (S+B) \log(1 + \frac{S}{B}) - 2S}$ (for small $S/B$: significance $\to S/\sqrt{B}$)
.footnote[Example inspired by Alexander Held's example of a differentiable analysis]
.kol-1-2.center[
] .kol-1-2.center[ ]
- Set baseline cut at $x=0$ (accept everything)
- Step along cut values in $x$ and calculate significance at each cut. Keep maximum.
.kol-1-2.center[ .width-100[] ] .kol-1-2[ .width-100[] ]
.center[Significance:
.kol-1-2.large[
- Need differentiable analogue to non-differentiable cut
- Weight events using a sigmoid activation function (see the sketch below)
.center[$w=\left(1 + e^{-\alpha(x-c)}\right)^{-1}$]
- Event far .italic[below] cut: $w \to 0$
- Event far .italic[above] cut: $w \to 1$
- $\alpha$ tunable parameter for steepness
  - Larger $\alpha$ more cut-like ] .kol-1-2[
.width-100[![sigmoid_event_weights](figures/sigmoid_event_weights.png)] ]
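A minimal sketch of the sigmoid weighting (not Alexander Held's original implementation; the toy distributions and $\alpha$ are made up):

```python
import jax
import jax.numpy as jnp


def asimov_significance(s, b):
    # Z = sqrt(2 [(S + B) ln(1 + S/B) - S]) from the slide above
    return jnp.sqrt(2 * ((s + b) * jnp.log(1 + s / b) - s))


def soft_cut_significance(cut, x_sig, x_bkg, alpha=10.0):
    # Sigmoid weights w = 1 / (1 + exp(-alpha (x - c))) as a differentiable
    # stand-in for the hard selection x > cut
    w_sig = 1.0 / (1.0 + jnp.exp(-alpha * (x_sig - cut)))
    w_bkg = 1.0 / (1.0 + jnp.exp(-alpha * (x_bkg - cut)))
    return asimov_significance(w_sig.sum(), w_bkg.sum())


# Toy observable: signal peaks above background (illustrative only)
key = jax.random.PRNGKey(0)
x_sig = 1.0 + jax.random.normal(jax.random.fold_in(key, 1), (1_000,))
x_bkg = -1.0 + jax.random.normal(jax.random.fold_in(key, 2), (10_000,))

# Significance and its gradient with respect to the cut position
value, grad = jax.value_and_grad(soft_cut_significance)(0.0, x_sig, x_bkg)
print(value, grad)
```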
.kol-1-2.large[
- For hard cuts the significance was calculated by applying the cut and then using the remaining $S$ and $B$ events
- But for the differentiable model there aren't cuts, so approximate cuts with the sigmoid approach and weights
- Comparing the two methods shows good agreement
- Can see that the approximation to the hard cuts improves with larger $\alpha$
  - But can become unstable, so tunable ] .kol-1-2.center[
.width-100[![significance_scan_compare](figures/significance_scan_compare.png)] ]
.kol-1-2.large[
- For hard cuts the significance was calculated by applying the cut and then using the remaining $S$ and $B$ events
- But for the differentiable model there aren't cuts, so approximate cuts with the sigmoid approach and weights
- Comparing the two methods shows good agreement
- Can see that the approximation to the hard cuts improves with larger $\alpha$
  - But can become unstable, so tunable ] .kol-1-2.center[
.width-100[![significance_scan_compare_high_alpha](figures/significance_scan_compare_high_alpha.png)] ]
.kol-2-5.large[
- Most importantly though, with the differentiable model we have access to the gradient $\partial_{x} f(x)$
- So can find the maximum significance at the point where the gradient of the significance is zero: $\partial_{x} f(x) = 0$
- With the gradient in hand this cries out for automated optimization! ] .kol-3-5.center[
.kol-2-5.large[
- With a simple gradient descent algorithm we can easily automate the significance optimization (see the sketch below)
- For this toy example, obviously less efficient than cut and count scan
- Gradient methods apply well in higher dimensional problems
- Allows for the "cut" to become a parameter that can be differentiated through for the larger analysis ] .kol-3-5.center[ .width-100[]
]
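A sketch of that automated optimization as plain gradient ascent, continuing the toy example above (reuses soft_cut_significance, x_sig, and x_bkg from the earlier sketch; the step size and number of steps are arbitrary):

```python
import jax

# Gradient ascent on the cut position: step uphill in significance
grad_fn = jax.grad(soft_cut_significance)

cut = 0.0  # baseline cut at x = 0 (accept everything)
learning_rate = 0.1
for _ in range(200):
    cut = cut + learning_rate * grad_fn(cut, x_sig, x_bkg)

print(cut, soft_cut_significance(cut, x_sig, x_bkg))
```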
.kol-1-2[
- Provide differentiable analogue to histograms with kernel density estimation (KDE) or softmax
- Need smooth change compared to abrupt changes in binned yields
- Samples fed into NN that produces observable (NN output), which is KDE transformed and histogrammed (see the sketch below)
- Construct pyhf model with observable and perform inference to get $\mathrm{CL}_{s}$ for POI
- Backpropagate the $\mathrm{CL}_{s}$ to update weights for NN
.center.width-40[[![neos_logo](https://raw.githubusercontent.com/gradhep/neos/master/nbs/assets/neos_logo.png)](https://github.com/gradhep/neos)] .footnote[Graphics from [Nathan Simpson's PyHEP 2020 talk](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] ] .kol-1-2.center[ .width-40[[![neoflow](figures/kde_bins.gif)](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] .width-100[[![neoflow](figures/neoflow.png)](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] ]
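A minimal sketch of a binned KDE ("soft" histogram) in the spirit of the neos approach (not the neos implementation itself; bandwidth and bin edges are illustrative):

```python
import jax.numpy as jnp
from jax.scipy.stats import norm


def kde_histogram(observable, bin_edges, bandwidth=0.1):
    # Each event contributes the probability mass of a Gaussian kernel falling
    # in each bin, giving smooth (differentiable) yields instead of hard counts
    cdf = norm.cdf(bin_edges[None, :], loc=observable[:, None], scale=bandwidth)
    return (cdf[:, 1:] - cdf[:, :-1]).sum(axis=0)


# Example: NN output in [0, 1] filled into 3 bins
edges = jnp.array([0.0, 0.33, 0.67, 1.0])
nn_output = jnp.array([0.1, 0.2, 0.5, 0.55, 0.9, 0.95])
print(kde_histogram(nn_output, edges))
```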
.center[neos: 3 bin KDE transformed observable (NN output) optimized with systematics w.r.t. $\mathrm{CL}_{s}$]
.kol-1-3[
- .neos-orange[Background] and .neos-blue[signal] samples
- Same colors for dist. / hist.
- 3 decision regions are mappings of NN output
  - $[0.67, 1.0]$ bin $\to$ top left region ] .kol-1-3[
- From KDE of NN output form pyhf model with 1 channel with 2 samples and 3 bins
- $\mathrm{CL}_{s}$ value minimized as goal of NN ] .kol-1-3[
- Observations in NN output
  - $0$: Background-like
  - $1$: Signal-like
- Binned contents channel input for pyhf model ]
class: focus-slide, center
.huge.bold.center[Differentiable analyses at LHC scale]
At the 2023 MIAPbP Workshop on Differentiable and Probabilistic Programming for physics, engagement with the broader community showed multiple large scale workflows
.center[.bold[If] things are differentiable, we shouldn't be scared of .bold[large-scale codebases and applications]]
.kol-1-2[
] .kol-1-2[ .center[[Nicolas Gauger, MIAPbP Workshop 2023](https://indico.ph.tum.de/event/7314/contributions/7432/)] ].kol-2-5.code-large[
- Real world high energy physics analyses have various challenges:
- Computations are highly complex chains
- Not implementable in a single framework
- Asynchronous multi-step procedures
- Strong need for distributed computing
- Passing of gradients .bold[between] different implementations and services
- Large scale machine learning in industry needs to do this to train models
- Possible solution to allow for distributed computations at scale exploiting gradients ] .kol-3-5.center[
.width-100[[![metadiff](figures/metadiff.png)](https://indico.cern.ch/event/960587/contributions/4070325/)] .caption[[Differentiating through PyTorch, JAX, and TensorFlow using FaaS](https://indico.cern.ch/event/960587/contributions/4070325/), Lukas Heinrich] ]
.center[Revisiting IRIS-HEP Analysis Systems in the context of distributed scaling and analysis reuse]
.large[
- Data and analyses done at the LHC are unique physics opportunities
- RECAST has been implemented in ATLAS as an enabling technology
- Resulting in ATLAS PUB notes extending the physics reach of original publications ]
.kol-1-3[
.caption[[ATL-PHYS-PUB-2019-032](https://inspirehep.net/literature/1795215)] ] .kol-1-3[ .caption[[ATL-PHYS-PUB-2020-007](https://inspirehep.net/literature/1795203)] ] .kol-1-3[ .caption[[ATL-PHYS-PUB-2021-020](https://inspirehep.net/literature/1870397)] ].kol-1-2[ .huge[ Leveraging the REANA reproducible research data analysis platform it is possible to run distributed ML and analysis workflows at scale ]
.caption[[ Christian Weber, Reinterpretation Forum 2023](https://conference.ippp.dur.ac.uk/event/1178/contributions/6449/)] ] .kol-1-2[ .caption[[ATL-PHYS-PUB-2023-010](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-PHYS-PUB-2023-010/)] ].huge[
- General techniques and technologies applied to HEP problems, but not constrained to them
- Automatic differentiation is a rich field of research unto itself
- Engagement with the broader scientific open source community
- Planning for analysis reuse brings flexibility to leverage tooling ]
.huge[
- Many challenges and opportunities ahead at the HL-LHC
- Engaging the broader scientific open source community has been a boon for particle physics tooling
- Automatic differentiation gives a powerful tool in the form of differentiable programming
- Scalable and reusable analysis workflows allow leveraging our tools ]
class: end-slide, center
.large[Backup]
.center.large[Challenge to be able to .bold[record, store, and analyze] the data]
.kol-1-2[
] .kol-1-2[]
.center.large[Projected .bold[required disk usage] for HL-LHC (want R&D below budget line)]
.center[ATLAS and CMS software and computing reviews]
.center[Performing maps $f: \mathbb{R}^{n} \to \mathbb{R}^{m}$]
.center[aka, "wide" vs. "tall" transformations]
.kol-1-2[
- .bold[Forward] mode
- Column wise evaluation of Jacobian
- Jacobian-vector products
- Execution time scales with input parameters
- Example: few variables into very high dimensional spaces $\mathbb{R} \to \mathbb{R}^{100}$ ] .kol-1-2[
- .bold[Reverse] mode
- Row wise evaluation of Jacobian
- vector-Jacobian products
- Execution time scales with output parameters
- Example: scalar maps from very high-dimensional spaces $\mathbb{R}^{100} \to \mathbb{R}$ ]
.center[Allows for efficient computation depending on dimensionality]
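A sketch of the two modes with JAX: jacfwd builds the Jacobian column by column (suited to "tall" maps), jacrev row by row (suited to "wide", scalar-output maps):

```python
import jax
import jax.numpy as jnp


# "Tall" map R -> R^100: forward mode scales with the single input dimension
def tall(x):
    return jnp.arange(1.0, 101.0) * jnp.sin(x)


# "Wide" map R^100 -> R: reverse mode scales with the single output dimension
def wide(v):
    return jnp.sum(v**2)


J_tall = jax.jacfwd(tall)(0.5)               # Jacobian of shape (100,)
grad_wide = jax.jacrev(wide)(jnp.ones(100))  # gradient of shape (100,)
print(J_tall.shape, grad_wide.shape)
```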
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.kol-1-2[ .bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
  - encode systematic uncertainties (e.g. normalization, shape)
- $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[] .center[Example: .bold[Each bin] is separate (1-bin) channel,
each .bold[histogram] (color) is a sample and they share
a .bold[normalization systematic] uncertainty] ]
Mathematical grammar for a simultaneous fit with
- .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
- each region can have .blue[multiple bins]
- coupled to a set of .red[constraint terms]
.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined]
.center[.bold[Until recently] (2018), the only implementation of HistFactory was in ROOT
]
.bold[pyhf: HistFactory in pure Python]
.center.width-40[]
.kol-4-7[
- In HEP common for systematic uncertainties to be specified with two template histograms: "up" and "down" variation for parameter $\theta \in \{\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\}$
  - "up" variation: model prediction for $\theta = +1$
  - "down" variation: model prediction for $\theta = -1$
  - Interpolation and extrapolation choices provide .bold[model predictions $\nu(\vec{\theta})$ for any $\vec{\theta}$]
- Constraint terms $c_{j} \left(\textcolor{#a3130f}{a_{j}}\middle|\textcolor{#9c2cfc}{\theta_{j}}\right)$ used to model auxiliary measurements. Example for Normal (most common case):
  - Mean of nuisance parameter $\textcolor{#9c2cfc}{\theta_{j}}$ with normalized width ($\sigma=1$)
  - Normal: auxiliary data $\textcolor{#a3130f}{a_{j} = 0}$ (aux data function of modifier type)
  - Constraint term produces penalty in likelihood for pulling $\textcolor{#9c2cfc}{\theta_{j}}$ away from auxiliary measurement value
- As $\nu(\vec{\theta})$, constraint terms inform rate modifiers (.bold[systematic uncertainties]) during simultaneous fit
  - Example: Correlated shape histosys modifier could represent part of the uncertainty associated with a jet energy scale (see the sketch below) ] .kol-3-7[ .center.width-70[] .center[Image credit: Alex Held] ]
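A minimal sketch of how such an "up"/"down" template pair is declared as a histosys modifier in a pyhf model specification (the yields are made up):

```python
import pyhf

spec = {
    "channels": [
        {
            "name": "signal_region",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {
                            # Correlated shape uncertainty (e.g. part of a jet
                            # energy scale): "up"/"down" template histograms
                            "name": "jes",
                            "type": "histosys",
                            "data": {
                                "hi_data": [55.0, 66.0],
                                "lo_data": [45.0, 54.0],
                            },
                        }
                    ],
                },
            ],
        }
    ]
}
model = pyhf.Model(spec)
print(model.config.parameters)
```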
- Mean of nuisance parameter
Please check out the many resources we have, starting with the website and the SciPy 2020 talk!
.grid[ .kol-1-3.center[ .width-60[[![scikit-hep_logo](https://scikit-hep.org/assets/images/logo.png)](https://scikit-hep.org/)] ] .kol-1-3.center[.width-60[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ] .kol-1-3.center[
.width-70[[![iris-hep_logo](assets/logos/logo_IRIS-HEP.png)](https://iris-hep.org/)] ] ]
gradhep ] .kol-1-3.center[ .width-100[]
neos, INFERNO
]
.kol-1-3.center[
.width-100[]
ACTS
]
.kol-1-1[ .bold.center[Groups, libraries, and applications growing rapidly] ]
- Lukas Heinrich, .italic[Distributed Gradients for Differentiable Analysis], Future Analysis Systems and Facilities Workshop, 2020.
- Jim Pivarski, .italic[History and Adoption of Programming Languages in NHEP], Software & Computing Round Table, 2022.
class: end-slide, center
count: false
The end.