class: middle, center, title-slide
count: false

Towards Differentiable Physics Analysis

at Scale at the LHC and Beyond

.huge.blue[Matthew Feickert]
.huge[(University of Wisconsin-Madison)]

[email protected]

SNOLAB Seminar Series

September 18th, 2023


Introduction

.kol-2-3[ .huge[

  • Privileged opportunity to work among multiple scientific communities
  • Care about .bold[reusable] open science to be able to push particle physics forward at the .bold[community scale]
    • The challenges of the next decade provide wonderful research environments that will require interdisciplinary knowledge exchange to fully address
  • Today we'll share .bold[high level] views of deeply .bold[technical problems] ] ] .kol-1-3[ .center.width-65[logo_IRIS-HEP]

.center.width-40[logo_ATLAS]

.center.width-40[logo_IRIS-HEP]

.center.width-30[logo_Scikit-HEP]

.center.width-30[logo_joss] ]


High Energy Physics at the LHC

.kol-1-2.center[

.caption[LHC] ] .kol-1-2.center[

.caption[ATLAS] ] .kol-1-1[ .kol-1-2.center[

] .kol-1-2.center[ .kol-1-2.center[

] .kol-1-2.center[

] ] ]

Opportunities and Challenges of the HL-LHC

.large[ * Increase in integrated luminosity of roughly an order of magnitude: $3$-$4$ $\mathrm{ab}^{-1}$ (a factor of 20-25 over the delivered Run-2 luminosity) * Boon for measurements constrained by statistical uncertainties and for searches for rare processes ]

Opportunities and Challenges of the HL-LHC

.center.large[Challenge to be able to .bold[record, store, and analyze] the data]

.kol-1-2[

] .kol-1-2[

]

.center.large[Projected .bold[required compute usage] for HL-LHC (want R&D below budget line)]

.center[ATLAS and CMS software and computing reviews]


IRIS-HEP

.kol-1-2[

.huge[ * LHC experiments as stakeholders * LHC operations as partners ] ] .kol-1-2[

.caption[Institute for Research and Innovation in Software for High Energy Physics (IRIS-HEP)] ]

IRIS-HEP

.kol-1-1[ .kol-1-2[ .huge[ Designed around focus areas ] .large[

  • Intellectual Hub
  • Analysis Systems
  • Data Organization, Management, and Access (DOMA)
  • Innovative Algorithms
  • Translational Research for AI
  • Scalable Systems Laboratory (SSL)
  • OSG Services for LHC (OSG-LHC) ] ] .kol-1-2[

.caption[IRIS-HEP Institute Structure] ] ]

.large[ Community engagement with .bold[training, education, and outreach] and .bold[institute grand challenges] ]


IRIS-HEP Analysis Systems

.huge[

  • Deployable analysis pipelines that reduce physicist time-to-insight
    • Tools integrate into the broader scientific Python computing ecosystem
  • Analysis reuse as deployment feature ]

IRIS-HEP Analysis Systems



Ecosystems

.center.large[ In his PyCon 2017 keynote, Jake VanderPlas gave us the iconic "PyData ecosystem" image ]


PyHEP ecosystem

.center.large[ In his 2022 PyHEP topical meeting update, Jim Pivarski gave us a view for the PyHEP ecosystem ]


Rapid rise of Python for analysis in HEP

.center.large["import XYZ" matches in GitHub repos for users who fork [CMSSW](https://github.com/cms-sw/cmssw) by file]

.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]


Explosion of Scientific Python (NumPy, etc.)

.center.large["import XYZ" matches in GitHub repos for users who fork [CMSSW](https://github.com/cms-sw/cmssw) by library/tool]

.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]


Community adoption ...

.center.large["pip install XYZ" download rate for macOS/Windows (no batch jobs) in aggregate]

.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]


Community adoption with ecosystem growth

.center.large["pip install XYZ" download rate for macOS/Windows (no batch jobs) by package] .caption[Aided by interoperable design]

.footnote[Modern Python analysis ecosystem for High Energy Physics, Jim Pivarski, Matthew Feickert, Gordon Watts]


Broader scientific open source collaborations


.kol-1-1[ .kol-1-3[

] .kol-1-3[

] .kol-1-3[

] ] .kol-1-3[ .center.huge[[dask-awkward](https://github.com/dask-contrib/dask-awkward)]

.center[Native Dask collection for partitioned Awkward arrays for analysis at scale] ] .kol-1-3[ .center.huge[scikit-build-core]

.center[Next generation of build tools for scientific packaging] ] .kol-1-3[ .center.huge[NumFOCUS]

.center[Organizing and supporting scientific open source] ]


Automatic differentiation as tool for physics

.footnote[Taking a slide from Lukas Heinrich]

.kol-1-2[

] .kol-1-2.huge[

.bold[New directions in science are launched by new tools much more often than by new concepts.]
— Freeman Dyson ]

Gradients as Computational Tools

  • As we'll see later, having access to the gradient while performing minimization is highly beneficial!
  • Can imagine multiple ways of arriving at gradients for computational functions
    • But want them to be both .bold[exact] and .bold[flexible]

.center.width-25[carbon_f_x] .kol-6-8[ .bold.center[Symbolic] .center.width-100[carbon_fprime_symbolic] ] .kol-2-8.huge[

  • Exact: .blue[Yes]
  • Flexible: .red[No] ]

Gradients as Computational Tools

  • As we'll see later, having access to the gradient while performing minimization is highly beneficial!
  • Can imagine multiple ways of arriving at gradients for computational functions
    • But want them to be both .bold[exact] and .bold[flexible]

.center.width-25[carbon_f_x] .kol-6-8[ .bold.center[Numeric] .center.width-70[carbon_fprime_numeric] ] .kol-2-8.huge[

  • Exact: .red[No]
  • Flexible: .blue[Yes] ]

Gradients as Computational Tools

  • As we'll see later, having access to the gradient while performing minimization is highly beneficial!
  • Can imagine multiple ways of arriving at gradients for computational functions
    • But want them to be both .bold[exact] and .bold[flexible]

.center.width-25[carbon_f_x] .kol-6-8[ .bold.center[Automatic] .center.width-80[carbon_fprime_automatic] ] .kol-2-8.huge[

  • Exact: .blue[Yes]
  • Flexible: .blue[Yes] ]

Automatic Differentiation

.kol-3-5[

  • Automatic differentiation (autodiff) provides gradients of numerical functions to machine precision
  • Build computational graph of the calculation
  • Nodes represent operations, edges represent flow of gradients
  • Apply the chain rule to operations
    • Can traverse the graph in forward or reverse modes depending on the relative dimensions of input and output for efficient computation

$$ f(a,b) = a^{2} \sin(ab) $$ $$ \frac{df}{da} = \frac{\partial c}{\partial a} \frac{\partial f}{\partial c} + \frac{\partial d}{\partial a} \frac{\partial e}{\partial d} \frac{\partial f}{\partial e} $$

] .kol-2-5.center[ .width-100[autodiff_graph] ]


Differentiable Programming

.grid[ .kol-1-2.large[

  • Allows writing fully differentiable programs that are efficient and accurate
  • Resulting system can be optimized end-to-end using efficient gradient-based optimization algorithms
    • Exploit advances in deep learning
  • Enables .italic[efficient] computation of gradients and Jacobians
    • Large benefit to statistical inference
  • Replace non-differentiable operations with differentiable analogues
    • Binning, sorting, cuts ] .kol-1-2[

      .center.width-100[Snowmass_LOI] .center[Snowmass 2021 LOI] ] ]

class: focus-slide, center

Case study:
Automatic differentiation improving analyses

.huge.bold.center[Application of automatic differentiation in pyhf]


Goals of physics analysis at the LHC

.kol-1-1[ .kol-1-3.center[ .width-100[ATLAS_Higgs_discovery] Search for new physics ] .kol-1-3.center[
.width-100[CMS-PAS-HIG-19-004]


Make precision measurements ] .kol-1-3.center[ .width-110[[![SUSY-2018-31_limit](figures/SUSY-2018-31_limit.png)](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PAPERS/SUSY-2018-31/)]

Provide constraints on models through setting best limits ] ]

  • All require .bold[building statistical models] and .bold[fitting models] to data to perform statistical inference
  • Model complexity can be huge for complicated searches
  • Problem: Time to fit can be .bold[many hours]
  • .blue[Goal:] Empower analysts with fast fits and expressive models

HistFactory Model

  • A flexible probability density function (p.d.f.) template to build statistical models in high energy physics
  • Developed in 2011 during work that led to the Higgs discovery [CERN-OPEN-2012-016]
  • Widely used by ATLAS for .bold[measurements of known physics] (Standard Model) and .bold[searches for new physics] (beyond the Standard Model)

.kol-2-5.center[ .width-90[HIGG-2016-25] .bold[Standard Model] ] .kol-3-5.center[ .width-100[SUSY-2016-16] .bold[Beyond the Standard Model] ]


HistFactory Template: at a glance

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\textcolor{#00a620}{\vec{n}}, \textcolor{#a3130f}{\vec{a}}\middle|\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right) = \textcolor{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\textcolor{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: .auxdata[auxiliary data], $\textcolor{#0495fc}{\vec{\eta}}$: .freepars[unconstrained pars], $\textcolor{#9c2cfc}{\vec{\chi}}$: .conpars[constrained pars]]

$$ \nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})}_{\textrm{additive}}\Bigg) $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each and additional (possibly shared) systematics across the sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
    • encode systematic uncertainties (e.g. normalization, shape)
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
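A toy evaluation of the event-rate formula above (illustrative yields and names, not the pyhf API): one channel, two samples, three bins, with a multiplicative normalization $\mu$ on the signal and an additive (histosys-like) shift on the background scaled by a constrained parameter $\chi$.

```python
# Hypothetical nominal yields per bin for one channel
signal_nominal = [5.0, 10.0, 3.0]
background_nominal = [50.0, 60.0, 40.0]
background_delta_up = [2.0, -1.5, 3.0]   # "up" variation minus nominal

def expected_rates(mu, chi):
    """nu_cb = sum over samples of (multiplicative) * (nominal + additive)."""
    signal = [mu * nu0 for nu0 in signal_nominal]            # multiplicative
    background = [nu0 + chi * delta for nu0, delta in        # additive
                  zip(background_nominal, background_delta_up)]
    return [s + b for s, b in zip(signal, background)]

nominal = expected_rates(mu=1.0, chi=0.0)   # parameters at nominal values
shifted = expected_rates(mu=2.0, chi=1.0)   # doubled signal, +1 sigma shape
```

At $\mu = 1, \chi = 0$ the rates are just the summed nominal histograms; moving the parameters deforms the prediction, which is what the fit varies.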

HistFactory Template: at a second glance

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\textcolor{#00a620}{\vec{n}}, \textcolor{#a3130f}{\vec{a}}\middle|\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right) = \prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(\textcolor{#00a620}{n_{cb}} \middle| \nu_{cb}\left(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right)\right) \,\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(\textcolor{#a3130f}{a_{\chi}}\middle|\textcolor{#9c2cfc}{\chi}\right) $$

.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: .auxdata[auxiliary data], $\textcolor{#0495fc}{\vec{\eta}}$: .freepars[unconstrained pars], $\textcolor{#9c2cfc}{\vec{\chi}}$: .conpars[constrained pars]]

$$ \nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})}_{\textrm{additive}}\Bigg) $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each and additional (possibly shared) systematics across the sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
    • encode systematic uncertainties (e.g. normalization, shape)
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]

HistFactory Template: grammar

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\textcolor{#00a620}{\vec{n}}, \textcolor{#a3130f}{\vec{a}}\middle|\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right) = \textcolor{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\textcolor{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

Mathematical grammar for a simultaneous fit with:

  • .blue[multiple "channels"] (analysis regions, (stacks of) histograms) that can have multiple bins
  • with systematic uncertainties that modify the event rate $\nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})$
  • coupled to a set of .red[constraint terms]

.center.width-40[SUSY-2016-16_annotated] .center[Example: .bold[Each bin] is separate (1-bin) channel, each .bold[histogram] (color)
is a sample and share a .bold[normalization systematic] uncertainty]


HistFactory Template: implementation

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\textcolor{#00a620}{\vec{n}}, \textcolor{#a3130f}{\vec{a}}\middle|\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right) = \prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(\textcolor{#00a620}{n_{cb}} \middle| \nu_{cb}\left(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}\right)\right) \,\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(\textcolor{#a3130f}{a_{\chi}}\middle|\textcolor{#9c2cfc}{\chi}\right) $$

.center[$\textcolor{#00a620}{\vec{n}}$: .obsdata[events], $\textcolor{#a3130f}{\vec{a}}$: .auxdata[auxiliary data], $\textcolor{#0495fc}{\vec{\eta}}$: .freepars[unconstrained pars], $\textcolor{#9c2cfc}{\vec{\chi}}$: .conpars[constrained pars]]

$$ \nu_{cb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}})}_{\textrm{additive}}\Bigg) $$

.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined] .center[.bold[Until 2018] the only implementation of HistFactory was in ROOT]

.center.width-70[ROOT_HistFactory]


pyhf: HistFactory in pure Python

.kol-1-2.large[

  • First non-ROOT implementation of the HistFactory p.d.f. template
    • .width-40[DOI]
  • pure-Python library as second implementation of HistFactory

.center.width-100[pyhf_PyPI] ]


Machine Learning Frameworks for Computation

.grid[ .kol-2-3[

  • All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
  • Using deep learning frameworks as computational backends allows for .bold[exploitation of automatic differentiation (autodiff) and GPU acceleration]
  • With huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (which physicists are not)

.kol-1-2.center[ .width-80[scaling_hardware] ] .kol-1-2[

  • Hardware acceleration giving .bold[order of magnitude speedup] in interpolation for systematics!
    • does suffer some overhead
  • Noticeable impact for large and complex models
    • hours to minutes for fits ] ] .kol-1-4.center[ .width-85[NumPy] .width-85[PyTorch] .width-85[TensorFlow]

.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]

Automatic differentiation

With tensor library backends we gain access to exact (higher-order) derivatives, with accuracy limited only by floating point precision

$$ \frac{\partial L}{\partial \mu}, \frac{\partial L}{\partial \theta_{i}} $$

.grid[ .kol-1-2[ .large[Exploit the .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up fits!]



.large[Gain this through the frameworks creating computational directed acyclic graphs and then applying the chain rule (to the operations)] ] .kol-1-2[ .center.width-80[DAG] ] ]


HEP Example: Likelihood Gradients

.footnote[Example adapted from Lukas Heinrich's PyHEP 2020 tutorial]

.kol-1-2.center[

] .kol-1-2.center[

]

.bold.center[Having access to the gradients can make the fit orders of magnitude faster than finite difference]
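As a toy illustration (a one-bin counting experiment, not the pyhf implementation, with hypothetical yields): the exact gradient of the Poisson negative log-likelihood for expected rate $\nu(\mu) = \mu s + b$ drives a simple gradient-descent fit straight to the analytic MLE $\hat{\mu} = (n - b)/s$, with no finite-difference evaluations.

```python
import math

# One-bin counting experiment: observed n, expected rate nu(mu) = mu*s + b
s, b, n = 10.0, 50.0, 57.0

def nll(mu):
    nu = mu * s + b
    return nu - n * math.log(nu)   # Poisson NLL up to a mu-independent constant

def grad_nll(mu):
    nu = mu * s + b
    return s - n * s / nu          # exact gradient via the chain rule

# Gradient descent using the exact gradient
mu = 0.0
for _ in range(1000):
    mu -= 0.02 * grad_nll(mu)

mu_hat = (n - b) / s               # analytic MLE for comparison
```

One exact gradient evaluation costs about the same as one likelihood evaluation here, whereas finite differences need extra evaluations per parameter and introduce truncation error.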




class: focus-slide, center

Enable new techniques with autodiff

.huge.bold.center[Familiar (toy) example: Optimizing selection "cut" for an analysis]


Discriminate Signal and Background

  • Counting experiment for presence of a signal process
  • Place a discriminating selection cut on observable $x$ to maximize significance
    • Significance: $\sqrt{2\left((S+B) \log\left(1 + \frac{S}{B}\right)-S\right)}$ (for small $S/B$: significance $\to S/\sqrt{B}$)

.footnote[Example inspired by Alexander Held's example of a differentiable analysis]

.kol-1-2.center[

] .kol-1-2.center[

]

Traditionally: Scan across cut values

  • Set baseline cut at $x=0$ (accept everything)
  • Step along cut values in $x$ and calculate significance at each cut. Keep maximum.

.kol-1-2.center[ .width-100[signal_background_stacked] ] .kol-1-2[ .width-100[significance_cut_scan] ]

.center[Significance: $\sqrt{2\left((S+B) \log\left(1 + \frac{S}{B}\right)-S\right)}$]
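The scan can be sketched in a few lines (the yields-above-cut values here are hypothetical, standing in for the $S$ and $B$ surviving each cut):

```python
import math

def significance(s, b):
    # Asimov significance: sqrt(2*((S+B)*ln(1 + S/B) - S))
    return math.sqrt(2 * ((s + b) * math.log(1 + s / b) - s))

# Hypothetical signal/background yields surviving each candidate cut in x
cuts      = [0.0, 0.5, 1.0, 1.5, 2.0]
sig_above = [100.0, 95.0, 85.0, 60.0, 30.0]
bkg_above = [10000.0, 4000.0, 1500.0, 700.0, 400.0]

scan = [significance(s, b) for s, b in zip(sig_above, bkg_above)]
best_cut = cuts[scan.index(max(scan))]
```

With $S/B = 0.01$ at the baseline cut, the exact formula agrees with the $S/\sqrt{B}$ approximation to better than a percent, as the small-$S/B$ limit suggests.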


Differentiable Approach

.kol-1-2.large[

  • Need differentiable analogue to non-differentiable cut
  • Weight events using a sigmoid activation function

.center[$w=\left(1 + e^{-\alpha(x-c)}\right)^{-1}$]

  • Event far .italic[below] cut: $w \to 0$
  • Event far .italic[above] cut: $w \to 1$
  • $\alpha$ tunable parameter for steepness
    • Larger $\alpha$ more cut-like ] .kol-1-2[

.width-100[![sigmoid_event_weights](figures/sigmoid_event_weights.png)] ]
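A minimal sketch of the event weighting (hypothetical observable values): summing the sigmoid weights approximates the hard-cut yield, and the approximation tightens as $\alpha$ grows.

```python
import math

def sigmoid_weight(x, cut, alpha):
    # Smooth stand-in for the hard cut x > cut; alpha sets the steepness
    return 1.0 / (1.0 + math.exp(-alpha * (x - cut)))

cut = 1.0
events = [0.2, 0.8, 0.95, 1.05, 1.3, 2.5]   # hypothetical observable values

hard_yield = sum(1.0 for x in events if x > cut)
soft_yield_steep = sum(sigmoid_weight(x, cut, alpha=50.0) for x in events)
soft_yield_loose = sum(sigmoid_weight(x, cut, alpha=2.0) for x in events)
```

The steep ($\alpha = 50$) weighted yield reproduces the hard count almost exactly, while the loose ($\alpha = 2$) one visibly smears across the cut, mirroring the tunability trade-off in the bullets.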

Compare Hard Cuts vs. Differentiable

.kol-1-2.large[

  • For hard cuts the significance was calculated by applying the cut and then using the remaining $S$ and $B$ events
  • But for the differentiable model there aren't cuts, so approximate cuts with the sigmoid approach and weights
  • Comparing the two methods shows good agreement
  • Can see that the approximation to the hard cuts improves with larger $\alpha$
    • But can become unstable, so tunable ] .kol-1-2.center[

.width-100[![significance_scan_compare](figures/significance_scan_compare.png)] ]

Compare Hard Cuts vs. Differentiable

.kol-1-2.large[

  • For hard cuts the significance was calculated by applying the cut and then using the remaining $S$ and $B$ events
  • But for the differentiable model there aren't cuts, so approximate cuts with the sigmoid approach and weights
  • Comparing the two methods shows good agreement
  • Can see that the approximation to the hard cuts improves with larger $\alpha$
    • But can become unstable, so tunable ] .kol-1-2.center[

.width-100[![significance_scan_compare_high_alpha](figures/significance_scan_compare_high_alpha.png)] ]

Accessing the Gradient

.kol-2-5.large[

  • Most importantly though, with the differentiable model we have access to the gradient
    • $\partial_{x} f(x)$
  • So can find the maximum significance at the point where the gradient of the significance is zero
    • $\partial_{x} f(x) = 0$
  • With the gradient in hand this cries out for automated optimization! ] .kol-3-5.center[

]

Automated Optimization

.kol-2-5.large[

  • With a simple gradient descent algorithm can easily automate the significance optimization
  • For this toy example, obviously less efficient than the cut-and-count scan
  • Gradient methods apply well in higher dimensional problems
  • Allows for the "cut" to become a parameter that can be differentiated through for the larger analysis ] .kol-3-5.center[ .width-100[automated_optimization]

]
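A minimal sketch of the automated optimization (toy forward-mode `Dual` class, hypothetical event values, not the actual analysis code): gradient ascent on the cut position $c$, where the significance of the sigmoid-weighted yields is differentiated exactly.

```python
import math

class Dual:
    """Forward-mode autodiff value: (primal, derivative w.r.t. the cut c)."""
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def _coerce(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, o):
        o = self._coerce(o)
        return Dual(self.val + o.val, self.grad + o.grad)
    __radd__ = __add__
    def __sub__(self, o):
        o = self._coerce(o)
        return Dual(self.val - o.val, self.grad - o.grad)
    def __rsub__(self, o):
        return Dual(o) - self
    def __mul__(self, o):
        o = self._coerce(o)
        return Dual(self.val * o.val, self.grad * o.val + self.val * o.grad)
    __rmul__ = __mul__
    def __truediv__(self, o):
        o = self._coerce(o)
        return Dual(self.val / o.val,
                    (self.grad * o.val - self.val * o.grad) / o.val**2)
    def __rtruediv__(self, o):
        return Dual(o) / self

def exp(x):
    return Dual(math.exp(x.val), math.exp(x.val) * x.grad)

def log(x):
    return Dual(math.log(x.val), x.grad / x.val)

def sqrt(x):
    return Dual(math.sqrt(x.val), 0.5 * x.grad / math.sqrt(x.val))

ALPHA = 4.0
signal_x = [1.1, 1.3, 1.5, 1.7, 1.9]        # hypothetical signal events
background_x = [0.2, 0.5, 0.8, 1.1, 1.4]    # hypothetical background events

def weight(x, c):
    # Sigmoid stand-in for the cut x > c
    return 1.0 / (1.0 + exp((c - x) * ALPHA))

def significance(c):
    s = Dual(0.0)
    for x in signal_x:
        s = s + weight(x, c)
    b = Dual(0.0)
    for x in background_x:
        b = b + weight(x, c)
    # Z = sqrt(2*((S+B)*ln(1 + S/B) - S))
    return sqrt(2.0 * ((s + b) * log(1.0 + s / b) - s))

c = Dual(0.0, 1.0)                # seed dc/dc = 1
z_start = significance(c).val
for _ in range(500):
    z = significance(c)
    c = Dual(c.val + 0.02 * z.grad, 1.0)   # gradient ascent on c
z_end = significance(c).val
```

The loop climbs until the gradient of the significance vanishes, which is exactly the stationarity condition the previous slide identifies; in higher dimensions the same loop works unchanged, which is where gradient methods beat scanning.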


New Art: Analysis as a Differentiable Program

.kol-1-2[

  • Provide differentiable analogue to histograms with kernel density estimation (KDE) or softmax
    • Need smooth change compared to abrupt changes in binned yields
  • Samples are fed into a NN that produces an observable (the NN output), which is KDE-transformed and histogrammed.
  • Construct pyhf model with observable and perform inference to get $\mathrm{CL}_{s}$ for POI.
  • Backpropagate the $\mathrm{CL}_{s}$ to update weights for NN.

.center.width-40[[![neos_logo](https://raw.githubusercontent.com/gradhep/neos/master/nbs/assets/neos_logo.png)](https://github.com/gradhep/neos)] .footnote[Graphics from [Nathan Simpson's PyHEP 2020 talk](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] ] .kol-1-2.center[ .width-40[[![neoflow](figures/kde_bins.gif)](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] .width-100[[![neoflow](figures/neoflow.png)](https://indico.cern.ch/event/882824/timetable/#46-neos-physics-analysis-as-a)] ]
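One way to sketch a differentiable histogram (a Gaussian-kernel construction in this spirit, not necessarily the exact neos implementation): give each event a Gaussian kernel and assign each bin the kernel mass falling inside it, so bin yields vary smoothly with the event values instead of jumping at bin edges.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def soft_histogram(values, edges, bandwidth):
    """Each event contributes its Gaussian-kernel mass inside each bin."""
    counts = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        counts.append(sum(norm_cdf((hi - x) / bandwidth)
                          - norm_cdf((lo - x) / bandwidth) for x in values))
    return counts

values = [0.1, 0.2, 0.45, 0.55, 0.8, 0.9]   # hypothetical NN output scores
edges = [0.0, 1/3, 2/3, 1.0]                # 3 bins, as in the neos example

hard = [sum(1 for x in values if lo <= x < hi)
        for lo, hi in zip(edges[:-1], edges[1:])]
soft = soft_histogram(values, edges, bandwidth=0.02)
```

With a narrow bandwidth the soft counts match the hard histogram, yet every bin yield is a smooth function of the scores, so gradients can flow from $\mathrm{CL}_{s}$ back through the binning to the NN weights.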

New Art: Analysis as a Differentiable Program

.center[neos 3 bin KDE transformed observable (NN output) optimized with systematics w.r.t. $\mathrm{CL}_{s}$] .center.width-100[neos_gif]

.kol-1-3[

  • .neos-orange[Background] and .neos-blue[signal] samples
    • Same colors for dist. / hist.
  • 3 decision regions are mappings of NN output
    • $[0.67, 1.0]$ bin $\to$ top left region ] .kol-1-3[
  • From KDE of NN output form pyhf model with 1 channel with 2 samples and 3 bins
  • $\mathrm{CL}_{s}$ value minimized as goal of NN ] .kol-1-3[
  • Observations in NN output
    • $0$: Background-like
    • $1$: Signal-like
  • Binned contents channel input for pyhf model ]

class: focus-slide, center

Scalable solutions

.huge.bold.center[Differentiable analyses at LHC scale]


Scaling is reasonable

At the 2023 MIAPbP Workshop on Differentiable and Probabilistic Programming for physics, engagement with the broader community showed multiple large-scale workflows

.center[.bold[If] things are differentiable, shouldn't be scared of .bold[large-scale codebases and applications]]

.kol-1-2[

] .kol-1-2[

.center[[Nicolas Gauger, MIAPbP Workshop 2023](https://indico.ph.tum.de/event/7314/contributions/7432/)] ]

Gradient Passing

.kol-2-5.code-large[

  • Real world high energy physics analyses have various challenges:
    • Computations are highly complex chains
    • Not implementable in a single framework
    • Asynchronous multi-step procedures
    • Strong need for distributed computing
  • Passing of gradients .bold[between] different implementations and services
    • Large scale machine learning in industry needs to do this to train models
  • Possible solution to allow for distributed computations at scale exploiting gradients ] .kol-3-5.center[

.width-100[[![metadiff](figures/metadiff.png)](https://indico.cern.ch/event/960587/contributions/4070325/)] .caption[[Differentiating through PyTorch, JAX, and TensorFlow using FaaS](https://indico.cern.ch/event/960587/contributions/4070325/), Lukas Heinrich] ]

Scaling and Analysis Reuse

.center[Revisiting IRIS-HEP Analysis Systems in the context of distributed scaling and analysis reuse]


Analysis Reuse

.large[

  • Data and analyses done at the LHC are unique physics opportunities
  • RECAST has been implemented in ATLAS as an enabling technology
  • Resulting in ATLAS PUB notes extending the physics reach of original publications ]

.kol-1-3[

.caption[[ATL-PHYS-PUB-2019-032](https://inspirehep.net/literature/1795215)] ] .kol-1-3[

.caption[[ATL-PHYS-PUB-2020-007](https://inspirehep.net/literature/1795203)] ] .kol-1-3[

.caption[[ATL-PHYS-PUB-2021-020](https://inspirehep.net/literature/1870397)] ]

ML + reinterpretation: Active learning

.kol-1-2[ .huge[ Leveraging the REANA reproducible research data analysis platform, it is possible to run distributed ML and analysis workflows at scale ]

.caption[[ Christian Weber, Reinterpretation Forum 2023](https://conference.ippp.dur.ac.uk/event/1178/contributions/6449/)] ] .kol-1-2[

.caption[[ATL-PHYS-PUB-2023-010](https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/PUBNOTES/ATL-PHYS-PUB-2023-010/)] ]

Applications beyond HEP

.huge[

  • General techniques and technologies applied to HEP problems, but not constrained to them
    • Automatic differentiation is a rich field of research unto itself
  • Engagement with the broader scientific open source community
  • Planning for analysis reuse brings flexibility to leverage tooling ]

Summary

.huge[

  • Many challenges and opportunities ahead at the HL-LHC
  • Engaging the broader scientific open source community has been a boon for particle physics tooling
  • Automatic differentiation gives a powerful tool in the form of differentiable programming
  • Scalable and reusable analysis workflows allow leveraging our tools ]


class: end-slide, center

.large[Backup]


Opportunities and Challenges of the HL-LHC

.center.large[Challenge to be able to .bold[record, store, and analyze] the data]

.kol-1-2[

] .kol-1-2[

]

.center.large[Projected .bold[required disk usage] for HL-LHC (want R&D below budget line)]

.center[ATLAS and CMS software and computing reviews]


Automatic Differentiation: Forward and Reverse

.center[Performing maps $f: \mathbb{R}^{m} \to \mathbb{R}^{n}$]
.center[aka, "wide" vs. "tall" transformations]
.kol-1-2[

  • .bold[Forward] mode
  • Column wise evaluation of Jacobian
    • Jacobian-vector products
    • Execution time scales with input parameters
    • Example: few variables into very high dimensional spaces $\mathbb{R} \to \mathbb{R}^{100}$ ] .kol-1-2[
  • .bold[Reverse] mode
  • Row wise evaluation of Jacobian
    • vector-Jacobian products
    • Execution time scales with output parameters
    • Example: scalar maps from very high-dimensional spaces $\mathbb{R}^{100} \to \mathbb{R}$ ]

.center[Allows for efficient computation depending on dimensionality]

HistFactory Template

$$ f\left(\mathrm{data}\middle|\mathrm{parameters}\right) = f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each and additional (possibly shared) systematics across the sample estimates

.kol-1-2[ .bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encode systematic uncertainties (e.g. normalization, shape)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[SUSY-2016-16_annotated] .center[Example: .bold[Each bin] is separate (1-bin) channel,
    each .bold[histogram] (color) is a sample and share
    a .bold[normalization systematic] uncertainty] ]

HistFactory Template

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

Mathematical grammar for a simultaneous fit with

  • .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
  • each region can have .blue[multiple bins]
  • coupled to a set of .red[constraint terms]

.center[.bold[This is a mathematical representation!] Nowhere is any software spec defined] .center[.bold[Until recently] (2018), the only implementation of HistFactory was in ROOT]

.bold[pyhf: HistFactory in pure Python] .center.width-40[pyhf_PyPI]


HistFactory Template: systematic uncertainties

.kol-4-7[

  • In HEP common for systematic uncertainties to be specified with two template histograms: "up" and "down" variation for parameter $\theta \in \{\textcolor{#0495fc}{\vec{\eta}}, \textcolor{#9c2cfc}{\vec{\chi}} \}$
    • "up" variation: model prediction for $\theta = +1$
    • "down" variation: model prediction for $\theta = -1$
    • Interpolation and extrapolation choices provide .bold[model predictions $\nu(\vec{\theta})$ for any $\vec{\theta}$]
  • Constraint terms $c_{j} \left(\textcolor{#a3130f}{a_{j}}\middle|\textcolor{#9c2cfc}{\theta_{j}}\right)$ used to model auxiliary measurements. Example for Normal (most common case):
    • Mean of nuisance parameter $\textcolor{#9c2cfc}{\theta_{j}}$ with normalized width ($\sigma=1$)
    • Normal: auxiliary data $\textcolor{#a3130f}{a_{j} = 0}$ (aux data function of modifier type)
    • Constraint term produces penalty in likelihood for pulling $\textcolor{#9c2cfc}{\theta_{j}}$ away from auxiliary measurement value
    • Via $\nu(\vec{\theta})$, constraint terms inform rate modifiers (.bold[systematic uncertainties]) during the simultaneous fit
    • Example: Correlated shape histosys modifier could represent part of the uncertainty associated with a jet energy scale ] .kol-3-7[ .center.width-70[systematics] .center[Image credit: Alex Held] ]

What is pyhf?

Please check out the many resources we have, starting with the website and the SciPy 2020 talk!

.grid[ .kol-1-3.center[ .width-60[[![scikit-hep_logo](https://scikit-hep.org/assets/images/logo.png)](https://scikit-hep.org/)] ] .kol-1-3.center[
.width-60[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ] .kol-1-3.center[
.width-70[[![iris-hep_logo](assets/logos/logo_IRIS-HEP.png)](https://iris-hep.org/)] ] ]

Differentiable Ecosystem

.kol-1-3.center[ .width-100[gradhep]

gradhep ] .kol-1-3.center[ .width-100[neos_logo]

neos, INFERNO ] .kol-1-3.center[

.width-100[MLE_grad_map_full]




ACTS ]


.kol-1-1[ .bold.center[Groups, libraries, and applications growing rapidly] ]

References

  1. Lukas Heinrich, .italic[Distributed Gradients for Differentiable Analysis], Future Analysis Systems and Facilities Workshop, 2020.
  2. Jim Pivarski, .italic[History and Adoption of Programming Languages in NHEP], Software & Computing Round Table, 2022.

class: end-slide, center
count: false

The end.