class: middle, center, title-slide
count: false
.huge[Lukas Heinrich], .huge[Matthew Feickert], .huge.blue[Giordon Stark]
.huge[(SCIPP, UC Santa Cruz)]
December 10th, 2020
.grid[ .kol-1-3.center[ .circle.width-80[]
CERN ] .kol-1-3.center[ .circle.width-80[]
Illinois ] .kol-1-3.center[ .circle.width-75[]
UCSC SCIPP ] ]
.kol-2-3[
- First non-ROOT implementation of the HistFactory p.d.f. template
- pure-Python library with Python and CLI API
$ pip install pyhf
- No dependence on ROOT!
- Open source tool for all of HEP
- IRIS-HEP supported Scikit-HEP project
- Used for reinterpretation in a phenomenology paper
(DOI: 10.1007/JHEP04(2019)144) and in SModelS
(arXiv:2009.01809)
- Already in use by ATLAS SUSY groups, the HH combination group, and for internal
pMSSM SUSY large-scale reinterpretation ] .kol-1-3.center[ .width-100[] .width-100[] ]
.kol-1-2[
Core libraries (though all lightweight installs):
- SciPy - Scientific Python (optimization routines)
- click - Command line interface
- tqdm - Progress bars
- jsonschema - HistFactory JSON specification
- jsonpatch - Signal reinterpretation
- PyYAML - Command line niceties ] .kol-1-2[
Depending on what users want to do:
- TensorFlow - autodiff, GPUs
- PyTorch - autodiff, GPUs
- JAX - autodiff, GPUs, jit
- iminuit - alternative minimizer choice
- uproot - ROOT I/O interop ]
.kol-1-1[ Getting "extras" is easy:
$ python -m pip install --upgrade pyhf[xmlio] # Gets uproot
$ python -m pip install --upgrade pyhf[backends] # Gets all backends
$ python -m pip install --upgrade pyhf[jax,xmlio,minuit] # Gets JAX, uproot, and iminuit
]
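A minimal sketch for checking which of the optional dependencies listed above are importable in the current environment. It uses only the standard library and does not import the packages themselves; the helper name `available_extras` is hypothetical and not part of pyhf.

```python
import importlib.util

# Import names of the optional dependencies listed above.
OPTIONAL_MODULES = {
    "TensorFlow": "tensorflow",
    "PyTorch": "torch",
    "JAX": "jax",
    "iminuit": "iminuit",
    "uproot": "uproot",
}


def available_extras():
    """Return {display name: bool} without importing the packages (hypothetical helper)."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }


if __name__ == "__main__":
    for name, found in available_extras().items():
        print(f"{name:<10} {'installed' if found else 'missing'}")
```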
.grid[ .kol-2-3[
- All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
- Using deep learning frameworks as computational backends allows for .bold[exploitation of autodiff and GPU acceleration]
- Thanks to huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (physicists are not)
.kol-1-2.center[
.width-90[]
]
.kol-1-2[
- Show hardware acceleration giving .bold[order of magnitude speedup] for some models!
- Improvements over traditional
.width-50[![JAX](figures/logos/JAX_logo.png)] ] ]
- Unconstrained and constrained fits
- Exclusion fits
- Discovery fits (imminent v0.6.0 release)
- Conversion between XML+ROOT and JSON
- This works with any HistFactory workspace! (HistFitter, TRExFitter, WSMaker, etc. don't need to do anything special)
- Brazil bands
- Pull plots†
- Impact/ranking plots†
- pseudoexperiments ("toys") (imminent v0.6.0 release)
.smaller[†Note: the pyhf API is meant to allow higher-level frameworks, such as cabinetry, to build on top.
- Missing a meta-language (DSL, metadata) that describes the data that can be passed to plotting utilities
- cabinetry is meant to help with plotting things "correctly"
- All of this work is openly developed with extensive feedback ]
See our roadmap to get an idea of where we're going!
With tensor library backends we gain access to exact (higher order) derivatives; accuracy is limited only by floating point precision
.grid[ .kol-1-2[ .large[Exploit .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up the fit!]
.large[The frameworks provide this by building computational directed acyclic graphs and then applying the chain rule (to the operations)]
]
.kol-1-2[
.center.width-80[]
]
]
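A minimal sketch of the chain-rule idea, assuming a single-bin Poisson likelihood with made-up yields (`n`, `s`, `b` are illustrative values). It uses forward-mode dual numbers in plain Python, which is a swapped-in technique for illustration and not how TensorFlow, PyTorch, or JAX implement automatic differentiation internally. The class `Dual` and the helper functions are hypothetical names.

```python
import math


class Dual:
    """Number value + deriv*eps with eps**2 == 0; deriv carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def log(x):
    """Logarithm with the chain rule applied: d/dx log(x) = 1/x."""
    x = Dual.lift(x)
    return Dual(math.log(x.value), x.deriv / x.value)


def poisson_nll(mu, n=12.0, s=5.0, b=10.0):
    """-log Pois(n | mu*s + b), dropping the constant log(n!) term."""
    rate = mu * s + b
    return rate - n * log(rate)


def autodiff_grad(f, x):
    """Derivative from one evaluation, exact up to floating point precision."""
    return f(Dual(x, 1.0)).deriv


def finite_diff_grad(f, x, h=1e-5):
    """Central finite difference: two evaluations and an h-dependent error."""
    return (f(x + h).value - f(x - h).value) / (2.0 * h)


if __name__ == "__main__":
    analytic = 5.0 - 12.0 * 5.0 / (1.0 * 5.0 + 10.0)  # s - n*s/(mu*s + b) at mu = 1
    print("analytic         ", analytic)
    print("autodiff         ", autodiff_grad(poisson_nll, 1.0))
    print("finite difference", finite_diff_grad(poisson_nll, 1.0))
```

The autodiff result carries no step-size parameter to tune, whereas the finite-difference estimate trades truncation error against round-off through `h`.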
.footnote[Example adapted from Lukas Heinrich's PyHEP 2020 tutorial]
.kol-1-2.center[ .width-90[] ] .kol-1-2.center[ .width-90[] ]
.bold.center[Having access to the gradients makes the fit orders of magnitude faster than finite difference]
.grid[ .kol-1-1.center[All documentation can be found at https://scikit-hep.org/pyhf/.] .kol-1-2[ In this documentation you can find a list of:
All of our documentation is tested nightly, against our software, as well as updates to software and tools we depend on. In addition to this, we've made full use of:
] .kol-1-2[ Most recently gave a successful, in-depth tutorial at the ATLAS SUSY+Exotics workshop.
.grid[ .kol-2-3.push-1-6.center.gray[ Out of all the toolkits, why do you think your users choose to use yours? ] .kol-1-1[
- Easy to use and install: PyPI, TestPyPI (bleeding edge), conda-forge, and Docker
- Fast code, fast development cycle, fast feedback
- Well-documented Python implementations, clear communication channels to devs and community
- Command line complements the Pythonic API
- We really love our CLI, it plays nicely with shell "behavior" such as piping
$ pyhf prune --sample ttbar BkgOnly.json | pyhf inspect
- Significant test-driven development (underlies all of our work) with 1000+ tests!
$ pytest --collect-only | grep "<Function\|<Class" -c
1306
- Every commit tested in CI across Python 3.6, 3.7, 3.8 on Linux and macOS systems with nightlies
But we believe the biggest reason users choose pyhf is that
.center.huge[pyhf is developed openly and freely]
]
]
.grid[
.kol-2-3.push-1-6.center.gray[ Is your toolkit using some external packages / common scripts / macros / functions to perform some of the operations like fit, limit setting, significance computation, Asimov-creation, ranking plot? ]
.kol-1-1[
- Fits, limit setting: SciPy and MINUIT (via iminuit)
- Test statistics are implemented in pyhf
- Asimov creation: just a fit in pyhf to generate the Asimov dataset ] ]
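A minimal sketch of how a test statistic and an Asimov dataset fit together, assuming a single-bin counting model with no nuisance parameters, so that the "fit" has a closed form. This is not pyhf's implementation: all function names are hypothetical, the yields are made up, observed counts are assumed to be positive, and the CLs value uses the asymptotic formulae for the one-sided statistic $q_\mu$ from Cowan, Cranmer, Gross and Vitells.

```python
import math


def nll(n, mu, s, b):
    """-log Pois(n | mu*s + b) up to a mu-independent constant (assumes n > 0)."""
    rate = mu * s + b
    return rate - n * math.log(rate)


def q_mu(n, mu, s, b):
    """One-sided profile likelihood ratio test statistic for upper limits."""
    mu_hat = (n - b) / s  # closed-form maximum likelihood estimate
    if mu_hat > mu:  # upward fluctuations are not counted against mu
        return 0.0
    # Clip tiny negative values caused by floating point round-off.
    return max(0.0, 2.0 * (nll(n, mu, s, b) - nll(n, mu_hat, s, b)))


def asimov_data(mu, s, b):
    """Asimov dataset: the data set equal to its expectation under mu."""
    return mu * s + b


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cls_asymptotic(n_obs, mu, s, b):
    """CLs = CL_{s+b} / CL_b using the background-only Asimov dataset."""
    sqrt_q = math.sqrt(q_mu(n_obs, mu, s, b))
    sqrt_q_asimov = math.sqrt(q_mu(asimov_data(0.0, s, b), mu, s, b))
    cl_sb = 1.0 - norm_cdf(sqrt_q)
    cl_b = norm_cdf(sqrt_q_asimov - sqrt_q)
    return cl_sb / cl_b


if __name__ == "__main__":
    print(cls_asymptotic(n_obs=10.0, mu=1.0, s=5.0, b=10.0))
```

With nuisance parameters the Asimov dataset is no longer just `mu*s + b` at fixed inputs: the nuisance parameters first have to be fitted, which is the fit the slide refers to.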
.grid[
.kol-2-3.push-1-6.center.gray[ Which pieces of your toolkit could be factorized out into a package that would be developed/supported/distributed by ATLAS? ]
.kol-1-1[
We don't necessarily believe any particular piece needs to be factorized out into a package maintained by ATLAS.
- pure-Python implementation of HistFactory (a mathematical model)
- pyhf is a low(er)-level library to interact with the HistFactory JSON workspaces
- Higher-level tools are encouraged to build on top of pyhf to extend the functionality into plots, limit setting, and other debugging utilities
- c.f. cabinetry as an excellent example ] ]
.grid[
.kol-2-3.push-1-6.center.gray[ What additional common software could your toolkit take advantage of? ]
.kol-1-1[
- Not sure
- We are willing to try out new ideas all the time
- If you have ideas, get in touch with us! ] ]
.grid[
.kol-2-3.push-1-6.center.gray[ Would you be willing to contribute to the development of a centrally distributed toolkit that provides functionality for providing common statistical operations (e.g. calculating a
- Cannot make any promises at this time
- All core developers are very busy with convener roles and contact roles in ATLAS and IRIS-HEP ] ]
.kol-2-3[
pyhf fits into the "open science" ecosystem:
- reproducible workflows via RECAST/REANA benefit from JSON HistFactory
- reinterpretation is a breeze
- statistical workspaces can be serialized/preserved
- Reinterpretation Forum paper recommends the use of pyhf likelihoods
- SModelS provides an interface for pyhf
- Native HEPData support (ongoing!)
- ATLAS SUSY group has published pyhf JSON HistFactory workspaces for five analyses ] .kol-1-3[
.center.width-100.tiny[ [![cranmer talk](figures/two_tastes.png)](https://indico.cern.ch/event/962997/)
(stolen from Kyle Cranmer) ] ]
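A minimal sketch of what "serialized/preserved" means in practice: a workspace is plain JSON, so it round-trips through the standard library without loss. The layout below follows the simple two-bin example in the pyhf documentation as best recalled; the channel name, sample names, and yields are illustrative, and nothing here is validated against the pyhf schema.

```python
import json

# Illustrative workspace: one channel, two bins, signal plus background.
workspace = {
    "channels": [
        {
            "name": "singlechannel",
            "samples": [
                {
                    "name": "signal",
                    "data": [5.0, 10.0],
                    "modifiers": [
                        {"name": "mu", "type": "normfactor", "data": None}
                    ],
                },
                {
                    "name": "background",
                    "data": [50.0, 60.0],
                    "modifiers": [
                        {
                            "name": "uncorr_bkguncrt",
                            "type": "shapesys",
                            "data": [5.0, 12.0],
                        }
                    ],
                },
            ],
        }
    ],
    "observations": [{"name": "singlechannel", "data": [50.0, 60.0]}],
    "measurements": [
        {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
    ],
    "version": "1.0.0",
}

# Serialize to text (what would be published) and read it back.
serialized = json.dumps(workspace, indent=2, sort_keys=True)
restored = json.loads(serialized)

if __name__ == "__main__":
    print(restored == workspace)
```

Because the serialized form is sorted, indented text, it can be versioned and diffed like any other source file.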
.kol-2-3[
.large[pyhf provides:]
- .large[.bold[Accelerated] fitting library]
- reducing time to insight/inference!
- Hardware acceleration on GPUs and vectorized operations
- Backend agnostic Python API and CLI
- .large[Flexible .bold[declarative] schema]
- JSON: ubiquitous, universal support, versionable
- .large[Enabling technology for .bold[reinterpretation]]
- JSON Patch files for efficient computation of new signal models
- Unifying tool for theoretical and experimental physicists
- .large[Project in growing .bold[Pythonic HEP ecosystem]]
- Openly developed on GitHub, and contributions are welcome
- Comprehensive open tutorials
- Ask us about Scikit-HEP and IRIS-HEP! ] .kol-1-3[
.center.width-100[[![pyhf_logo](https://iris-hep.org/assets/logos/pyhf-logo.png)](https://github.com/scikit-hep/pyhf)] ]
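A minimal sketch of the JSON Patch idea behind reinterpretation: a background-only workspace stays fixed, and each new signal model is a small patch applied on top. The function `apply_patch` is a hand-rolled, hypothetical subset of RFC 6902 (only `add`, `replace`, and `remove`) written for illustration; it is not the `jsonpatch` library that pyhf depends on, and the workspace fragment and yields are made up.

```python
import copy


def apply_patch(document, patch):
    """Apply 'add', 'replace', and 'remove' operations to a copy of document."""
    result = copy.deepcopy(document)
    for operation in patch:
        # Split the JSON Pointer ("/channels/0/samples/0") into unescaped tokens.
        tokens = [
            token.replace("~1", "/").replace("~0", "~")
            for token in operation["path"].split("/")[1:]
        ]
        # Walk down to the container that holds the final token.
        parent = result
        for token in tokens[:-1]:
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        key = tokens[-1]
        if isinstance(parent, list):
            key = len(parent) if key == "-" else int(key)
        kind = operation["op"]
        if kind == "add" and isinstance(parent, list):
            parent.insert(key, copy.deepcopy(operation["value"]))
        elif kind in ("add", "replace"):
            parent[key] = copy.deepcopy(operation["value"])
        elif kind == "remove":
            del parent[key]
        else:
            raise ValueError(f"unsupported operation: {kind}")
    return result


# Illustrative background-only fragment of a workspace.
background_only = {
    "channels": [
        {
            "name": "signal_region",
            "samples": [
                {"name": "background", "data": [50.0, 60.0], "modifiers": []}
            ],
        }
    ]
}

# One patch per signal hypothesis: insert the signal sample into the channel.
signal_patch = [
    {
        "op": "add",
        "path": "/channels/0/samples/0",
        "value": {
            "name": "new_signal",
            "data": [5.0, 10.0],
            "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
        },
    }
]

patched = apply_patch(background_only, signal_patch)

if __name__ == "__main__":
    print([sample["name"] for sample in patched["channels"][0]["samples"]])
```

Only the patch differs between signal models, which is why a large signal grid can be described compactly alongside a single background-only workspace.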
class: middle
.center[
.large[www.scikit-hep.org/pyhf]
]
.grid[
.kol-1-3.center[
.width-90[]
]
.kol-1-3.center[
.width-90[]
]
.kol-1-3.center[
.width-100[]
]
]
class: end-slide, center
.large[Backup]
Required dependencies from our setup.cfg:
.grid[ .kol-2-3[
install_requires =
scipy>=1.4.0
click>=6.0
tqdm
jsonschema>=3.2.0
jsonpatch
pyyaml
- SciPy - Scientific Python (optimization routines)
- click - Command line interface
- tqdm - Progress bars
- jsonschema - HistFactory JSON specification
- jsonpatch - Signal reinterpretation
- pyyaml - Command line niceties
]
.kol-1-3.center[
.width-50[]
.width-50[]
.width-25[] ] ]
We have lots of optional dependencies depending on what users want to do:
- TensorFlow - autodiff
- PyTorch - autodiff
- JAX - autodiff, jit
- iminuit - minuit interface (MIGRAD/HESSE/MINOS available)
- uproot - ROOT I/O interop
- A flexible probability density function (p.d.f.) template to build statistical models in high energy physics
- Developed in 2011 during work that led to the Higgs discovery [CERN-OPEN-2012-016]
- Widely used by the HEP community for .bold[measurements of known physics] (Standard Model) and
.bold[searches for new physics] (beyond the Standard Model)
.kol-2-5.center[ .width-90[] .bold[Standard Model] ] .kol-3-5.center[ .width-100[] .bold[Beyond the Standard Model] ]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.kol-1-2[ .bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encode systematic uncertainties (e.g. normalization, shape)
- $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[] .center[Example: .bold[Each bin] is a separate (1-bin) channel,
each .bold[histogram] (color) is a sample, and the samples share
a .bold[normalization systematic] uncertainty] ]
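A minimal sketch of the two pieces named above, assuming a toy model in the spirit of the example: two single-bin channels, two samples each, an unconstrained signal strength `mu`, and one shared normalization systematic with nuisance parameter `alpha`. The yields, the 10% effect size `KAPPA`, the simple `KAPPA ** alpha` scaling, and the unit-Gaussian constraint are all illustrative assumptions and not the full HistFactory interpolation scheme.

```python
import math


def poisson_logpdf(n, rate):
    return n * math.log(rate) - rate - math.lgamma(n + 1.0)


def normal_logpdf(x, mean, sigma=1.0):
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


# Hypothetical nominal rates: each channel has one bin and two samples.
NOMINAL = {
    "channel_1": {"signal": 5.0, "background": 50.0},
    "channel_2": {"signal": 10.0, "background": 60.0},
}
KAPPA = 1.1  # assumed +10% effect of the shared systematic at alpha = +1


def expected_rate(channel, mu, alpha):
    """Event rate: sum over samples of nominal rate times rate modifiers."""
    samples = NOMINAL[channel]
    shared_norm = KAPPA ** alpha  # constrained modifier shared by all samples
    return shared_norm * (mu * samples["signal"] + samples["background"])


def log_likelihood(observed, aux, mu, alpha):
    """Main Poisson term over channels plus the constraint term."""
    main = sum(
        poisson_logpdf(n, expected_rate(channel, mu, alpha))
        for channel, n in observed.items()
    )
    constraint = normal_logpdf(aux, alpha)  # auxiliary measurement of alpha
    return main + constraint


if __name__ == "__main__":
    observed = {"channel_1": 55.0, "channel_2": 70.0}
    for mu in (0.0, 1.0, 2.0):
        print(mu, log_likelihood(observed, aux=0.0, mu=mu, alpha=0.0))
```

Here `mu` plays the role of an unconstrained parameter, `alpha` of a constrained one, `observed` of the event counts, and `aux` of the auxiliary data.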
Mathematical grammar for a simultaneous fit with
- .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
- each region can have .blue[multiple bins]
- coupled to a set of .red[constraint terms]
.center[.bold[This is a _mathematical_ representation!] Nowhere is any software spec defined]
.center[.bold[Until recently] (2018), the only implementation of HistFactory has been in [`ROOT`](https://root.cern.ch/)]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encoding systematic uncertainties (normalization, shape, etc.)
- $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars
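For reference, the pieces listed above combine into the HistFactory template as given in the pyhf documentation (written here from memory, so treat the exact notation as an assumption): a product of Poisson terms over channels $c$ and bins $b$, multiplied by one constraint term $c_{\chi}$ per constrained parameter.

$$
f(\vec{n}, \vec{a} \mid \vec{\eta}, \vec{\chi}) = \prod_{c \in \text{channels}} \prod_{b \in \text{bins}_c} \text{Pois}\left(n_{cb} \mid \nu_{cb}(\vec{\eta}, \vec{\chi})\right) \prod_{\chi \in \vec{\chi}} c_{\chi}(a_{\chi} \mid \chi)
$$

$$
\nu_{cb}(\vec{\eta}, \vec{\chi}) = \sum_{s \in \text{samples}} \left( \prod_{\kappa} \kappa_{scb}(\vec{\eta}, \vec{\chi}) \right) \left( \nu_{scb}^{0} + \sum_{\Delta} \Delta_{scb}(\vec{\eta}, \vec{\chi}) \right)
$$

The multiplicative modifiers $\kappa$ scale the nominal rate $\nu_{scb}^{0}$ of sample $s$, and the additive modifiers $\Delta$ shift it.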
.kol-1-2.width-90[
- High information-density summary of analysis
- Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
- Trigger
- Detector
- Combined Performance / Physics Object Groups
- Systematic Uncertainties
- Event Selection
- Unique representation of the analysis to reuse and preserve
]
.kol-1-2.width-100[
]
- F. James, Y. Perrin, L. Lyons, .italic[Workshop on confidence limits: Proceedings], 2000.
- ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
- L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.
- A. Read, .italic[Modified frequentist analysis of search results (the $\mathrm{CL}_{s}$ method)], 2000.
- K. Cranmer, .italic[CERN Latin-American School of High-Energy Physics: Statistics for Particle Physicists], 2013.
- ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum], 2019.
- ATLAS collaboration, .italic[Reproducing searches for new physics with the ATLAS experiment through publication of full statistical likelihoods], 2019.
- ATLAS collaboration, .italic[Search for bottom-squark pair production with the ATLAS detector in final states containing Higgs bosons, b-jets and missing transverse momentum: HEPData entry], 2019.
class: end-slide, center
count: false
The end.