December 10th, 2020
- First non-ROOT implementation of the HistFactory p.d.f. template
- pure-Python library with Python and CLI API
$ pip install pyhf
- No dependence on ROOT!
- Open source tool for all of HEP
- IRIS-HEP supported Scikit-HEP project
- Used for reinterpretation in phenomenology paper (DOI: 10.1007/JHEP04(2019)144) and SModelS (arXiv:2009.01809)
- Already in use by ATLAS SUSY groups, HH combination group, and for internal pMSSM SUSY large scale reinterpretation
Core libraries (though all lightweight installs):
- SciPy - Scientific Python (optimization routines)
- click - Command line interface
- tqdm - Progress bars
- jsonschema - HistFactory JSON specification
- jsonpatch - Signal reinterpretation
- PyYAML - Command line niceties ] .kol-1-2[
Depending on what users want to do:
- TensorFlow - autodiff, GPUs
- PyTorch - autodiff, GPUs
- JAX - autodiff, GPUs, jit
- iminuit - alternative minimizer choice
- uproot - ROOT I/O interop ]
.kol-1-1[ Getting "extras" is easy:
$ python -m pip install --upgrade pyhf[xmlio] # Gets uproot
$ python -m pip install --upgrade pyhf[backends] # Gets all backends
$ python -m pip install --upgrade pyhf[jax,xmlio,minuit] # Gets JAX, uproot, and iminuit
.grid[ .kol-2-3[
- All numerical operations implemented in .bold[tensor backends] through an API of $n$-dimensional array operations
- Using deep learning frameworks as computational backends allows for .bold[exploitation of autodiff and GPU acceleration]
- With huge buy-in from industry, we benefit for free as these frameworks are .bold[continually improved] by professional software engineers (physicists are not)[
- Show hardware acceleration giving .bold[order of magnitude speedup] for some models!
- Improvements over traditional
- Unconstrained and constrained fits
- Exclusion fits
- Discovery fits (imminent release)
- Conversion from XML+ROOT to JSON and back
- This works with any HistFactory workspace! (HistFitter, TRExFitter, WSMaker, etc. don't need to do anything special)
- Brazil bands
- Pull plots†
- Impact/ranking plots†
- pseudoexperiments ("toys") (imminent release)
.smaller[†Note: the pyhf API is meant to allow for higher-level frameworks to build on top, such as cabinetry
- Missing a meta-language (DSL, metadata) that describes the data that can be passed to plotting utilities
- cabinetry is meant to help with plotting things "correctly"
- All of this work is openly developed with extensive feedback ]
See our roadmap to get an idea of where we're going!
With tensor library backends we gain access to exact (higher order) derivatives: accuracy is limited only by floating point precision
.grid[ .kol-1-2[ .large[Exploit the .bold[full gradient of the likelihood] with .bold[modern optimizers] to help speed up the fit!]
.large[Gain this through the frameworks creating computational directed acyclic graphs and then applying the chain rule (to the operations)]
Example adapted from Lukas Heinrich's PyHEP 2020 tutorial. Having access to the gradients makes the fit orders of magnitude faster than finite difference
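To make the chain-rule idea concrete, here is a minimal pure-Python sketch of forward-mode automatic differentiation with dual numbers, compared against a central finite difference. It is not how TensorFlow, PyTorch, or JAX are implemented, and it is not taken from the tutorial; the `Dual` class, the helper names, and the one-bin Poisson model with its numbers are all assumptions made for illustration.

```python
import math


class Dual:
    """Number of the form value + deriv * eps, where eps**2 == 0."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        # Promote plain floats to constants (derivative zero)
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def log(x):
    # Chain rule: d/dx log(u) = u' / u
    x = Dual.lift(x)
    return Dual(math.log(x.value), x.deriv / x.value)


def poisson_nll(mu, n=12.0, s=5.0, b=10.0):
    """Negative log-likelihood of one Poisson bin, constants dropped."""
    rate = mu * s + b
    return rate - n * log(rate)


def autodiff_grad(f, x):
    # Seed the input with derivative 1 and read the derivative back out
    return f(Dual(x, 1.0)).deriv


def finite_diff_grad(f, x, h=1e-6):
    # Central difference: accuracy depends on the choice of step size h
    return (f(x + h).value - f(x - h).value) / (2 * h)


exact = autodiff_grad(poisson_nll, 1.0)
approx = finite_diff_grad(poisson_nll, 1.0)
print(exact, approx)
```

For this toy model the analytic gradient is $s(1 - n/(\mu s + b))$, which equals 1 at $\mu = 1$; the dual-number result agrees to floating point precision, while the finite difference carries a step-size dependent error.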
All documentation can be found at
All of our documentation is tested nightly, against our software, as well as updates to software and tools we depend on. In addition to this, we've made full use of:
] .kol-1-2[ Most recently, we gave a successful, in-depth tutorial at the ATLAS SUSY+Exotics workshop.
.grid[[ Out of all the toolkits, why do you think your users choose to use yours? ] .kol-1-1[
- Easy to use and install: PyPI, TestPyPI (bleeding edge), conda-forge, and Docker
- Fast code, fast development cycle, fast feedback
- Well-documented Python implementations, clear communication channels to devs and community
- Command line complements the Pythonic API
- We really love our CLI: it plays nicely with shell "behavior" such as piping
$ pyhf prune --sample ttbar BkgOnly.json | pyhf inspect
- Significant test-driven development (underlies all of our work) with 1000+ tests!
$ pytest --collect-only | grep "<Function\|<Class" -c
1306
- Every commit tested in CI across Python 3.6, 3.7, 3.8 on Linux and macOS systems with nightlies
But we believe the biggest reason users choose pyhf is that it is developed openly and freely]
.grid[[ Is your toolkit using some external packages / common scripts / macros / functions to perform some of the operations like fit, limit setting, significance computation, Asimov-creation, ranking plot? ]
- Fits, limit setting: SciPy and minuit
- Test statistics are implemented in pyhf
- Asimov creation: just a fit in pyhf to generate the Asimov dataset ] ]
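As a rough illustration of "Asimov creation is just a fit", the sketch below fits the nuisance parameter of a toy one-bin model at a fixed signal strength and then sets the data to the expected values at the fitted point. This is not pyhf's implementation: the model, the function names, the Gaussian constraint, and the golden-section minimizer are all assumptions chosen to keep the example self-contained.

```python
import math


def nll(gamma, n, mu, s, b, aux, sigma):
    """Negative log-likelihood (constants dropped) of a one-bin model.

    Main term: Poisson with rate mu * s + gamma * b.
    Constraint term: Gaussian auxiliary measurement aux of gamma.
    """
    rate = mu * s + gamma * b
    main = rate - n * math.log(rate)
    constraint = 0.5 * ((aux - gamma) / sigma) ** 2
    return main + constraint


def golden_section_minimize(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        x1 = hi - inv_phi * (hi - lo)
        x2 = lo + inv_phi * (hi - lo)
        if f(x1) < f(x2):
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    return 0.5 * (lo + hi)


def asimov_dataset(n_obs, mu, s, b, aux=1.0, sigma=0.1):
    """Fit gamma at fixed mu, then return the expected data at the fitted point."""
    gamma_hat = golden_section_minimize(
        lambda g: nll(g, n_obs, mu, s, b, aux, sigma), 0.1, 3.0
    )
    return {"main": mu * s + gamma_hat * b, "aux": gamma_hat}


# Background-only Asimov dataset for an observation of 50 events
print(asimov_dataset(n_obs=50.0, mu=0.0, s=5.0, b=50.0))
```

When the observation equals the nominal background, the fitted nuisance parameter stays at its nominal value and the Asimov main data is the nominal background.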
.grid[[ Which pieces of your toolkit could be factorized out into a package that would be developed/supported/distributed by ATLAS? ]
We don't necessarily believe any particular piece needs to be factorized out into a package maintained by ATLAS.
- pure-Python implementation of HistFactory (a mathematical model)
- pyhf is a low(er)-level library to interact with the HistFactory JSON workspaces
- Higher-level tools are encouraged to build on top of pyhf to extend the functionality into plots, limit setting, and other debugging utilities
- c.f.
as excellent example ] ]
- c.f.
.grid[[ What additional common software could your toolkit take advantage of? ]
- Not sure
- We are willing to try out new ideas all the time
- If you have ideas, get in touch with us! ] ]
.grid[[ Would you be willing to contribute to the development of a centrally distributed toolkit that provides functionality for providing common statistical operations (e.g. calculating a
- Cannot make any promises at this time
- All core developers are very busy with convener roles and contact roles in ATLAS and IRIS-HEP ] ]
pyhf fits into the "open science" ecosystem:
- reproducible workflows via RECAST/REANA benefits from JSON HistFactory
- reinterpretation is a breeze
- statistical workspaces can be serialized/preserved
- Reinterpretation Forum paper recommends the use of likelihoods
- SModelS provides an interface for pyhf
- Native HEPdata support (ongoing!)
- ATLAS SUSY group has published
JSON HistFactory workspaces for five analyses ] .kol-1-3[
(stolen from Kyle Cranmer)
(stolen from Kyle Cranmer) ] ]
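The reinterpretation workflow above rests on a simple mechanism: a background-only workspace is preserved once, and each new signal model is a small JSON Patch applied on top. The sketch below shows that mechanism with a hand-written helper covering only the `add`, `replace`, and `remove` operations; it is not the jsonpatch library, and the workspace dictionary is a simplified, hypothetical stand-in rather than a complete HistFactory JSON workspace.

```python
import copy


def _resolve(container, token):
    """Follow one JSON Pointer token into a list or dict."""
    return container[int(token)] if isinstance(container, list) else container[token]


def apply_patch(document, patch):
    """Apply 'add' / 'replace' / 'remove' JSON Patch operations to a copy."""
    result = copy.deepcopy(document)
    for op in patch:
        # Split the JSON Pointer and undo its escape sequences
        tokens = [
            t.replace("~1", "/").replace("~0", "~")
            for t in op["path"].split("/")[1:]
        ]
        target = result
        for token in tokens[:-1]:
            target = _resolve(target, token)
        last = tokens[-1]
        if isinstance(target, list):
            index = len(target) if last == "-" else int(last)
            if op["op"] == "add":
                target.insert(index, op["value"])
            elif op["op"] == "replace":
                target[index] = op["value"]
            elif op["op"] == "remove":
                del target[index]
            else:
                raise ValueError(f"unsupported operation: {op['op']}")
        else:
            if op["op"] in ("add", "replace"):
                target[last] = op["value"]
            elif op["op"] == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported operation: {op['op']}")
    return result


# Simplified, hypothetical background-only workspace
bkg_only = {
    "channels": [
        {
            "name": "SR",
            "samples": [
                {"name": "background", "data": [50.0, 60.0], "modifiers": []}
            ],
        }
    ]
}

# A new signal model is just a patch that inserts one sample
signal_patch = [
    {
        "op": "add",
        "path": "/channels/0/samples/0",
        "value": {
            "name": "signal",
            "data": [5.0, 10.0],
            "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
        },
    }
]

signal_workspace = apply_patch(bkg_only, signal_patch)
print([s["name"] for s in signal_workspace["channels"][0]["samples"]])
```

Because the original workspace is never modified, many signal hypotheses can be tested by applying different patches to the same preserved background-only model.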
- .large[.bold[Accelerated] fitting library]
- reducing time to insight/inference!
- Hardware acceleration on GPUs and vectorized operations
- Backend agnostic Python API and CLI
- .large[Flexible .bold[declarative] schema]
- JSON: ubiquitous, universal support, versionable
- .large[Enabling technology for .bold[reinterpretation]]
- JSON Patch files for efficient computation of new signal models
- Unifying tool for theoretical and experimental physicists
- .large[Project in growing .bold[Pythonic HEP ecosystem]]
- Openly developed on GitHub, and contributions are welcome
- Comprehensive open tutorials
- Ask us about Scikit-HEP and IRIS-HEP! ] .kol-1-3[
.center.width-100[pyhf logo] ]
We have lots of optional dependencies depending on what users want to do:
- TensorFlow - autodiff
- PyTorch - autodiff
- JAX - autodiff, jit
- iminuit - minuit interface (MIGRAD/HESSE/MINOS available)
- uproot - ROOT I/O interop
- A flexible probability density function (p.d.f.) template to build statistical models in high energy physics
- Developed in 2011 during work that led to the Higgs discovery [CERN-OPEN-2012-016]
- Widely used by the HEP community for .bold[measurements of known physics] (Standard Model) and
.bold[searches for new physics] (beyond the Standard Model)[ .width-90[] .bold[Standard Model] ][ .width-100[] .bold[Beyond the Standard Model] ]
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.kol-1-2[ .bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ (nominal rate $\nu_{scb}^{0}$ with rate modifiers)
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encode systematic uncertainties (e.g. normalization, shape)
$\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars ] .kol-1-2[ .center.width-100[] .center[Example: .bold[Each bin] is a separate (1-bin) channel,
each .bold[histogram] (color) is a sample, and all share
a .bold[normalization systematic] uncertainty] ]
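A minimal pure-Python sketch of a model like the example above: each bin is its own 1-bin channel, two samples contribute to each channel, and one shared normalization systematic scales both samples. This is not the pyhf API; the function names, the exponential form of the normalization modifier, the unit-width Gaussian constraint, and all numbers are assumptions for illustration.

```python
import math


def poisson_logpdf(n, rate):
    # log Pois(n | rate); lgamma(n + 1) equals log(n!)
    return n * math.log(rate) - rate - math.lgamma(n + 1)


def normal_logpdf(x, mean, sigma):
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def expected_rates(mu, alpha, signal, background, rel_unc):
    """Expected rate per 1-bin channel: modifiers applied to nominal sample rates.

    mu is the unconstrained signal strength, alpha the constrained
    nuisance parameter of the shared normalization systematic.
    """
    kappa = (1.0 + rel_unc) ** alpha  # assumed exponential interpolation
    return [kappa * (mu * s + b) for s, b in zip(signal, background)]


def log_likelihood(mu, alpha, observed, signal, background, rel_unc, aux=0.0):
    """Main Poisson term over channels plus the constraint term."""
    rates = expected_rates(mu, alpha, signal, background, rel_unc)
    main = sum(poisson_logpdf(n, nu) for n, nu in zip(observed, rates))
    constraint = normal_logpdf(aux, alpha, 1.0)
    return main + constraint


signal = [5.0, 10.0]        # nominal signal rate in each channel
background = [50.0, 60.0]   # nominal background rate in each channel
observed = [55.0, 70.0]     # observed events in each channel

print(log_likelihood(1.0, 0.0, observed, signal, background, rel_unc=0.1))
```

With data equal to the nominal signal-plus-background prediction, the likelihood is largest at $\mu = 1$, $\alpha = 0$, and moving either parameter away from that point lowers it.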
Mathematical grammar for a simultaneous fit with
- .blue[multiple "channels"] (analysis regions, (stacks of) histograms)
- each region can have .blue[multiple bins]
- coupled to a set of .red[constraint terms]
This is a _mathematical_ representation! Nowhere is any software spec defined. Until recently (2018), the only implementation of HistFactory had been in ROOT.
.bold[Use:] Multiple disjoint channels (or regions) of binned distributions with multiple samples contributing to each with additional (possibly shared) systematics between sample estimates
.bold[Main pieces:]
- .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
- .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
- .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
- encoding systematic uncertainties (normalization, shape, etc)
$\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars
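Putting the pieces above together, the template can be written in the form commonly used for HistFactory. The symbol $c_{\chi}$ for an individual constraint term is notation assumed here; the other symbols follow the definitions above.

$$
f(\vec{n}, \vec{a} \mid \vec{\eta}, \vec{\chi}) = \prod_{c \in \text{channels}} \prod_{b \in \text{bins}_c} \text{Pois}\left(n_{cb} \mid \nu_{cb}(\vec{\eta}, \vec{\chi})\right) \prod_{\chi \in \vec{\chi}} c_{\chi}(a_{\chi} \mid \chi)
$$

$$
\nu_{cb}(\vec{\eta}, \vec{\chi}) = \sum_{s \in \text{samples}} \left(\prod_{\kappa \in \vec{\kappa}} \kappa_{scb}(\vec{\eta}, \vec{\chi})\right) \left(\nu_{scb}^{0}(\vec{\eta}, \vec{\chi}) + \sum_{\Delta \in \vec{\Delta}} \Delta_{scb}(\vec{\eta}, \vec{\chi})\right)
$$

The first product is the main Poisson p.d.f. over all channels and bins, the last product is the constraint p.d.f., and the event rate in each bin sums over samples with multiplicative ($\kappa$) and additive ($\Delta$) modifiers applied to the nominal rate.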
- High information-density summary of analysis
- Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
- Trigger
- Detector
- Combined Performance / Physics Object Groups
- Systematic Uncertainties
- Event Selection
- Unique representation of the analysis to reuse and preserve
