From 16315e75c00365cab0a881b96c558dc7a91c27b8 Mon Sep 17 00:00:00 2001
From: Nathan Simpson
Date: Tue, 4 Jul 2023 19:33:32 +0100
Subject: [PATCH] prettier

---
 .github/CONTRIBUTING.md |  40 +++++-------
 README.md               |  80 ++++++++++-------------
 list_of_operations.md   | 136 ++++++++++++++++++----------------------
 3 files changed, 111 insertions(+), 145 deletions(-)

diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 4ff1b8a..c415f28 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -1,20 +1,18 @@
-See the [Scientific Python Developer Guide][spc-dev-intro] for a detailed
-description of best practices for developing scientific packages.
+See the [Scientific Python Developer Guide][spc-dev-intro] for a detailed description of best
+practices for developing scientific packages.
 
 [spc-dev-intro]: https://scientific-python-cookie.readthedocs.io/guide/intro
 
 # Quick development
 
-The fastest way to start with development is to use nox. If you don't have nox,
-you can use `pipx run nox` to run it without installing, or `pipx install nox`.
-If you don't have pipx (pip for applications), then you can install with with
-`pip install pipx` (the only case were installing an application with regular
-pip is reasonable). If you use macOS, then pipx and nox are both in brew, use
-`brew install pipx nox`.
+The fastest way to start with development is to use nox. If you don't have nox, you can use
+`pipx run nox` to run it without installing, or `pipx install nox`. If you don't have pipx (pip for
+applications), then you can install it with `pip install pipx` (the only case where installing an
+application with regular pip is reasonable). If you use macOS, pipx and nox are both in brew:
+`brew install pipx nox`.
 
-To use, run `nox`. This will lint and test using every installed version of
-Python on your system, skipping ones that are not installed. You can also run
-specific jobs:
+To use, run `nox`. This will lint and test using every installed version of Python on your system,
+skipping ones that are not installed. You can also run specific jobs:
 
 ```console
 $ nox -s lint # Lint only
@@ -23,8 +21,7 @@ $ nox -s docs -- serve # Build and serve the docs
 $ nox -s build # Make an SDist and wheel
 ```
 
-Nox handles everything for you, including setting up an temporary virtual
-environment for each run.
+Nox handles everything for you, including setting up a temporary virtual environment for each run.
 
 # Setting up a development environment manually
 
@@ -36,9 +33,8 @@ source ./.venv/bin/activate
 pip install -v -e .[dev]
 ```
 
-If you have the
-[Python Launcher for Unix](https://github.com/brettcannon/python-launcher), you
-can instead do:
+If you have the [Python Launcher for Unix](https://github.com/brettcannon/python-launcher), you can
+instead do:
 
 ```bash
 py -m venv .venv
 py -m pip install -v -e .[dev]
@@ -47,16 +43,15 @@
 
 # Post setup
 
-You should prepare pre-commit, which will help you by checking that commits pass
-required checks:
+You should prepare pre-commit, which will help you by checking that commits pass required checks:
 
 ```bash
 pip install pre-commit # or brew install pre-commit on macOS
 pre-commit install # Will install a pre-commit hook into the git repo
 ```
 
-You can also/alternatively run `pre-commit run` (changes only) or
-`pre-commit run --all-files` to check even without installing the hook.
+You can also/alternatively run `pre-commit run` (changes only) or `pre-commit run --all-files` to
+check even without installing the hook.
 # Testing
 
@@ -90,9 +85,8 @@ nox -s docs -- serve
 
 # Pre-commit
 
-This project uses pre-commit for all style checking. While you can run it with
-nox, this is such an important tool that it deserves to be installed on its own.
-Install pre-commit and run:
+This project uses pre-commit for all style checking. While you can run it with nox, this is such an
+important tool that it deserves to be installed on its own. Install pre-commit and run:
 
 ```bash
 pre-commit run -a
diff --git a/README.md b/README.md
index 589eff4..c1ca583 100644
--- a/README.md
+++ b/README.md
@@ -27,8 +27,7 @@
 [github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
 [github-discussions-link]: https://github.com/gradhep/relaxed/discussions
 
-[gitter-badge]:
-  https://badges.gitter.im/https://github.com/gradhep/relaxed/community.svg
+[gitter-badge]: https://badges.gitter.im/https://github.com/gradhep/relaxed/community.svg
 [gitter-link]: https://gitter.im/https://github.com/gradhep/relaxed/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
 
 [pypi-link]: https://pypi.org/project/relaxed/
@@ -38,44 +37,37 @@
 [rtd-link]: https://relaxed.readthedocs.io/en/latest/?badge=latest
 [sk-badge]: https://scikit-hep.org/assets/images/Scikit--HEP-Project-blue.svg
 
-Provides differentiable ("relaxed") versions of common operations in high-energy
-physics.
+Provides differentiable ("relaxed") versions of common operations in high-energy physics.
 
-Based on [`jax`](http://github.com/google/jax). Where possible, function APIs
-try to mimic their commonly used counterparts, e.g. fitting and hypothesis
-testing in [`pyhf`](http://github.com/scikit-hep/pyhf).
+Based on [`jax`](http://github.com/google/jax). Where possible, function APIs try to mimic their
+commonly used counterparts, e.g. fitting and hypothesis testing in
+[`pyhf`](http://github.com/scikit-hep/pyhf).
 
 ## Currently implemented:
 
 - **[basic operations](src/relaxed/ops.py)**:
-  - `relaxed.hist`: histograms via kernel density estimation (tunable
-    bandwidth).
-  - `relaxed.cut`: approximates a hard cut with a sigmoid function (tunable
-    slope).
+  - `relaxed.hist`: histograms via kernel density estimation (tunable bandwidth).
+  - `relaxed.cut`: approximates a hard cut with a sigmoid function (tunable slope).
 - **[fitting routines](src/relaxed/mle.py)**:
   - `relaxed.mle.fit`: global MLE fit.
-  - `relaxed.mle.fixed_poi_fit`: constrained fit given a value of a parameter of
-    interest.
+  - `relaxed.mle.fixed_poi_fit`: constrained fit given a value of a parameter of interest.
 - **[inference](src/relaxed/infer.py)**:
-  - `relaxed.infer.hypotest`: hypothesis test based on the profile likelihood.
-    Supports test statistics for both limit setting (`q`) and discovery (`q_0`).
-  - `relaxed.fisher_info`: the fisher information matrix (of a `pyhf`-type
-    model).
-  - `relaxed.cramer_rao_uncert`: inverts the fisher information matrix to
-    provide uncertainties valid through the
+  - `relaxed.infer.hypotest`: hypothesis test based on the profile likelihood. Supports test
+    statistics for both limit setting (`q`) and discovery (`q_0`).
+  - `relaxed.fisher_info`: the Fisher information matrix (of a `pyhf`-type model).
+  - `relaxed.cramer_rao_uncert`: inverts the Fisher information matrix to provide uncertainties
+    valid through the
     [Cramér-Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound).
 - **[metrics](src/relaxed/metrics.py)**:
-  - `relaxed.metrics.gaussianity`: an experimental metric that quantifies the
-    mean-squared difference of a likelihood function with respect to its
-    gaussian approximation (covariance calculated using the Cramér-Rao bound
-    above).
-  - `relaxed.metrics.asimov_sig`: easy access to the (single- and multi-bin)
-    stat-only expected significance.
+  - `relaxed.metrics.gaussianity`: an experimental metric that quantifies the mean-squared
+    difference of a likelihood function with respect to its Gaussian approximation (covariance
+    calculated using the Cramér-Rao bound above).
+  - `relaxed.metrics.asimov_sig`: easy access to the (single- and multi-bin) stat-only expected
+    significance.
 
 We're maintaining a list of desired differentiable operations in
-[`list_of_operations.md`](list_of_operations.md) (thanks to
-[@cranmer](http://github.com/cranmer)) -- feel free to take inspiration or
-contribute with a PR if there's one you can handle :)
+[`list_of_operations.md`](list_of_operations.md) (thanks to [@cranmer](http://github.com/cranmer))
+-- feel free to take inspiration or contribute with a PR if there's one you can handle :)
 
 ## Install
 
 ```bash
 python3 -m pip install relaxed
 ```
 
 ## Examples
 
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/gradhep/relaxed/main?labpath=examples%2Fcuts.ipynb)
-<- Click here to start playing with our examples straight away (thanks to
-Binder)!
+<- Click here to start playing with our examples straight away (thanks to Binder)!
 
-If you'd rather run the example notebooks locally from `examples/`, you can
-clone the repository, then:
+If you'd rather run the example notebooks locally from `examples/`, you can clone the repository,
+then:
 
 ```
 python3 -m venv venv # or virtualenv
 source venv/bin/activate
 cd examples
 pip install -r requirements.txt
 ```
 
-Then launch jupyter through your preferred medium (vscode, jupyterlab, etc.),
-making sure to use this virtual env as your kernel (e.g. you can `pip` install
-and run jupyter lab in this env).
+Then launch jupyter through your preferred medium (vscode, jupyterlab, etc.), making sure to use
+this virtual env as your kernel (e.g. you can `pip` install and run jupyter lab in this env).
 
 ## Sharp bits
 
-For serious use with `pyhf`, e.g. in a
-[`neos`](http://github.com/gradhep/neos)-type workflow, it is temporarily
-recommended to install `pyhf` using a specific branch that is designed to be
+For serious use with `pyhf`, e.g. in a [`neos`](http://github.com/gradhep/neos)-type workflow, it is
+temporarily recommended to install `pyhf` using a specific branch that is designed to be
 differentiable with respect to model construction:
 
 ```
 python3 -m pip install git+http://github.com/scikit-hep/pyhf.git@make_difffable_model_ctor
 ```
 
-We plan to merge this into `pyhf` when it's stable, and will then drop this
-instruction :)
+We plan to merge this into `pyhf` when it's stable, and will then drop this instruction :)
 
 ## Cite
 
-If you use `relaxed`, please cite us! You should be able to do that from the
-github UI (top-right, under 'cite this repository'), but if not, see our
-[Zenodo DOI](https://zenodo.org/badge/latestdoi/264991846) or our
-[`CITATION.cff`](CITATION.cff).
+If you use `relaxed`, please cite us! You should be able to do that from the GitHub UI (top-right,
+under 'cite this repository'), but if not, see our
+[Zenodo DOI](https://zenodo.org/badge/latestdoi/264991846) or our [`CITATION.cff`](CITATION.cff).
 
 ## Acknowledgments
 
-Big thanks to all the developers of the main packages we use (`jax`, `pyhf`,
-`jaxopt`). Thanks also to [@dfm](github.com/user/dfm) for the README header
-inspiration ;)
+Big thanks to all the developers of the main packages we use (`jax`, `pyhf`, `jaxopt`). Thanks also
+to [@dfm](https://github.com/dfm) for the README header inspiration ;)
diff --git a/list_of_operations.md b/list_of_operations.md
index 889f593..0b27de8 100644
--- a/list_of_operations.md
+++ b/list_of_operations.md
@@ -2,17 +2,16 @@
 
 ## Definitely useful with known solution
 
-- **Classification (binning)**: Assigning an event to a bin in a histogram or
-  classifying it as a particular class label is a non-differentiable operation.
-  Multi-class classification is a classic example in machine learning and
-  statistics, and is typically relaxed with a sigmoid or a softmax.
+- **Classification (binning)**: Assigning an event to a bin in a histogram or classifying it as a
+  particular class label is a non-differentiable operation. Multi-class classification is a classic
+  example in machine learning and statistics, and is typically relaxed with a sigmoid or a softmax
+  (see the first sketch at the end of this document).
 
   - This was used in INFERNO and neos
-  - Alternatively, one could calculate smooth probability assignments using
-    Kernel Density Estimation or some other kernel based approach
+  - Alternatively, one could calculate smooth probability assignments using Kernel Density
+    Estimation or some other kernel-based approach
 
-- **Differentiable ranking and sorting**: Sorting is a fundamental operation.
-  For instance, we typically sort particles by $p_T$.
+- **Differentiable ranking and sorting**: Sorting is a fundamental operation. For instance, we
+  typically sort particles by $p_T$.
 
   - Differentiable Ranks and Sorting using Optimal Transport
     [https://arxiv.org/abs/1905.11885](https://arxiv.org/abs/1905.11885)
@@ -20,76 +19,65 @@
     [https://arxiv.org/abs/2002.08871](https://arxiv.org/abs/2002.08871) and
     [great slides](https://raw.githubusercontent.com/mblondel/mblondel.github.io/9e103aad534d3e2d51a357c72b2485309131e719/talks/mblondel-CIRM-2020-03.pdf)
 
-- **Differentiable clustering (partitions)** We have a set of objects and we
-  would like to cluster or partition them. We can think of this in terms of
-  graph where the nodes are the objects and edges indicate two objects are in
-  the same cluster. We want all objects in the same cluster to be connected and
-  no objects in different clusters to be connected.
+- **Differentiable clustering (partitions)**: We have a set of objects and we would like to cluster
+  or partition them. We can think of this in terms of a graph where the nodes are the objects and
+  edges indicate that two objects are in the same cluster. We want all objects in the same cluster
+  to be connected and no objects in different clusters to be connected.
 
-  - This can be imposed if the adjacency matrix is restricted to be of the form
-    $u u^T$, where $u$ is a softmax output. This was used in
-    [Set2Graph: Learning Graphs From Sets](https://arxiv.org/abs/2002.08772) for
-    vertexing and is also described in slide 27 of
+  - This can be imposed if the adjacency matrix is restricted to be of the form $u u^T$, where $u$
+    is a softmax output (sketched at the end of this document). This was used in
+    [Set2Graph: Learning Graphs From Sets](https://arxiv.org/abs/2002.08772) for vertexing and is
+    also described in slide 27 of
     [this talk](https://indico.cern.ch/event/809820/contributions/3632659/attachments/1971659/3280030/GNN_NYU_3_Jan_2020.pdf).
 
-  - note: one might think of using something like this for clustering
-    calorimeter cells to calorimeter clusters.
-
-- **Barlow-Beeston for Monte Carlo Statistical Uncertainty:** The statistical
-  uncertainty on template histograms from limited statistical uncertainty can be
-  dealth with in a clean way by jointly modelling the statistical fluctuations
-  in the data and the statistical fluctuations in the Monte Carlo samples. This
-  was treated in
-  [Fitting using finite Monte Carlo samples]()
-  (pdf from [at FermiLab](https://lss.fnal.gov/archive/other/man-hep-93-1.pdf)).
-  In a simple one-bin example one would model as
-  $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$ where $n$ is
-  count in data in a signal region, $\mu$ is the unknown exepected signal rate,
-  $\lambda$ is the unknown expected background rate (a nuisance parameter),
-  $\tau$ is the ratio of the Monte Carlo luminosity to data luminosity, and $m$
-  is the count in the Monte Carlo sample. This can easily be extended to
-  multiple bins and multiple background sources per bin, but it introduces a
-  nuisance parameter for each component of each bin. Note in this setup the
-  observed Monte Carlo are treated as data (since it fluctuates and is on the
-  left of the "|"). In HistFactory language, the Monte Carlo observation $m$
-  would be the `Data` of a new `Channel` and the unknown background
-  $\tau\lambda$ would be modeled with a `ShapeFactor` that would be shared with
-  the `Channel` that has the real observed data $n$. This is typically very
-  heavy and leads to a proliferation of nuisance parameters, which cause
-  problems for Minuit. Thus, typically an approximate approach is used where the
-  different background contributions are combined. In HistFactory this is what
-  is done when using `StatErrorConfig`. This treatment is usually fine, but has
-  corner cases when $m=0$. One interesting aspect of the Barlow-Beeston approach
-  is that optimization on the nuisance parameter $\lambda$ decouples from
-  optimization on $\mu$. In fact, there is a closed form solution for
-  $\hat{\lambda}(n,m,\mu)$ (eq. 14), so optimizing the full likelihood can be
-  thought of as a nested optimization with $\lambda$ in the inner loop.
-  Moreover, it can be thought of as the implicit minimization used for the
-  profile likelihood fit in neos. Several years ago George Lewis wrote a wrapper
-  for the log-likeihood created in HistFactory so that $\lambda$ was solved
-  exactly and only the profiled likelihood with $\mu$ was exposed to Minuit.
-  While elegant conceptually, the implementation in RooFit did not lead to
-  significant performance gains for the number of nuisance parameters in the
-  models at that time. However, it would be interesting to revisit this in the
+  - note: one might think of using something like this for clustering calorimeter cells to
+    calorimeter clusters.
+
+- **Barlow-Beeston for Monte Carlo Statistical Uncertainty:** The statistical uncertainty on
+  template histograms from limited Monte Carlo sample sizes can be dealt with in a clean way by
+  jointly modelling the statistical fluctuations in the data and the statistical fluctuations in the
+  Monte Carlo samples. This was treated in
+  [Fitting using finite Monte Carlo samples]() (pdf
+  [from FermiLab](https://lss.fnal.gov/archive/other/man-hep-93-1.pdf)). In a simple one-bin
+  example one would model as $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$ where $n$
+  is the count in data in a signal region, $\mu$ is the unknown expected signal rate, $\lambda$ is
+  the unknown expected background rate (a nuisance parameter), $\tau$ is the ratio of the Monte
+  Carlo luminosity to data luminosity, and $m$ is the count in the Monte Carlo sample. This can
+  easily be extended to multiple bins and multiple background sources per bin, but it introduces a
+  nuisance parameter for each component of each bin. Note that in this setup the observed Monte
+  Carlo count is treated as data (since it fluctuates and is on the left of the "|"). In HistFactory
+  language, the Monte Carlo observation $m$ would be the `Data` of a new `Channel` and the unknown
+  background $\tau\lambda$ would be modeled with a `ShapeFactor` that would be shared with the
+  `Channel` that has the real observed data $n$. This is typically very heavy and leads to a
+  proliferation of nuisance parameters, which cause problems for Minuit. Thus, typically an
+  approximate approach is used where the different background contributions are combined. In
+  HistFactory this is what is done when using `StatErrorConfig`. This treatment is usually fine, but
+  has corner cases when $m=0$. One interesting aspect of the Barlow-Beeston approach is that
+  optimization on the nuisance parameter $\lambda$ decouples from optimization on $\mu$. In fact,
+  there is a closed-form solution for $\hat{\lambda}(n,m,\mu)$ (eq. 14), so optimizing the full
+  likelihood can be thought of as a nested optimization with $\lambda$ in the inner loop (sketched
+  roughly at the end of this document). Moreover, it can be thought of as the implicit minimization
+  used for the profile likelihood fit in neos. Several years ago George Lewis wrote a wrapper for
+  the log-likelihood created in HistFactory so that $\lambda$ was solved exactly and only the
+  profiled likelihood with $\mu$ was exposed to Minuit. While elegant conceptually, the
+  implementation in RooFit did not lead to significant performance gains for the number of nuisance
+  parameters in the models at that time. However, it would be interesting to revisit this in the
   context of pyhf and grad-hep.
 
   References:
 
   - [RooBarlowBeestonLL.cxx](https://root.cern/doc/master/RooBarlowBeestonLL_8cxx_source.html)
     [RooBarlowBeestonLL.h](https://root.cern/doc/master/RooBarlowBeestonLL_8h_source.html)
   - [A RooFit example](https://root.cern/doc/master/rf709__BarlowBeeston_8C.html)
 
-- **ROC AUC:** While the area under ROC curve (ROC AUC) is not usually our
-  ultimate physics goal, it may be useful or motivated in some cases. The ROC
-  curve is non-differentiable, but can be relaxed into a rank statistic. This
-  was used for example in
+- **ROC AUC:** While the area under the ROC curve (ROC AUC) is not usually our ultimate physics
+  goal, it may be useful or motivated in some cases. The ROC curve is non-differentiable, but can be
+  relaxed into a rank statistic. This was used for example in
   [Backdrop: Stochastic Backpropagation](https://arxiv.org/abs/1806.01337)
 
-- Herschtal, A. and Raskutti, B. (2004). Optimising area under the roc curve
-  using gradient descent. In Proceedings of the Twenty-first International
-  Conference on Machine Learning, ICML ’04, pages 49–, New York, NY, USA. ACM.
+- Herschtal, A. and Raskutti, B. (2004). Optimising area under the roc curve using gradient descent.
+  In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages
+  49–, New York, NY, USA. ACM.
   [doi/10.1145/1015330.1015366](https://dl.acm.org/doi/10.1145/1015330.1015366)
 
 ## Definitely useful seeking solution
 
-- **Differentiable legend placement in plots:** They are so annoying aren't
-  they?
+- **Differentiable legend placement in plots:** They are so annoying, aren't they?
 
 - **Differentiable peer review:** accept/reject is so non-diffable
 
@@ -98,11 +86,10 @@
 - **Differentiable Feature Selection by Discrete Relaxation** See
   [paper](https://www.microsoft.com/en-us/research/publication/differentiable-feature-selection-by-discrete-relaxation/)
 
-- **Gumbel Max Trick & Gumbel Machinery:** The Gumbel-Max Trick is a method to
-  sample from a categorical distribution $Cat(\alpha_1, \dots, \alpha_K)$, where
-  category $k$ has $\alpha_k$ probability to be sampled among $K$ categories,
-  and relies on the Gumbel distribution defined by the Cumulative Distribution
-  Function.
+- **Gumbel Max Trick & Gumbel Machinery:** The Gumbel-Max Trick is a method to sample from a
+  categorical distribution $Cat(\alpha_1, \dots, \alpha_K)$, where category $k$ has probability
+  $\alpha_k$ of being sampled among $K$ categories, and relies on the Gumbel distribution, defined
+  by its cumulative distribution function (a rough sketch appears at the end of this document).
 
   - [Gumbel Max Trick](https://laurent-dinh.github.io/2016/11/22/gumbel-max.html)
   - [Gumbel Machinery](https://cmaddis.github.io/gumbel-machinery)
@@ -110,9 +97,8 @@
 - **Sparse Structured Prediction:** See paper
   [Differentiable Relaxed Optimization for Sparse Structured Prediction](https://arxiv.org/abs/2001.04437)
 
-- **Coreference resolution**: "Coreference resolution is the task of identifying
-  all mentions which refer to the same entity in a document." "Coreference
-  resolution can be regarded as a clustering problem: each cluster corresponds
-  to a single entity and consists of all its mentions in a given text." From
-  Optimizing Differentiable Relaxations of Coreference Evaluation Metrics
+- **Coreference resolution**: "Coreference resolution is the task of identifying all mentions which
+  refer to the same entity in a document." "Coreference resolution can be regarded as a clustering
+  problem: each cluster corresponds to a single entity and consists of all its mentions in a given
+  text." From Optimizing Differentiable Relaxations of Coreference Evaluation Metrics
   [https://arxiv.org/abs/1704.04451](https://arxiv.org/abs/1704.04451)
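+
+## Appendix: rough sketches of a few relaxations
+
+To make the "Classification (binning)" entry above more concrete, here is a minimal, untested JAX
+sketch of the two usual tricks: a sigmoid "soft cut" and a kernel-density "soft histogram". It
+only illustrates the idea; the function and argument names are made up here and are not the
+`relaxed` API.
+
+```python
+import jax.numpy as jnp
+from jax import grad
+from jax.scipy.stats import norm
+
+
+def soft_cut(x, cut, slope=10.0):
+    # Per-event weight in (0, 1); approaches a hard step as slope -> infinity.
+    return 1.0 / (1.0 + jnp.exp(-slope * (x - cut)))
+
+
+def soft_hist(x, bin_edges, bandwidth=0.1, weights=None):
+    # Each event contributes the Gaussian-CDF mass falling inside each bin,
+    # instead of a 0/1 indicator, so yields are differentiable in x and bandwidth.
+    if weights is None:
+        weights = jnp.ones_like(x)
+    cdf = norm.cdf(bin_edges[None, :], loc=x[:, None], scale=bandwidth)
+    per_event = cdf[:, 1:] - cdf[:, :-1]  # shape (n_events, n_bins)
+    return jnp.sum(weights[:, None] * per_event, axis=0)
+
+
+x = jnp.array([0.1, 0.4, 0.45, 0.8])
+edges = jnp.linspace(0.0, 1.0, 5)
+yields = soft_hist(x, edges, bandwidth=0.05, weights=soft_cut(x, cut=0.3))
+print(yields)  # smooth, non-integer bin "counts"
+print(grad(lambda c: soft_hist(x, edges, 0.05, soft_cut(x, c)).sum())(0.3))
+```
+
+In the limit of zero bandwidth and infinite slope this reproduces the ordinary cut-then-histogram.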
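+
+For the "Differentiable clustering (partitions)" entry, the $u u^T$ construction fits in a few
+lines. Again a rough, untested sketch of the idea, not code taken from Set2Graph:
+
+```python
+import jax
+import jax.numpy as jnp
+
+
+def soft_adjacency(logits):
+    # logits: (n_objects, n_clusters); each row of u is a softmax assignment.
+    u = jax.nn.softmax(logits, axis=-1)
+    # a[i, j] = probability that objects i and j land in the same cluster.
+    return u @ u.T
+
+
+logits = jnp.array([[4.0, 0.0], [3.0, 0.0], [0.0, 5.0]])  # toy: 3 objects, 2 clusters
+a = soft_adjacency(logits)
+print(a)  # ~1 within the first pair of objects, ~0 across the two clusters
+# Stays differentiable w.r.t. whatever upstream network produced the logits:
+print(jax.grad(lambda l: soft_adjacency(l)[0, 2])(logits))
+```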
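+
+The "nested optimization" remark in the Barlow-Beeston entry can also be made concrete. For the
+one-bin model $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$, setting
+$\partial_\lambda \log L = 0$ gives a quadratic,
+$(1+\tau)\lambda^2 + [(1+\tau)\mu - n - m]\lambda - m\mu = 0$, whose positive root is
+$\hat{\lambda}(n,m,\mu)$. This is re-derived here only for illustration and has not been checked
+against eq. 14 of the paper; the code below is an untested sketch.
+
+```python
+import jax
+import jax.numpy as jnp
+from jax.scipy.stats import poisson
+
+
+def lam_hat(mu, n, m, tau):
+    # Positive root of (1 + tau) lam^2 + [(1 + tau) mu - n - m] lam - m mu = 0.
+    a = 1.0 + tau
+    b = (1.0 + tau) * mu - n - m
+    c = -m * mu
+    return (-b + jnp.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
+
+
+def profiled_nll(mu, n, m, tau):
+    # The nuisance parameter is solved for exactly; only mu is left to optimize.
+    lam = lam_hat(mu, n, m, tau)
+    return -(poisson.logpmf(n, mu + lam) + poisson.logpmf(m, tau * lam))
+
+
+n, m, tau = 12.0, 50.0, 5.0
+print(profiled_nll(3.0, n, m, tau))
+print(jax.grad(profiled_nll)(3.0, n, m, tau))  # d(profiled NLL)/d(mu), no Minuit needed
+```
+
+Sanity check on the root: with $\mu = 0$ it reduces to $\hat{\lambda} = (n+m)/(1+\tau)$, the usual
+pooled estimate.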
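+
+Finally, the Gumbel-Max Trick itself is a few lines; the relaxed version below is the generic
+Gumbel-Softmax (Concrete) variant rather than anything specific to the links above. Untested
+sketch:
+
+```python
+import jax
+import jax.numpy as jnp
+
+key = jax.random.PRNGKey(0)
+log_alpha = jnp.log(jnp.array([0.2, 0.5, 0.3]))  # log of the category probabilities
+
+# Exact (non-differentiable) categorical sample: argmax of log-probs plus Gumbel noise.
+gumbels = jax.random.gumbel(key, log_alpha.shape)
+hard_sample = jnp.argmax(log_alpha + gumbels)
+
+
+def gumbel_softmax_sample(key, log_alpha, temperature=0.5):
+    # Relaxed sample: replace the argmax with a tempered softmax.
+    g = jax.random.gumbel(key, log_alpha.shape)
+    return jax.nn.softmax((log_alpha + g) / temperature)
+
+
+print(hard_sample, gumbel_softmax_sample(key, log_alpha))
+# The soft sample approaches a one-hot vector as temperature -> 0.
+```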