From 16315e75c00365cab0a881b96c558dc7a91c27b8 Mon Sep 17 00:00:00 2001
From: Nathan Simpson
Date: Tue, 4 Jul 2023 19:33:32 +0100
Subject: [PATCH] prettier

---
 .github/CONTRIBUTING.md |  40 +++++-------
 README.md               |  80 ++++++++++-------------
 list_of_operations.md   | 136 ++++++++++++++++++----------------------
 3 files changed, 111 insertions(+), 145 deletions(-)

diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
index 4ff1b8a..c415f28 100644
--- a/.github/CONTRIBUTING.md
+++ b/.github/CONTRIBUTING.md
@@ -1,20 +1,18 @@
-See the [Scientific Python Developer Guide][spc-dev-intro] for a detailed
-description of best practices for developing scientific packages.
+See the [Scientific Python Developer Guide][spc-dev-intro] for a detailed description of best
+practices for developing scientific packages.
 
 [spc-dev-intro]: https://scientific-python-cookie.readthedocs.io/guide/intro
 
 # Quick development
 
-The fastest way to start with development is to use nox. If you don't have nox,
-you can use `pipx run nox` to run it without installing, or `pipx install nox`.
-If you don't have pipx (pip for applications), then you can install with with
-`pip install pipx` (the only case were installing an application with regular
-pip is reasonable). If you use macOS, then pipx and nox are both in brew, use
-`brew install pipx nox`.
+The fastest way to start with development is to use nox. If you don't have nox, you can use
+`pipx run nox` to run it without installing, or `pipx install nox`. If you don't have pipx (pip for
+applications), then you can install it with `pip install pipx` (the only case where installing an
+application with regular pip is reasonable). If you use macOS, pipx and nox are both in brew:
+`brew install pipx nox`.
 
-To use, run `nox`. This will lint and test using every installed version of
-Python on your system, skipping ones that are not installed. You can also run
-specific jobs:
+To use, run `nox`. This will lint and test using every installed version of Python on your system,
+skipping ones that are not installed. You can also run specific jobs:
 
 ```console
 $ nox -s lint # Lint only
@@ -23,8 +21,7 @@ $ nox -s docs -- serve # Build and serve the docs
 $ nox -s build # Make an SDist and wheel
 ```
 
-Nox handles everything for you, including setting up an temporary virtual
-environment for each run.
+Nox handles everything for you, including setting up a temporary virtual environment for each run.
 
 # Setting up a development environment manually
 
@@ -36,9 +33,8 @@ source ./.venv/bin/activate
 pip install -v -e .[dev]
 ```
 
-If you have the
-[Python Launcher for Unix](https://github.com/brettcannon/python-launcher), you
-can instead do:
+If you have the [Python Launcher for Unix](https://github.com/brettcannon/python-launcher), you can
+instead do:
 
 ```bash
 py -m venv .venv
 py -m pip install -v -e .[dev]
@@ -47,16 +43,15 @@
 
 # Post setup
 
-You should prepare pre-commit, which will help you by checking that commits pass
-required checks:
+You should prepare pre-commit, which will help you by checking that commits pass required checks:
 
 ```bash
 pip install pre-commit # or brew install pre-commit on macOS
 pre-commit install # Will install a pre-commit hook into the git repo
 ```
 
-You can also/alternatively run `pre-commit run` (changes only) or
-`pre-commit run --all-files` to check even without installing the hook.
+You can also/alternatively run `pre-commit run` (changes only) or `pre-commit run --all-files` to
+check even without installing the hook.
 # Testing
 
@@ -90,9 +85,8 @@ nox -s docs -- serve
 
 # Pre-commit
 
-This project uses pre-commit for all style checking. While you can run it with
-nox, this is such an important tool that it deserves to be installed on its own.
-Install pre-commit and run:
+This project uses pre-commit for all style checking. While you can run it with nox, this is such an
+important tool that it deserves to be installed on its own. Install pre-commit and run:
 
 ```bash
 pre-commit run -a
diff --git a/README.md b/README.md
index 589eff4..c1ca583 100644
--- a/README.md
+++ b/README.md
@@ -27,8 +27,7 @@
 [github-discussions-badge]: https://img.shields.io/static/v1?label=Discussions&message=Ask&color=blue&logo=github
 [github-discussions-link]: https://github.com/gradhep/relaxed/discussions
 
-[gitter-badge]:
-  https://badges.gitter.im/https://github.com/gradhep/relaxed/community.svg
+[gitter-badge]: https://badges.gitter.im/https://github.com/gradhep/relaxed/community.svg
 [gitter-link]: https://gitter.im/https://github.com/gradhep/relaxed/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge
 
 [pypi-link]: https://pypi.org/project/relaxed/
@@ -38,44 +37,37 @@
 [rtd-link]: https://relaxed.readthedocs.io/en/latest/?badge=latest
 [sk-badge]: https://scikit-hep.org/assets/images/Scikit--HEP-Project-blue.svg
 
-Provides differentiable ("relaxed") versions of common operations in high-energy
-physics.
+Provides differentiable ("relaxed") versions of common operations in high-energy physics.
 
-Based on [`jax`](http://github.com/google/jax). Where possible, function APIs
-try to mimic their commonly used counterparts, e.g. fitting and hypothesis
-testing in [`pyhf`](http://github.com/scikit-hep/pyhf).
+Based on [`jax`](http://github.com/google/jax). Where possible, function APIs try to mimic their
+commonly used counterparts, e.g. fitting and hypothesis testing in
+[`pyhf`](http://github.com/scikit-hep/pyhf).
 
 ## Currently implemented:
 
 - **[basic operations](src/relaxed/ops.py)**:
-  - `relaxed.hist`: histograms via kernel density estimation (tunable
-    bandwidth).
-  - `relaxed.cut`: approximates a hard cut with a sigmoid function (tunable
-    slope).
+  - `relaxed.hist`: histograms via kernel density estimation (tunable bandwidth).
+  - `relaxed.cut`: approximates a hard cut with a sigmoid function (tunable slope).
 - **[fitting routines](src/relaxed/mle.py)**:
   - `relaxed.mle.fit`: global MLE fit.
-  - `relaxed.mle.fixed_poi_fit`: constrained fit given a value of a parameter of
-    interest.
+  - `relaxed.mle.fixed_poi_fit`: constrained fit given a value of a parameter of interest.
 - **[inference](src/relaxed/infer.py)**:
-  - `relaxed.infer.hypotest`: hypothesis test based on the profile likelihood.
-    Supports test statistics for both limit setting (`q`) and discovery (`q_0`).
-  - `relaxed.fisher_info`: the fisher information matrix (of a `pyhf`-type
-    model).
-  - `relaxed.cramer_rao_uncert`: inverts the fisher information matrix to
-    provide uncertainties valid through the
+  - `relaxed.infer.hypotest`: hypothesis test based on the profile likelihood. Supports test
+    statistics for both limit setting (`q`) and discovery (`q_0`).
+  - `relaxed.fisher_info`: the Fisher information matrix (of a `pyhf`-type model).
+  - `relaxed.cramer_rao_uncert`: inverts the Fisher information matrix to provide uncertainties
+    valid through the
     [Cramér-Rao bound](https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93Rao_bound).
 - **[metrics](src/relaxed/metrics.py)**:
-  - `relaxed.metrics.gaussianity`: an experimental metric that quantifies the
-    mean-squared difference of a likelihood function with respect to its
-    gaussian approximation (covariance calculated using the Cramér-Rao bound
-    above).
-  - `relaxed.metrics.asimov_sig`: easy access to the (single- and multi-bin)
-    stat-only expected significance.
+  - `relaxed.metrics.gaussianity`: an experimental metric that quantifies the mean-squared
+    difference of a likelihood function with respect to its Gaussian approximation (covariance
+    calculated using the Cramér-Rao bound above).
+  - `relaxed.metrics.asimov_sig`: easy access to the (single- and multi-bin) stat-only expected
+    significance.
 
 We're maintaining a list of desired differentiable operations in
-[`list_of_operations.md`](list_of_operations.md) (thanks to
-[@cranmer](http://github.com/cranmer)) -- feel free to take inspiration or
-contribute with a PR if there's one you can handle :)
+[`list_of_operations.md`](list_of_operations.md) (thanks to [@cranmer](http://github.com/cranmer))
+-- feel free to take inspiration or contribute with a PR if there's one you can handle :)
 
 ## Install
 
 ```bash
 python3 -m pip install relaxed
 ```
 
 ## Examples
 
 [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/gradhep/relaxed/main?labpath=examples%2Fcuts.ipynb)
-<- Click here to start playing with our examples straight away (thanks to
-Binder)!
+<- Click here to start playing with our examples straight away (thanks to Binder)!
 
-If you'd rather run the example notebooks locally from `examples/`, you can
-clone the repository, then:
+If you'd rather run the example notebooks locally from `examples/`, you can clone the repository,
+then:
 
 ```
 python3 -m venv venv # or virtualenv
 source venv/bin/activate
 cd examples
 pip install -r requirements.txt
 ```
 
-Then launch jupyter through your preferred medium (vscode, jupyterlab, etc.),
-making sure to use this virtual env as your kernel (e.g. you can `pip` install
-and run jupyter lab in this env).
+Then launch jupyter through your preferred medium (vscode, jupyterlab, etc.), making sure to use
+this virtual env as your kernel (e.g. you can `pip` install and run jupyter lab in this env).
 
 ## Sharp bits
 
-For serious use with `pyhf`, e.g. in a
-[`neos`](http://github.com/gradhep/neos)-type workflow, it is temporarily
-recommended to install `pyhf` using a specific branch that is designed to be
+For serious use with `pyhf`, e.g. in a [`neos`](http://github.com/gradhep/neos)-type workflow, it is
+temporarily recommended to install `pyhf` using a specific branch that is designed to be
 differentiable with respect to model construction:
 
 ```
 python3 -m pip install git+http://github.com/scikit-hep/pyhf.git@make_difffable_model_ctor
 ```
 
-We plan to merge this into `pyhf` when it's stable, and will then drop this
-instruction :)
+We plan to merge this into `pyhf` when it's stable, and will then drop this instruction :)
 
 ## Cite
 
-If you use `relaxed`, please cite us! You should be able to do that from the
-github UI (top-right, under 'cite this repository'), but if not, see our
-[Zenodo DOI](https://zenodo.org/badge/latestdoi/264991846) or our
-[`CITATION.cff`](CITATION.cff).
+If you use `relaxed`, please cite us! You should be able to do that from the GitHub UI (top-right,
+under 'cite this repository'), but if not, see our
+[Zenodo DOI](https://zenodo.org/badge/latestdoi/264991846) or our [`CITATION.cff`](CITATION.cff).
 
 ## Acknowledgments
 
-Big thanks to all the developers of the main packages we use (`jax`, `pyhf`,
-`jaxopt`). Thanks also to [@dfm](github.com/user/dfm) for the README header
-inspiration ;)
+Big thanks to all the developers of the main packages we use (`jax`, `pyhf`, `jaxopt`). Thanks also
+to [@dfm](https://github.com/dfm) for the README header inspiration ;)
diff --git a/list_of_operations.md b/list_of_operations.md
index 889f593..0b27de8 100644
--- a/list_of_operations.md
+++ b/list_of_operations.md
@@ -2,17 +2,16 @@
 
 ## Definitely useful with known solution
 
-- **Classification (binning)**: Assigning an event to a bin in a histogram or
-  classifying it as a particular class label is a non-differentiable operation.
-  Multi-class classification is a classic example in machine learning and
-  statistics, and is typically relaxed with a sigmoid or a softmax.
+- **Classification (binning)**: Assigning an event to a bin in a histogram or classifying it as a
+  particular class label is a non-differentiable operation. Multi-class classification is a classic
+  example in machine learning and statistics, and is typically relaxed with a sigmoid or a softmax
+  (see the first sketch at the end of this document).
 
   - This was used in INFERNO and neos
-  - Alternatively, one could calculate smooth probability assignments using
-    Kernel Density Estimation or some other kernel based approach
+  - Alternatively, one could calculate smooth probability assignments using Kernel Density
+    Estimation or some other kernel-based approach
 
-- **Differentiable ranking and sorting**: Sorting is a fundamental operation.
-  For instance, we typically sort particles by $p_T$.
+- **Differentiable ranking and sorting**: Sorting is a fundamental operation. For instance, we
+  typically sort particles by $p_T$.
 
   - Differentiable Ranks and Sorting using Optimal Transport
     [https://arxiv.org/abs/1905.11885](https://arxiv.org/abs/1905.11885)
@@ -20,76 +19,65 @@
     [https://arxiv.org/abs/2002.08871](https://arxiv.org/abs/2002.08871) and
     [great slides](https://raw.githubusercontent.com/mblondel/mblondel.github.io/9e103aad534d3e2d51a357c72b2485309131e719/talks/mblondel-CIRM-2020-03.pdf)
 
-- **Differentiable clustering (partitions)** We have a set of objects and we
-  would like to cluster or partition them. We can think of this in terms of
-  graph where the nodes are the objects and edges indicate two objects are in
-  the same cluster. We want all objects in the same cluster to be connected and
-  no objects in different clusters to be connected.
+- **Differentiable clustering (partitions)**: We have a set of objects and we would like to cluster
+  or partition them. We can think of this in terms of a graph where the nodes are the objects and
+  edges indicate that two objects are in the same cluster. We want all objects in the same cluster
+  to be connected and no objects in different clusters to be connected.
 
-  - This can be imposed if the adjacency matrix is restricted to be of the form
-    $u u^T$, where $u$ is a softmax output. This was used in
-    [Set2Graph: Learning Graphs From Sets](https://arxiv.org/abs/2002.08772) for
-    vertexing and is also described in slide 27 of
+  - This can be imposed if the adjacency matrix is restricted to be of the form $u u^T$, where $u$
+    is a softmax output (sketched at the end of this document). This was used in
+    [Set2Graph: Learning Graphs From Sets](https://arxiv.org/abs/2002.08772) for vertexing and is
+    also described in slide 27 of
     [this talk](https://indico.cern.ch/event/809820/contributions/3632659/attachments/1971659/3280030/GNN_NYU_3_Jan_2020.pdf).
 
-  - note: one might think of using something like this for clustering
-    calorimeter cells to calorimeter clusters.
-
-- **Barlow-Beeston for Monte Carlo Statistical Uncertainty:** The statistical
-  uncertainty on template histograms from limited statistical uncertainty can be
-  dealth with in a clean way by jointly modelling the statistical fluctuations
-  in the data and the statistical fluctuations in the Monte Carlo samples. This
-  was treated in
-  [Fitting using finite Monte Carlo samples]()
-  (pdf from [at FermiLab](https://lss.fnal.gov/archive/other/man-hep-93-1.pdf)).
-  In a simple one-bin example one would model as
-  $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$ where $n$ is
-  count in data in a signal region, $\mu$ is the unknown exepected signal rate,
-  $\lambda$ is the unknown expected background rate (a nuisance parameter),
-  $\tau$ is the ratio of the Monte Carlo luminosity to data luminosity, and $m$
-  is the count in the Monte Carlo sample. This can easily be extended to
-  multiple bins and multiple background sources per bin, but it introduces a
-  nuisance parameter for each component of each bin. Note in this setup the
-  observed Monte Carlo are treated as data (since it fluctuates and is on the
-  left of the "|"). In HistFactory language, the Monte Carlo observation $m$
-  would be the `Data` of a new `Channel` and the unknown background
-  $\tau\lambda$ would be modeled with a `ShapeFactor` that would be shared with
-  the `Channel` that has the real observed data $n$. This is typically very
-  heavy and leads to a proliferation of nuisance parameters, which cause
-  problems for Minuit. Thus, typically an approximate approach is used where the
-  different background contributions are combined. In HistFactory this is what
-  is done when using `StatErrorConfig`. This treatment is usually fine, but has
-  corner cases when $m=0$. One interesting aspect of the Barlow-Beeston approach
-  is that optimization on the nuisance parameter $\lambda$ decouples from
-  optimization on $\mu$. In fact, there is a closed form solution for
-  $\hat{\lambda}(n,m,\mu)$ (eq. 14), so optimizing the full likelihood can be
-  thought of as a nested optimization with $\lambda$ in the inner loop.
-  Moreover, it can be thought of as the implicit minimization used for the
-  profile likelihood fit in neos. Several years ago George Lewis wrote a wrapper
-  for the log-likeihood created in HistFactory so that $\lambda$ was solved
-  exactly and only the profiled likelihood with $\mu$ was exposed to Minuit.
-  While elegant conceptually, the implementation in RooFit did not lead to
-  significant performance gains for the number of nuisance parameters in the
-  models at that time. However, it would be interesting to revisit this in the
+  - note: one might think of using something like this for clustering calorimeter cells to
+    calorimeter clusters.
+
+- **Barlow-Beeston for Monte Carlo Statistical Uncertainty:** The statistical uncertainty on
+  template histograms from limited Monte Carlo sample sizes can be dealt with in a clean way by
+  jointly modelling the statistical fluctuations in the data and the statistical fluctuations in the
+  Monte Carlo samples. This was treated in
+  [Fitting using finite Monte Carlo samples]() (pdf
+  [from FermiLab](https://lss.fnal.gov/archive/other/man-hep-93-1.pdf)). In a simple one-bin
+  example one would model as $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$ where $n$
+  is the count in data in a signal region, $\mu$ is the unknown expected signal rate, $\lambda$ is
+  the unknown expected background rate (a nuisance parameter), $\tau$ is the ratio of the Monte
+  Carlo luminosity to data luminosity, and $m$ is the count in the Monte Carlo sample. This can
+  easily be extended to multiple bins and multiple background sources per bin, but it introduces a
+  nuisance parameter for each component of each bin. Note that in this setup the observed Monte
+  Carlo count is treated as data (since it fluctuates and is on the left of the "|"). In HistFactory
+  language, the Monte Carlo observation $m$ would be the `Data` of a new `Channel` and the unknown
+  background $\tau\lambda$ would be modeled with a `ShapeFactor` that would be shared with the
+  `Channel` that has the real observed data $n$. This is typically very heavy and leads to a
+  proliferation of nuisance parameters, which cause problems for Minuit. Thus, typically an
+  approximate approach is used where the different background contributions are combined. In
+  HistFactory this is what is done when using `StatErrorConfig`. This treatment is usually fine, but
+  has corner cases when $m=0$. One interesting aspect of the Barlow-Beeston approach is that
+  optimization on the nuisance parameter $\lambda$ decouples from optimization on $\mu$. In fact,
+  there is a closed-form solution for $\hat{\lambda}(n,m,\mu)$ (eq. 14), so optimizing the full
+  likelihood can be thought of as a nested optimization with $\lambda$ in the inner loop (sketched
+  roughly at the end of this document). Moreover, it can be thought of as the implicit minimization
+  used for the profile likelihood fit in neos. Several years ago George Lewis wrote a wrapper for
+  the log-likelihood created in HistFactory so that $\lambda$ was solved exactly and only the
+  profiled likelihood with $\mu$ was exposed to Minuit. While elegant conceptually, the
+  implementation in RooFit did not lead to significant performance gains for the number of nuisance
+  parameters in the models at that time. However, it would be interesting to revisit this in the
   context of pyhf and grad-hep.
 
   References:
 
   - [RooBarlowBeestonLL.cxx](https://root.cern/doc/master/RooBarlowBeestonLL_8cxx_source.html)
     [RooBarlowBeestonLL.h](https://root.cern/doc/master/RooBarlowBeestonLL_8h_source.html)
   - [A RooFit example](https://root.cern/doc/master/rf709__BarlowBeeston_8C.html)
 
-- **ROC AUC:** While the area under ROC curve (ROC AUC) is not usually our
-  ultimate physics goal, it may be useful or motivated in some cases. The ROC
-  curve is non-differentiable, but can be relaxed into a rank statistic. This
-  was used for example in
+- **ROC AUC:** While the area under the ROC curve (ROC AUC) is not usually our ultimate physics
+  goal, it may be useful or motivated in some cases. The ROC curve is non-differentiable, but can be
+  relaxed into a rank statistic. This was used for example in
   [Backdrop: Stochastic Backpropagation](https://arxiv.org/abs/1806.01337)
 
-- Herschtal, A. and Raskutti, B. (2004). Optimising area under the roc curve
-  using gradient descent. In Proceedings of the Twenty-first International
-  Conference on Machine Learning, ICML ’04, pages 49–, New York, NY, USA. ACM.
+- Herschtal, A. and Raskutti, B. (2004). Optimising area under the roc curve using gradient descent.
+  In Proceedings of the Twenty-first International Conference on Machine Learning, ICML ’04, pages
+  49–, New York, NY, USA. ACM.
   [doi/10.1145/1015330.1015366](https://dl.acm.org/doi/10.1145/1015330.1015366)
 
 ## Definitely useful seeking solution
 
-- **Differentiable legend placement in plots:** They are so annoying aren't
-  they?
+- **Differentiable legend placement in plots:** They are so annoying, aren't they?
 
 - **Differentiable peer review:** accept/reject is so non-diffable
 
@@ -98,11 +86,10 @@
 - **Differentiable Feature Selection by Discrete Relaxation** See
   [paper](https://www.microsoft.com/en-us/research/publication/differentiable-feature-selection-by-discrete-relaxation/)
 
-- **Gumbel Max Trick & Gumbel Machinery:** The Gumbel-Max Trick is a method to
-  sample from a categorical distribution $Cat(\alpha_1, \dots, \alpha_K)$, where
-  category $k$ has $\alpha_k$ probability to be sampled among $K$ categories,
-  and relies on the Gumbel distribution defined by the Cumulative Distribution
-  Function.
+- **Gumbel Max Trick & Gumbel Machinery:** The Gumbel-Max Trick is a method to sample from a
+  categorical distribution $Cat(\alpha_1, \dots, \alpha_K)$, where category $k$ has probability
+  $\alpha_k$ of being sampled among $K$ categories, and relies on the Gumbel distribution, defined
+  by its cumulative distribution function (a rough sketch appears at the end of this document).
 
   - [Gumbel Max Trick](https://laurent-dinh.github.io/2016/11/22/gumbel-max.html)
   - [Gumbel Machinery](https://cmaddis.github.io/gumbel-machinery)
@@ -110,9 +97,8 @@
 - **Sparse Structured Prediction:** See paper
   [Differentiable Relaxed Optimization for Sparse Structured Prediction](https://arxiv.org/abs/2001.04437)
 
-- **Coreference resolution**: "Coreference resolution is the task of identifying
-  all mentions which refer to the same entity in a document." "Coreference
-  resolution can be regarded as a clustering problem: each cluster corresponds
-  to a single entity and consists of all its mentions in a given text." From
-  Optimizing Differentiable Relaxations of Coreference Evaluation Metrics
+- **Coreference resolution**: "Coreference resolution is the task of identifying all mentions which
+  refer to the same entity in a document." "Coreference resolution can be regarded as a clustering
+  problem: each cluster corresponds to a single entity and consists of all its mentions in a given
+  text." From Optimizing Differentiable Relaxations of Coreference Evaluation Metrics
   [https://arxiv.org/abs/1704.04451](https://arxiv.org/abs/1704.04451)
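+
+## Appendix: rough sketches of a few relaxations
+
+To make the "Classification (binning)" entry above more concrete, here is a minimal, untested JAX
+sketch of the two usual tricks: a sigmoid "soft cut" and a kernel-density "soft histogram". It
+only illustrates the idea; the function and argument names are made up here and are not the
+`relaxed` API.
+
+```python
+import jax.numpy as jnp
+from jax import grad
+from jax.scipy.stats import norm
+
+
+def soft_cut(x, cut, slope=10.0):
+    # Per-event weight in (0, 1); approaches a hard step as slope -> infinity.
+    return 1.0 / (1.0 + jnp.exp(-slope * (x - cut)))
+
+
+def soft_hist(x, bin_edges, bandwidth=0.1, weights=None):
+    # Each event contributes the Gaussian-CDF mass falling inside each bin,
+    # instead of a 0/1 indicator, so yields are differentiable in x and bandwidth.
+    if weights is None:
+        weights = jnp.ones_like(x)
+    cdf = norm.cdf(bin_edges[None, :], loc=x[:, None], scale=bandwidth)
+    per_event = cdf[:, 1:] - cdf[:, :-1]  # shape (n_events, n_bins)
+    return jnp.sum(weights[:, None] * per_event, axis=0)
+
+
+x = jnp.array([0.1, 0.4, 0.45, 0.8])
+edges = jnp.linspace(0.0, 1.0, 5)
+yields = soft_hist(x, edges, bandwidth=0.05, weights=soft_cut(x, cut=0.3))
+print(yields)  # smooth, non-integer bin "counts"
+print(grad(lambda c: soft_hist(x, edges, 0.05, soft_cut(x, c)).sum())(0.3))
+```
+
+In the limit of zero bandwidth and infinite slope this reproduces the ordinary cut-then-histogram.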
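+
+For the "Differentiable clustering (partitions)" entry, the $u u^T$ construction fits in a few
+lines. Again a rough, untested sketch of the idea, not code taken from Set2Graph:
+
+```python
+import jax
+import jax.numpy as jnp
+
+
+def soft_adjacency(logits):
+    # logits: (n_objects, n_clusters); each row of u is a softmax assignment.
+    u = jax.nn.softmax(logits, axis=-1)
+    # a[i, j] = probability that objects i and j land in the same cluster.
+    return u @ u.T
+
+
+logits = jnp.array([[4.0, 0.0], [3.0, 0.0], [0.0, 5.0]])  # toy: 3 objects, 2 clusters
+a = soft_adjacency(logits)
+print(a)  # ~1 within the first pair of objects, ~0 across the two clusters
+# Stays differentiable w.r.t. whatever upstream network produced the logits:
+print(jax.grad(lambda l: soft_adjacency(l)[0, 2])(logits))
+```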
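+
+The "nested optimization" remark in the Barlow-Beeston entry can also be made concrete. For the
+one-bin model $P(n,m|\mu,\lambda) = Pois(n|\mu+\lambda)Pois(m|\tau\lambda)$, setting
+$\partial_\lambda \log L = 0$ gives a quadratic,
+$(1+\tau)\lambda^2 + [(1+\tau)\mu - n - m]\lambda - m\mu = 0$, whose positive root is
+$\hat{\lambda}(n,m,\mu)$. This is re-derived here only for illustration and has not been checked
+against eq. 14 of the paper; the code below is an untested sketch.
+
+```python
+import jax
+import jax.numpy as jnp
+from jax.scipy.stats import poisson
+
+
+def lam_hat(mu, n, m, tau):
+    # Positive root of (1 + tau) lam^2 + [(1 + tau) mu - n - m] lam - m mu = 0.
+    a = 1.0 + tau
+    b = (1.0 + tau) * mu - n - m
+    c = -m * mu
+    return (-b + jnp.sqrt(b**2 - 4.0 * a * c)) / (2.0 * a)
+
+
+def profiled_nll(mu, n, m, tau):
+    # The nuisance parameter is solved for exactly; only mu is left to optimize.
+    lam = lam_hat(mu, n, m, tau)
+    return -(poisson.logpmf(n, mu + lam) + poisson.logpmf(m, tau * lam))
+
+
+n, m, tau = 12.0, 50.0, 5.0
+print(profiled_nll(3.0, n, m, tau))
+print(jax.grad(profiled_nll)(3.0, n, m, tau))  # d(profiled NLL)/d(mu), no Minuit needed
+```
+
+Sanity check on the root: with $\mu = 0$ it reduces to $\hat{\lambda} = (n+m)/(1+\tau)$, the usual
+pooled estimate.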
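+
+Finally, the Gumbel-Max Trick itself is a few lines; the relaxed version below is the generic
+Gumbel-Softmax (Concrete) variant rather than anything specific to the links above. Untested
+sketch:
+
+```python
+import jax
+import jax.numpy as jnp
+
+key = jax.random.PRNGKey(0)
+log_alpha = jnp.log(jnp.array([0.2, 0.5, 0.3]))  # log of the category probabilities
+
+# Exact (non-differentiable) categorical sample: argmax of log-probs plus Gumbel noise.
+gumbels = jax.random.gumbel(key, log_alpha.shape)
+hard_sample = jnp.argmax(log_alpha + gumbels)
+
+
+def gumbel_softmax_sample(key, log_alpha, temperature=0.5):
+    # Relaxed sample: replace the argmax with a tempered softmax.
+    g = jax.random.gumbel(key, log_alpha.shape)
+    return jax.nn.softmax((log_alpha + g) / temperature)
+
+
+print(hard_sample, gumbel_softmax_sample(key, log_alpha))
+# The soft sample approaches a one-hot vector as temperature -> 0.
+```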