From ef56a8fc654f1cafb16748bb9746530ccc81403b Mon Sep 17 00:00:00 2001 From: Matthew Feickert Date: Wed, 7 Apr 2021 08:22:43 -0500 Subject: [PATCH] draft: Add further clarification for introductory notebooks (#3) * Add more explicit titles and give further clarification on model parameters * Add more references to APIs * Use integer observation counts for clarity --- book/HelloWorld.ipynb | 47 +++++++++++++++++------------ book/SerializationAndPatching.ipynb | 2 +- book/SimpleWorkspace.ipynb | 2 +- book/Toys.ipynb | 12 ++++---- book/data/2-bin_1-channel.json | 2 +- 5 files changed, 37 insertions(+), 28 deletions(-) diff --git a/book/HelloWorld.ipynb b/book/HelloWorld.ipynb index 16fd31a..39abf9f 100644 --- a/book/HelloWorld.ipynb +++ b/book/HelloWorld.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# My First Likelihood\n", + "# Introduction to HistFactory Models\n", "\n", "🎶 I'm the very Model of a simple HEP-like measurement... 🎶\n", "\n", @@ -30,7 +30,7 @@ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pyhf\n", - "from pyhf.contrib.viz import brazil # not imported by default!" + "from pyhf.contrib.viz import brazil" ] }, { @@ -49,7 +49,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "What did we just make? This returns a [`pyhf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification." + "What did we just make? This returns a [`pyhf.pdf.Model`](https://pyhf.readthedocs.io/en/v0.6.1/_generated/pyhf.pdf.Model.html#pyhf.pdf.Model) object. Let's check out the specification." ] }, { @@ -153,7 +153,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). 
Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly verify by hand to convince ourselves of what's going on here:" + "Let's look at the first row here which is our uncorrelated shape modifier. This is a multiplicative modifier denoted by $\\kappa$ per-bin (denoted by $\\gamma_b$). Notice that the input for the constraint term requires $\\sigma_b$ which is the relative uncertainty of that modifier. This is Poisson-constrained by $\\sigma_b^{-2}$. Let's quickly calculate \"by hand\" the auxiliary data to convince ourselves of what's going on here (remembering that the background uncertainties were 10% and 20% of the observed background counts):" ] }, { @@ -165,6 +165,13 @@ "(np.array([5.0, 12.0]) / np.array([50.0, 60.0])) ** -2" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "which is what we see from the `pyhf.pdf.Model` API" + ] + }, { "cell_type": "code", "execution_count": null, @@ -223,7 +230,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This returns the data for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data." + "This returns the expected data given the model parameters for the entire likelihood for the 2 bin model, the main model as well as the constraint (or auxiliary) model. We can also drop the auxdata to get the actual data." ] }, { @@ -475,10 +482,10 @@ "source": [ "## Simple Inference\n", "\n", - "The core of statistical analysis is the statistical model. For inference, it's viewed as a function of the parameters with the data fixed.\n", + "The core of statistical analysis is the statistical model. 
For inference, it's viewed as a function of the model parameters conditioned on the fixed observations.\n", "\n", "$$\n", - "\\log L(\\theta | x) = \\log p(x | \\theta)\n", + "\\log L(\\theta | x) \\propto \\log p(x | \\theta)\n", "$$\n", "\n", "The value of the likelihood is a float. Let's try it for both the background-only model as well as the signal+background model." @@ -490,7 +497,7 @@ "metadata": {}, "outputs": [], "source": [ - "observations = [52.5, 65.0] + model.config.auxdata # this is a common pattern!\n", + "observations = [53.0, 65.0] + model.config.auxdata # this is a common pattern!\n", "\n", "model.logpdf(pars=bkg_pars, data=observations)" ] @@ -510,7 +517,7 @@ "source": [ "We're not performing inference just yet. We're simply computing the 'logpdf' of the model specified by the parameters $\\theta$ against the provided data. To perform a fit, we use the [inference API](https://pyhf.readthedocs.io/en/v0.6.1/api.html#inference) via `pyhf.infer`.\n", "\n", - "To fit a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\". This is often referred to mathematically by\n", + "When fitting a model to data, we usually want to find the $\\hat{\\theta}$ which refers to the \"Maximum Likelihood Estimate\" of the model parameters. This is often referred to mathematically by\n", "\n", "$$\n", "\\hat{\\theta}_\\text{MLE} = \\text{argmax}_\\theta L(\\theta | x)\n", @@ -537,8 +544,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]`, an observed count of `[52.5, 65]` suggests best fit values:\n", - "* $\\hat{\\mu} \\approx 0.5$,\n", + "So what can we say? With nominal signal `[5, 10]` and nominal background = `[50, 60]` model components, an observed count of `[53, 65]` suggests best fit values:\n", + "* $\\hat{\\mu} \\approx 0.54$,\n", "* $\\hat{\\gamma} \\approx [1,1]$." 
] }, @@ -597,10 +604,10 @@ "* $\\hat{\\hat{\\theta}}$ is the best fitted value of the nuisance parameters (for fixed POIs)\n", "* $\\hat{\\psi}$ and $\\hat{\\theta}$ are the best fitted values in a global fit\n", "\n", - "So let's run a hypothesis test for\n", + "So let's run a limit setting (exclusion) hypothesis test for\n", "\n", - "* null hypothesis ($\\mu = 1$) — \"SUSY is real\"\n", - "* alternate hypothesis ($\\mu = 0$) — \"Standard Model explains it all\"" + "* null hypothesis ($\\mu = 1$) — \"BSM physics process exists\"\n", + "* alternate hypothesis ($\\mu = 0$) — \"Standard Model only physics\"" ] }, { @@ -611,7 +618,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [52.5, 65.0] + model.config.auxdata,\n", + " [53.0, 65.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"q\",\n", " return_expected_set=True,\n", @@ -652,7 +659,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [52.5, 65.0] + model.config.auxdata,\n", + " [53.0, 65.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"qtilde\",\n", " return_expected_set=True,\n", @@ -691,7 +698,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):" + "We can plot the standard \"Brazil band\" of the observed and expected $\\text{CL}_\\text{s}$ values using the `pyhf.contrib` module (which needs `pyhf[contrib]`):\n", + "\n", + "The horizontal red line indicates the test size ($\\alpha=0.05$), whose intersection with the $\\text{CL}_\\text{s}$ lines visually represents the $100(1-\\alpha)\\%$ CL limit on the POI." 
] }, { @@ -711,7 +720,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that if you wnated to do all of this \"by hand\" you still could pretty easily with the `pyhf` APIs" + "Note that if you wanted to do all of this \"by hand\" you still could pretty easily. The `pyhf.infer.intervals.upperlimit` API just makes it easier." ] }, { @@ -732,7 +741,7 @@ " for poi_value in poi_values\n", "]\n", "\n", - "# Calculate upper limit\n", + "# Calculate upper limit through interpolation\n", "observed = np.asarray([h[0] for h in results]).ravel()\n", "expected = np.asarray([h[1][2] for h in results]).ravel()\n", "print(f\"Upper limit (obs): μ = {np.interp(0.05, observed[::-1], poi_values[::-1]):.4f}\")\n", diff --git a/book/SerializationAndPatching.ipynb b/book/SerializationAndPatching.ipynb index b255f52..7eff588 100644 --- a/book/SerializationAndPatching.ipynb +++ b/book/SerializationAndPatching.ipynb @@ -30,7 +30,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As of this tutorial, ATLAS has [published 5 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n", + "As of this tutorial, ATLAS has [published 7 full likelihoods to HEPData](https://pyhf.readthedocs.io/en/v0.6.1/citations.html#published-likelihoods)\n", "\n", "

\n", "\n", diff --git a/book/SimpleWorkspace.ipynb b/book/SimpleWorkspace.ipynb index 1734341..ffd5214 100644 --- a/book/SimpleWorkspace.ipynb +++ b/book/SimpleWorkspace.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Workspace World\n", + "# Introduction to Workspaces\n", "\n", "Similarly to the previous chapter, we're going to go up \"one level\" from models to workspaces." ] diff --git a/book/Toys.ipynb b/book/Toys.ipynb index 2e3b11f..eb51466 100644 --- a/book/Toys.ipynb +++ b/book/Toys.ipynb @@ -6,7 +6,7 @@ "source": [ "# Playing with Toys\n", "\n", - " As of `v0.6.1`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation." + "As of `v0.6.0`, `pyhf` now supports toys! A lot of kinks have been discovered and worked out and we're grateful to our ATLAS colleagues for beta-testing this in the meantime. We don't believe that there may not be any more bugs, but we feel confident that we can release the current implementation." 
] }, { @@ -47,7 +47,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [52.5, 65.0] + model.config.auxdata,\n", + " [53.0, 65.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"qtilde\",\n", " return_expected_set=True,\n", @@ -72,7 +72,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [52.5, 65.0] + model.config.auxdata,\n", + " [53.0, 65.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"qtilde\",\n", " return_expected_set=True,\n", @@ -126,7 +126,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [5.25, 6.5] + model.config.auxdata,\n", + " [5.0, 7.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"qtilde\",\n", " return_expected_set=True,\n", @@ -152,7 +152,7 @@ "source": [ "CLs_obs, CLs_exp = pyhf.infer.hypotest(\n", " 1.0, # null hypothesis\n", - " [5.25, 6.5] + model.config.auxdata,\n", + " [5.0, 7.0] + model.config.auxdata,\n", " model,\n", " test_stat=\"qtilde\",\n", " return_expected_set=True,\n", @@ -188,7 +188,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.6" + "version": "3.8.7" } }, "nbformat": 4, diff --git a/book/data/2-bin_1-channel.json b/book/data/2-bin_1-channel.json index 69ce538..bd77e22 100644 --- a/book/data/2-bin_1-channel.json +++ b/book/data/2-bin_1-channel.json @@ -14,7 +14,7 @@ } ], "observations": [ - { "name": "singlechannel", "data": [52.5, 65.0] } + { "name": "singlechannel", "data": [53.0, 65.0] } ], "measurements": [ { "name": "Measurement", "config": {"poi": "mu", "parameters": []} }