\item The second-smallest $\Delta R$ between a photon and a jet in the event
\end{itemize}
The linear correlations between these variables in the $ttH$ CP-even and CP-odd MadGraph5\_aMC@NLO+Pythia8 Monte Carlo samples are shown in Figure \ref{fig:lepcorr4vec}. Figures \ref{fig:lep4vecvbls1}--\ref{fig:lep4vecvbls6} compare the distribution of each training variable in the $ttH$ CP-even and CP-odd Monte Carlo samples.
\caption{Contribution of STXS truth bins to each analysis category in total event yield. The top row gives the value of $S_{90}/(S_{90} + B_{90})$ in each category, where $S_{90}$ and $B_{90}$ are, respectively, the total numbers of signal events (summed over all STXS regions) and background events expected in the smallest $m_{\gamma\gamma}$ range containing 90\% of the signal yield. Other entries give the percentage contribution of a given STXS truth bin to the Higgs signal yield in each analysis category. Entries for the STXS regions targeted by each analysis category are outlined in black if this value is above 15\%.}
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
\label{tab:NoSigSS}
\end{sidewaystable}
To determine how the bias could be reduced further, we consider an additional set of tests performed in an earlier iteration of this method \cite{Hyneman}, which evaluated the change in GP fit bias when different functional priors were used as the GP mean. Templates were constructed for several statistics regimes using power-law (Fig.~\ref{fig:prior_bias_powerlaw}), ExpPoly2 (Fig.~\ref{fig:prior_bias_exppoly2}), and Bernstein~5 (Fig.~\ref{fig:prior_bias_bern5}) functions as the template basis; the GP means tested for each template were an exponential function, a linear function, and a flat line. For the tested templates with more than 10 effective MC events per bin, the choice of GP mean does not significantly affect the GP fit, although some fitting bias is observed in the lower-statistics templates. The y-axis of these plots shows the percentage disagreement between the smoothed and unsmoothed templates, analogous to a ratio plot.
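To illustrate what using a functional prior as the GP mean involves, the following Python sketch subtracts a chosen mean function (exponential, linear, or flat) from a binned template, fits a GP to the residuals, and adds the mean back. It is a minimal illustration under stated assumptions, not the implementation used in \cite{Hyneman}: the squared-exponential kernel, the fixed hyperparameters (\texttt{amplitude}, \texttt{length\_scale}), the Poisson noise term, the crude fits that set each prior, and the toy exponential template are all hypothetical choices.

```python
import numpy as np


def sq_exp_kernel(x1, x2, amplitude, length_scale):
    """Squared-exponential covariance between two sets of bin centres."""
    diff = x1[:, None] - x2[None, :]
    return amplitude**2 * np.exp(-0.5 * (diff / length_scale) ** 2)


def make_prior(kind, x, counts):
    """Return a GP mean function crudely fitted to the template (hypothetical choice)."""
    if kind == "flat":
        level = counts.mean()
        return lambda t: np.full_like(t, level, dtype=float)
    if kind == "linear":
        slope, intercept = np.polyfit(x, counts, 1)
        return lambda t: slope * t + intercept
    if kind == "exponential":
        # Straight-line fit to log(counts); counts are clipped at 1 to avoid log(0).
        slope, intercept = np.polyfit(x, np.log(np.maximum(counts, 1.0)), 1)
        return lambda t: np.exp(intercept + slope * t)
    raise ValueError(f"unknown prior: {kind}")


def gp_smooth(x, counts, mean_fn, amplitude=50.0, length_scale=20.0):
    """GP posterior mean at the bin centres for a fixed prior mean function.

    The per-bin noise variance is the Poisson estimate max(count, 1); the
    kernel hyperparameters are held fixed here rather than optimised.
    """
    prior = mean_fn(x)
    k_f = sq_exp_kernel(x, x, amplitude, length_scale)
    noise = np.diag(np.maximum(counts, 1.0))
    # Solve (K + N) alpha = (y - m) instead of inverting the matrix explicitly.
    alpha = np.linalg.solve(k_f + noise, counts - prior)
    return prior + k_f @ alpha


rng = np.random.default_rng(seed=1)
x = np.linspace(105.0, 160.0, 110)                # hypothetical m_yy bin centres [GeV]
truth = 200.0 * np.exp(-0.04 * (x - 105.0))       # hypothetical falling shape
counts = rng.poisson(truth).astype(float)
for kind in ("exponential", "linear", "flat"):
    smoothed = gp_smooth(x, counts, make_prior(kind, x, counts))
    # Percentage disagreement between the smoothed and unsmoothed template.
    pct = 100.0 * (smoothed - counts) / np.maximum(counts, 1.0)
    print(f"{kind:>11s}: mean |disagreement| = {np.mean(np.abs(pct)):.1f}%")
```

Because the prior is subtracted before the GP fit, a mean that already follows the template leaves only small residuals for the kernel to model, whereas a flat mean forces the kernel to absorb the whole falling shape; this is where one would expect differences between priors to appear at low statistics.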
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
\label{tab:NoSigSSpadded}
\end{sidewaystable}
\subsection{Extended Templates, Linear Error Kernel}
We report the results of the padded-template, linear-error-kernel bias study for all categories in \Tab{\ref{tab:NoSigSSlinear}}.
\begin{sidewaystable}
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{lcSS
}
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
\label{tab:NoSigSSlinear}
\end{sidewaystable}
To further validate the choice of 20 effective events per bin as the cutoff, we investigate some edge cases. Since the templates have 130 bins (including 10 bins of padding on either side), the 1000-event templates have just over 7 effective events per bin, while the 10000-event templates have just over 76. We test templates with exactly 10 effective events per bin (1300 total events), slightly more than 10 (1400 total events), exactly 20 (2600 total events), and slightly more than 21 (2800 total events).
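The per-bin numbers quoted above follow from dividing the total template statistics by the 130 bins. The short Python sketch below reproduces that arithmetic. It also includes the Kish effective count $(\sum w)^2/\sum w^2$, a common definition of effective statistics for weighted MC; whether the templates here use that definition is an assumption, and for unit-weight toy templates it reduces to the raw event count.

```python
N_BINS = 130  # template bins, including 10 padding bins on each side


def events_per_bin(total_events, n_bins=N_BINS):
    """Average number of template events per bin."""
    return total_events / n_bins


def kish_effective_count(weights):
    """Effective number of events for weighted MC: (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


for total in (1000, 1300, 1400, 2600, 2800, 10000):
    print(f"{total:>6d} events -> {events_per_bin(total):6.2f} per bin")
# Prints 7.69, 10.00, 10.77, 20.00, 21.54 and 76.92 events per bin respectively.
```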
In the 10 events/bin regime, for templates generated with ExpPoly2 and ExpPoly3, we see no bias when fitting with ExpPoly2 and ExpPoly3, but we see a bias of roughly 35\% of the statistical uncertainty on the spurious signal when fitting with functions with fewer degrees of freedom (i.e., exponential and power law). Closer examination shows that this is due to more substantial edge effects in the low-mass region of this very-low-statistics category, which cannot be appropriately modelled by the Gaussian Process.
When at least 20 effective events per bin are required, this bias is reduced to at most 20\% of the statistical uncertainty on the spurious signal. At the low-statistics end of this range, the statistical uncertainty is expected to dominate (that is, the spurious signal will not be a significant uncertainty), so we conclude that the effect of the GPR bias will be minimal. As the statistics increase past 75 effective events per bin, the bias falls below 10\% of the spurious signal uncertainty; in regimes where the spurious signal uncertainty is expected to dominate, the bias is therefore negligible.
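The bias fractions above are expressed relative to the statistical uncertainty on the spurious signal. As a rough sketch of where those two numbers come from, the Python below fits a background-only template with a background function plus a signal shape of free normalisation $S$, and reports $S$ and its uncertainty $\sigma_S$. It is a simplification under stated assumptions: a weighted linear least-squares (Neyman $\chi^2$) fit stands in for the likelihood fit, and the quadratic background, the Gaussian signal width, and the helper names are hypothetical.

```python
import numpy as np


def gaussian_signal_shape(x, centre=125.0, width=1.7):
    """Per-bin signal fractions for a Gaussian peak, normalised to sum to 1."""
    shape = np.exp(-0.5 * ((x - centre) / width) ** 2)
    return shape / shape.sum()


def spurious_signal_fit(x, counts, signal_shape, poly_degree=2):
    """Fit counts = polynomial background + S * signal_shape by weighted least squares.

    Weights are 1 / max(count, 1) (Neyman chi-square). Returns (S, sigma_S).
    """
    xs = (x - x.min()) / (x.max() - x.min())  # map to [0, 1] for conditioning
    design = np.column_stack([xs**k for k in range(poly_degree + 1)] + [signal_shape])
    w = 1.0 / np.maximum(counts, 1.0)
    ata = design.T @ (design * w[:, None])
    aty = design.T @ (w * counts)
    cov = np.linalg.inv(ata)  # parameter covariance matrix
    params = cov @ aty
    return params[-1], float(np.sqrt(cov[-1, -1]))


x = np.linspace(105.0, 160.0, 110)                  # hypothetical bin centres [GeV]
background = 200.0 * np.exp(-0.04 * (x - 105.0))    # background-only template
s_fit, s_err = spurious_signal_fit(x, background, gaussian_signal_shape(x))
print(f"S = {s_fit:.2f}, sigma_S = {s_err:.2f}, S / sigma_S = {s_fit / s_err:.2f}")
```

In a bias study of the kind described here, the quantity of interest would be the shift in $S$ between fits to the smoothed template and to the true shape, divided by $\sigma_S$.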
\begin{sidewaystable}
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{lcSS
}
\caption{Spurious signal means and widths for all choices of fit functional form, using the ``low'' template with the ExpPoly2 generating functional form, for a range of different template statistics.}
\label{tab:NoSigSSedges1}
\end{sidewaystable}
\begin{sidewaystable}
\centering
\resizebox{\linewidth}{!}{
\begin{tabular}{lcSS
}
\caption{Spurious signal means and widths for all choices of fit functional form, using the ``medium'' template with the ExpPoly3 generating functional form and the ``high'' template with the ExpPoly3 generating functional form, for a range of different template statistics.}
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide and 1\% of the template integral.}
\label{tab:SigSS}
\end{sidewaystable}
The feature-injection bias for a three-sigma feature does not change appreciably with template shape (that is, for a given statistics level, the bias is approximately the same for all three templates). At high statistics, however, the ratio of the bias to the feature size decreases as the template statistics increase. This is expected, since a genuine underlying feature in a high-statistics template is incompatible with the assumption that the true template is a smoothly falling function. For templates containing more than 20 effective background MC events per bin prior to feature injection (that is, in the statistics range where we conclude that GPR is safe to use), the measured bias is less than 18\% of the injected feature size.
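A minimal Python sketch of the feature injection described above: a Gaussian bump whose total yield is a fixed fraction (here 1\%) of the template integral is added to a falling template, and the bias is then expressed as a fraction of the injected yield. Interpreting ``approximately 3 GeV wide'' as a Gaussian $\sigma$ of 3\,GeV, the feature position, the toy template, and the helper names are all assumptions; how the recovered yield is extracted from the smoothed template is not specified here.

```python
import numpy as np


def inject_feature(x, template, centre, width, fraction):
    """Add a Gaussian bump whose total yield is `fraction` of the template integral.

    `width` is interpreted as the Gaussian sigma (an assumption).
    Returns (template_with_feature, feature).
    """
    shape = np.exp(-0.5 * ((x - centre) / width) ** 2)
    feature = fraction * template.sum() * shape / shape.sum()
    return template + feature, feature


def relative_bias(recovered_yield, injected_yield):
    """Bias expressed as a fraction of the injected feature size."""
    return (recovered_yield - injected_yield) / injected_yield


x = np.linspace(105.0, 160.0, 110)                 # hypothetical bin centres [GeV]
template = 200.0 * np.exp(-0.04 * (x - 105.0))     # hypothetical smooth background
injected, feature = inject_feature(x, template, centre=130.0, width=3.0, fraction=0.01)
print(f"injected yield = {feature.sum():.1f} of {template.sum():.1f} events")
```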
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide. The template statistics are fixed at one million events, and the feature size is varied.}
\label{tab:SigSSvarinj}
\end{sidewaystable}
As a final check, we investigate the bias when a feature resembling a Standard Model signal (approximately 1\,GeV wide) is injected. As expected, the bias is larger: narrow features are smoothed more strongly by the Gaussian Process fit, although they remain present in the templates.
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide and 1\% of the template integral.}
\label{tab:SigSS1S}
\end{sidewaystable}
From these studies, we conclude that, when the template contains underlying features that we wish to preserve, the GP blunts them somewhat but they remain present in the smoothed template; the size of the resulting bias depends on the shape and size of the feature, both in absolute terms and relative to the template as a whole.
The inputs to all BDTs are kinematic variables of the various objects in each event. To avoid sculpting the shapes used in the statistical analysis, any variable whose linear correlation with $m_{\gamma\gamma}$ is 5\% or more in magnitude, in either the signal or the background training sample, is removed from the list of inputs. The variables used as input to both the multiclassifier BDT and the binary BDTs are listed in Table \ref{tab:design:trainingvariables}.
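The input-selection rule above can be sketched in a few lines of Python: a variable is kept only if its Pearson correlation with $m_{\gamma\gamma}$ stays below 5\% in magnitude in every training sample. This is an illustration under assumptions: event weights are ignored, and the variable names and toy samples are hypothetical stand-ins for the real signal and background training sets.

```python
import numpy as np


def select_uncorrelated_inputs(samples, threshold=0.05):
    """Keep variables with |Pearson r| < threshold w.r.t. m_yy in every sample.

    samples: list of (features, m_yy) pairs, where features maps a variable
    name to a 1D array of per-event values. Event weights are ignored here.
    """
    names = list(samples[0][0])
    kept = []
    for name in names:
        worst = max(abs(np.corrcoef(feats[name], m_yy)[0, 1]) for feats, m_yy in samples)
        if worst < threshold:
            kept.append(name)
    return kept


rng = np.random.default_rng(seed=0)
samples = []
for _ in range(2):  # stand-ins for the signal and background training samples
    m_yy = rng.uniform(105.0, 160.0, size=100_000)
    features = {
        "var_mass_like": m_yy + rng.normal(0.0, 5.0, size=m_yy.size),  # hypothetical, correlated
        "var_independent": rng.normal(size=m_yy.size),                  # hypothetical, uncorrelated
    }
    samples.append((features, m_yy))
print(select_uncorrelated_inputs(samples))  # → ['var_independent']
```

Taking the maximum correlation over samples implements the ``signal or background'' condition: one sample exceeding the threshold is enough to drop a variable.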
\begin{sidewaystable}[]
\begin{center} \footnotesize
\resizebox{\textwidth}{!}{
\begin{tabular}{|c|c|c|c|c|}
\hline
STXS regions & Multi-class BDT & STXS regions & Binary BDT \\ \hline
To train the multiclassifier BDT, all signal samples ($ggF$, $VH$, $VBF$, $ttH$, $tH$) are merged. Each event is then weighted such that all processes have equal yields in the training sample, so that processes with small cross-sections, such as $tH$, are not underrepresented. The output of the multiclassifier BDT is a 44-dimensional vector of scores, with one component $y_{i}$ per truth bin; these scores are converted into class probabilities $z_{i}$ using the softmax function, $z_{i} = e^{y_{i}} / \sum_{j} e^{y_{j}}$. The BDT is trained with the LightGBM package \cite{LightGBM} by minimizing the cross-entropy loss of the softmax outputs $z_{i}$.
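The training ingredients above (per-process yield balancing, the softmax, and the cross-entropy loss) are sketched below in NumPy. This is an illustration under assumptions, not the training code: in LightGBM the softmax and cross-entropy are handled internally by the \texttt{multiclass} objective with \texttt{num\_class} set to 44, and the scores, labels, and sample composition here are hypothetical.

```python
import numpy as np


def balance_process_yields(weights, processes):
    """Rescale event weights so every process has the same total yield (1.0)."""
    weights = np.asarray(weights, dtype=float)
    processes = np.asarray(processes)
    balanced = np.empty_like(weights)
    for proc in np.unique(processes):
        mask = processes == proc
        balanced[mask] = weights[mask] / weights[mask].sum()
    return balanced


def softmax(scores):
    """Row-wise z_i = exp(y_i) / sum_j exp(y_j); the row max is subtracted for stability."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    expo = np.exp(shifted)
    return expo / expo.sum(axis=1, keepdims=True)


def weighted_cross_entropy(scores, truth_bin, weights):
    """Weighted mean of -log z_(true bin) over events."""
    z = softmax(scores)
    p_true = z[np.arange(len(truth_bin)), truth_bin]
    return float(-np.sum(weights * np.log(p_true)) / np.sum(weights))


N_TRUTH_BINS = 44
rng = np.random.default_rng(seed=0)
procs = np.array(["ggF"] * 900 + ["tH"] * 100)           # hypothetical, imbalanced sample
w = balance_process_yields(np.ones(procs.size), procs)
scores = rng.normal(size=(procs.size, N_TRUTH_BINS))     # stand-in for BDT outputs y_i
truth = rng.integers(0, N_TRUTH_BINS, size=procs.size)   # hypothetical truth-bin labels
print(weighted_cross_entropy(scores, truth, w))
```

Note that the balancing acts on production processes while the classes are the 44 truth bins, matching the description above; an untrained classifier with equal scores gives a loss of $\ln 44$.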
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ggtoH\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHqq\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHll\ and \qqtoHln\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ttH, $tWH$, and $tHjb$ categories and truth bins. Entries with a value below $1\%$ are omitted.}
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHqq\ STXS truth bins and \ggtoH\ analysis categories. Entries with a value below $1\%$ are omitted.}
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ggtoH\ STXS truth bins and \qqtoHqq\ analysis categories. Entries with a value below $1\%$ are omitted.}
\label{fig:yields_6}
\end{figure}
As in the CP analysis, a profile likelihood ratio fit is conducted simultaneously in all categories and a signal strength parameter is extracted.
Signal in each category is modelled using a double-sided Crystal Ball function fit to Higgs-signal Monte Carlo. The Higgs mass is fixed to the Run~1 measured value of $125.09 \pm 0.21\,(\mathrm{stat}) \pm 0.11\,(\mathrm{syst})$\,GeV \cite{Higgsmass}.
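For reference, the double-sided Crystal Ball shape has a Gaussian core and power-law tails on both sides, joined so that the function and its first derivative are continuous. The NumPy sketch below writes it in terms of $t = (m_{\gamma\gamma} - \mu)/\sigma$, with tails starting at $t = -\alpha_{\mathrm{lo}}$ and $t = \alpha_{\mathrm{hi}}$. Setting the peak $\mu$ to the fixed Higgs mass and the values of $\sigma$, $\alpha$ and $n$ are assumptions for illustration; in the analysis these are obtained from the fit to signal Monte Carlo.

```python
import numpy as np


def double_sided_crystal_ball(m, mu, sigma, alpha_lo, n_lo, alpha_hi, n_hi):
    """Unnormalised DSCB: Gaussian core with power-law tails on both sides.

    The tails start alpha_lo (alpha_hi) standard deviations below (above) mu
    and match the core in value and slope at the joins.
    """
    t = (np.atleast_1d(np.asarray(m, dtype=float)) - mu) / sigma
    out = np.exp(-0.5 * t**2)
    lo = t < -alpha_lo
    hi = t > alpha_hi
    out[lo] = np.exp(-0.5 * alpha_lo**2) * (
        alpha_lo / n_lo * (n_lo / alpha_lo - alpha_lo - t[lo])) ** (-n_lo)
    out[hi] = np.exp(-0.5 * alpha_hi**2) * (
        alpha_hi / n_hi * (n_hi / alpha_hi - alpha_hi + t[hi])) ** (-n_hi)
    return out


M_H = 125.09  # GeV, fixed Higgs mass quoted in the text
m = np.linspace(110.0, 140.0, 3001)
dm = m[1] - m[0]
shape = double_sided_crystal_ball(m, mu=M_H, sigma=1.6, alpha_lo=1.5, n_lo=5.0,
                                  alpha_hi=1.7, n_hi=8.0)  # hypothetical parameters
pdf = shape / (shape.sum() * dm)  # normalise numerically on this grid
print(f"peak at {m[np.argmax(pdf)]:.2f} GeV")
```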
Similarly, the background is modelled with functional forms selected using the spurious signal test. As detailed in Chapter \ref{chap:sigbkgparam}, in the $ggH$ and $qq \rightarrow Hqq'$ categories, the templates for the spurious signal study are constructed from Sherpa diphoton samples reweighted to model the relative contributions of $\gamma\gamma$, $\gamma j$ and $jj$ events (with both true and fake photons) in each category. In the leptonic $VH$ and $ttH+tH$ regions, however, the $\gamma j$ and $jj$ contributions are small enough to be neglected, so $V\gamma\gamma$ and $tt\gamma\gamma$ Monte Carlo, respectively, are used for the templates. In the low-statistics categories, a Wald test is used to select the functional form. The spurious signal values and the choice of function are given in Tables \ref{tab:spurious_sig} and \ref{tab:spurious_sig2}.
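The text does not spell out the Wald test used in the low-statistics categories. One common form, sketched below in Python as an assumption, tests whether the extra parameter introduced by the next function in a nested family is compatible with zero: $W = (\hat{\theta}/\sigma_{\hat{\theta}})^2$ follows a $\chi^2$ distribution with one degree of freedom under the null hypothesis, so $p = \mathrm{erfc}(\sqrt{W/2})$. The nesting order, the 5\% threshold, and the fit results are hypothetical.

```python
import math


def wald_p_value(theta_hat, sigma_theta):
    """p-value for H0: theta = 0 with W = (theta_hat / sigma)^2 ~ chi2(1 dof).

    For one degree of freedom, P(chi2 > W) = erfc(sqrt(W / 2)).
    """
    w = (theta_hat / sigma_theta) ** 2
    return math.erfc(math.sqrt(w / 2.0))


def choose_function(nested, alpha=0.05):
    """Add degrees of freedom while the extra parameter is significant.

    nested: [(name, theta_hat, sigma), ...] ordered by increasing complexity;
    the first entry's parameters are ignored, and each later theta_hat is the
    parameter that function adds to its predecessor.
    """
    chosen = nested[0][0]
    for name, theta_hat, sigma in nested[1:]:
        if wald_p_value(theta_hat, sigma) >= alpha:
            break
        chosen = name
    return chosen


fits = [("Exponential", None, None), ("ExpPoly2", 3.0, 1.0), ("ExpPoly3", 0.5, 1.0)]  # hypothetical
print(choose_function(fits))  # → ExpPoly2
```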