
Commit ff36094

committed
finish dedication
1 parent c3c6a64 commit ff36094

17 files changed

+2168
-2272
lines changed

appendix/TMVABDTStudies.tex

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -91,7 +91,7 @@ \subsubsection{Leptonic channel}
9191
\item The second-smallest $\Delta$R between a photon and a jet in the event
9292
\end{itemize}
9393

94-
The linear correlations between these variables in $ttH$ CP even and CP odd MadGraph5\_aMC@NLO+Pythia8 Monte Carlo are shown in Figure \ref{fig:lepcorr4vec}. Figures \ref{fig:lep4vecvbls1} - \ref{fig:lep4vecvbls6} compare the distribution of each training variable in $ttH$ CP even and CP odd Monte Carlo.
94+
The linear correlations between these variables in $ttH$ CP even and CP odd MadGraph5 aMCNLO+Pythia8 Monte Carlo are shown in Figure \ref{fig:lepcorr4vec}. Figures \ref{fig:lep4vecvbls1} - \ref{fig:lep4vecvbls6} compare the distribution of each training variable in $ttH$ CP even and CP odd Monte Carlo.
9595

9696
\begin{figure}[htbp]
9797
\centering

appendix/couplings_auxplots.tex

Lines changed: 3 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -12,14 +12,12 @@
1212
% \end{figure}
1313
%\end{center}
1414

15-
\begin{landscape}
16-
\begin{figure}[h]
15+
\begin{sidewaysfigure}[h]
1716
\centering
18-
\includegraphics[width=1.5\textwidth]{figures/couplings_chapter/purity_2D.pdf}
17+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D.pdf}
1918
\caption{Contribution of STXS truth bins to each analysis category in total event yield. The top row corresponds to the value of $S_{90}/(S_{90} + B_{90})$ in each category, where $S_{90}$ and $B_{90}$ are respectively the total number of signal (including all STXS regions) and background events expected in the smallest $m_{\gamma \gamma}$ range containing 90\% of the signal yield. Other entries correspond to the percentage contribution of a given STXS truth bin to the Higgs signal yield in each analysis category. Entries for the STXS regions targeted by each analysis category are outlined in black if this value is above 15\%. }
2019
\label{fig:design:yields}
21-
\end{figure}
22-
\end{landscape}
20+
\end{sidewaysfigure}
2321

2422
\begin{figure}[htbp]
2523
\centering

appendix/gpr_templates.tex

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -807,6 +807,16 @@ \subsection{Spurious Signal GPR-smoothed templates}
807807
GG2H\_PTH\_300\_450\_\_2 & Pow &-3.49&0.851\\
808808
GG2H\_PTH\_450\_650\_\_0 & Exp* &-0.67&N/A\\
809809
GG2H\_PTH\_450\_650\_\_1 & Exp* &-0.96&0.0262\\
810+
GG2H\_PTH\_GT650\_\_0 & Exp* &0.63&N/A\\
811+
GG2H\_PTH\_GT650\_\_1 & Exp* &-0.36&N/A\\
812+
QQ2HQQ\_0J\_\_0 & Exp* &-0.68&N/A\\
813+
QQ2HQQ\_0J\_\_1 & Exp* &-0.33&-0.204\\
814+
QQ2HQQ\_1J\_\_0 & Exp* &-0.53&N/A\\
815+
QQ2HQQ\_1J\_\_1 & Exp* &0.44&0.247\\
816+
QQ2HQQ\_1J\_\_2 & Pow &-1.35&-0.67\\
817+
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_0 & Exp* &0.64&N/A\\
818+
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_1 & Exp* &-0.39&-0.0541\\
819+
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_2 & Exp &-1.51&0.221\\
810820
\hline\hline
811821
\end{tabular}
812822
}
@@ -825,16 +835,6 @@ \subsection{Spurious Signal GPR-smoothed templates}
825835
& Function & \multicolumn{2}{c}{$max(S)$} \\
826836
Event category & & Nominal & Smooth temp \\
827837
\hline\hline
828-
GG2H\_PTH\_GT650\_\_0 & Exp* &0.63&N/A\\
829-
GG2H\_PTH\_GT650\_\_1 & Exp* &-0.36&N/A\\
830-
QQ2HQQ\_0J\_\_0 & Exp* &-0.68&N/A\\
831-
QQ2HQQ\_0J\_\_1 & Exp* &-0.33&-0.204\\
832-
QQ2HQQ\_1J\_\_0 & Exp* &-0.53&N/A\\
833-
QQ2HQQ\_1J\_\_1 & Exp* &0.44&0.247\\
834-
QQ2HQQ\_1J\_\_2 & Pow &-1.35&-0.67\\
835-
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_0 & Exp* &0.64&N/A\\
836-
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_1 & Exp* &-0.39&-0.0541\\
837-
QQ2HQQ\_GE2J\_MJJ\_0\_60\_\_2 & Exp &-1.51&0.221\\
838838
QQ2HQQ\_GE2J\_MJJ\_60\_120\_\_0 & Exp* &0.66&0.0616\\
839839
QQ2HQQ\_GE2J\_MJJ\_60\_120\_\_1 & Pow &-2.35&0.216\\
840840
QQ2HQQ\_GE2J\_MJJ\_120\_350\_\_0 & Exp* &-0.6&N/A\\

appendix/gpr_validation.tex

Lines changed: 16 additions & 32 deletions
Original file line number | Diff line number | Diff line change
@@ -105,8 +105,7 @@ \subsection{Nominal Bias Study}
105105

106106
We report the results of the nominal bias study for all categories in the \Tab{\ref{tab:NoSigSS}}.
107107

108-
\begin{landscape}
109-
\begin{table}
108+
\begin{sidewaystable}
110109
\centering
111110
\resizebox{\linewidth}{!}{
112111
\begin{tabular}{lcSS
@@ -153,8 +152,7 @@ \subsection{Nominal Bias Study}
153152
}
154153
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
155154
\label{tab:NoSigSS}
156-
\end{table}
157-
\end{landscape}
155+
\end{sidewaystable}
158156

159157
To determine how to reduce the bias further, we note a further set of tests performed in an earlier iteration of this method \cite{Hyneman}, evaluating the difference in GP fit bias when different functional priors were used as the GP mean. Templates were constructed for several statistics regimes using power law (Fig.~\ref{fig:prior_bias_powerlaw}), ExpPoly2 (Fig.~\ref{fig:prior_bias_exppoly2}), and Bernstein 5 (Fig.~\ref{fig:prior_bias_bern5}) functions as the template basis; the possible choices of GP mean tested for each template were an exponential function, a linear function, and a flat line. In the tested templates with more than 10 effective MC events per bin, the choice of GP mean does not seem to affect the GP fit behavior significantly, though some fitting bias is observed in the lower-statistics templates. The unit of the y-axis in these plots is the percentage disagreement between the smoothed and the unsmoothed template, similar to a ratio plot.
160158

@@ -270,8 +268,7 @@ \subsection{Extended Templates}
270268

271269
We report the results of the padded-template bias study for all categories in the \Tab{\ref{tab:NoSigSSpadded}}.
272270

273-
\begin{landscape}
274-
\begin{table}
271+
\begin{sidewaystable}
275272
\centering
276273
\resizebox{\linewidth}{!}{
277274
\begin{tabular}{lcS
@@ -316,8 +313,7 @@ \subsection{Extended Templates}
316313
}
317314
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
318315
\label{tab:NoSigSSpadded}
319-
\end{table}
320-
\end{landscape}
316+
\end{sidewaystable}
321317

322318
\subsection{Extended Templates, Linear Error Kernel}
323319

@@ -406,8 +402,7 @@ \subsection{Extended Templates, Linear Error Kernel}
406402

407403
We report the results of the padded-template, linear-error kernel bias study for all categories in the \Tab{\ref{tab:NoSigSSlinear}}.
408404

409-
\begin{landscape}
410-
\begin{table}
405+
\begin{sidewaystable}
411406
\centering
412407
\resizebox{\linewidth}{!}{
413408
\begin{tabular}{lcSS
@@ -454,17 +449,15 @@ \subsection{Extended Templates, Linear Error Kernel}
454449
}
455450
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics.}
456451
\label{tab:NoSigSSlinear}
457-
\end{table}
458-
\end{landscape}
452+
\end{sidewaystable}
459453

460454
To further validate the choice of 20 effective events per bin as the cutoff, we investigate some edge cases. Since the templates have 130 bins (due to the 10 bin padding on either side), we note that the 1000-event templates have just over 7 effective events per bin, while the 10000 event templates have just over 76 events per bin. We test templates with exactly 10 effective events per bin (1300 total events), slightly more than 10 effective events per bin (1400 total events), exactly 20 effective events per bin (2600 total events), and slightly more than 21 effective events per bin (2800 total events).
461455

462456
We note that, in the 10 event/bin regime, for templates generated with ExpPoly2 and ExpPoly3, we see no bias when fitting with ExpPoly2 and ExpPoly3, but see a bias of roughly 35\% of the statistical uncertainty on the spurious signal when fitting with lower degree-of-freedom templates (i.e., Exponential and Powerlaw). Upon closer examination, we observe that this is due to the presence of more substantial edge effects in the low-mass region of this very low statistics category that cannot be appropriately modelled by the Gaussian Process.
463457

464458
However, by requiring at least 20 effective events per bin, we observe that this bias is reduced to less than or equal to 20\% of the statistical uncertainty on the spurious signal. At the low-statistics end of this range, the statistical uncertainty is expected to dominate (that is, spurious signal will not be a significant uncertainty), so we conclude that the effects of the GPR bias will be minimal. We further note that, as statistics increase past 75 effective events per bin, the bias drops off to less than 10\% of the spurious signal uncertainty; in regimes where the spurious signal uncertainty is expected to dominate, the bias is found to be negligible.
465459

466-
\begin{landscape}
467-
\begin{table}
460+
\begin{sidewaystable}
468461
\centering
469462
\resizebox{\linewidth}{!}{
470463
\begin{tabular}{lcSS
@@ -506,12 +499,10 @@ \subsection{Extended Templates, Linear Error Kernel}
506499
}
507500
\caption{Spurious signal means and widths for all choices of fit functional-form, using the ``low'' template with the ExpPoly2 generating functional form, for a range of different template statistics.}
508501
\label{tab:NoSigSSedges1}
509-
\end{table}
510-
\end{landscape}
502+
\end{sidewaystable}
511503

512504

513-
\begin{landscape}
514-
\begin{table}
505+
\begin{sidewaystable}
515506
\centering
516507
\resizebox{\linewidth}{!}{
517508
\begin{tabular}{lcSS
@@ -559,8 +550,7 @@ \subsection{Extended Templates, Linear Error Kernel}
559550
}
560551
\caption{Spurious signal means and widths for all choices of fit functional-form, using the ``medium'' template with the ExpPoly3 generating functional form and the ``high'' template with the ExpPoly3 generating functional form, for a range of different template statistics.}
561552
\label{tab:NoSigSSedges2}
562-
\end{table}
563-
\end{landscape}
553+
\end{sidewaystable}
564554

565555

566556
\begin{figure}
@@ -734,8 +724,7 @@ \subsection{Feature Injection Study}
734724
\end{center}
735725
\end{figure}
736726

737-
\begin{landscape}
738-
\begin{table}
727+
\begin{sidewaystable}
739728
\centering
740729
\resizebox{\linewidth}{!}{
741730
\begin{tabular}{lcSS
@@ -779,8 +768,7 @@ \subsection{Feature Injection Study}
779768
}
780769
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide and 1\% of the template integral.}
781770
\label{tab:SigSS}
782-
\end{table}
783-
\end{landscape}
771+
\end{sidewaystable}
784772

785773
The feature-injection bias for a three-sigma feature does not change appreciably with template shape (that is, for a given statistics level, the bias is approximately the same for all three templates). However, at high stats, the bias / feature size drops off as a function of template statistics. This makes sense, as the presence of true underlying features in high statistics templates is not compatible with the assumption that our true templates are smoothly falling functions. However, for templates containing greater than 20 effective background MC events per bin prior to feature injection (that is, those in the statistics range we conclude that it is safe to use GPR in), the measured bias is less than 18\% of the injected feature size.
786774

@@ -830,8 +818,7 @@ \subsection{Feature Injection Study}
830818
\end{center}
831819
\end{figure}
832820

833-
\begin{landscape}
834-
\begin{table}
821+
\begin{sidewaystable}
835822
\centering
836823
\resizebox{\linewidth}{!}{
837824
\begin{tabular}{lcSS
@@ -867,8 +854,7 @@ \subsection{Feature Injection Study}
867854
}
868855
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide. The template statistics are fixed at one million events, and the feature size is varied.}
869856
\label{tab:SigSSvarinj}
870-
\end{table}
871-
\end{landscape}
857+
\end{sidewaystable}
872858

873859
As a final check, we investigate the bias when a Standard-Model-signal-like feature is injected ($\sim$1 GeV wide). As expected, the bias is larger: narrow features are smoothed more strongly by the Gaussian Process fit, but are still present in the templates.
874860

@@ -949,8 +935,7 @@ \subsection{Feature Injection Study}
949935
\end{center}
950936
\end{figure}
951937

952-
\begin{landscape}
953-
\begin{table}
938+
\begin{sidewaystable}
954939
\centering
955940
\resizebox{\linewidth}{!}{
956941
\begin{tabular}{lcSS
@@ -994,8 +979,7 @@ \subsection{Feature Injection Study}
994979
}
995980
\caption{Spurious signal means and widths for the three test functional-form distributions for a range of different template statistics, with a signal feature injection that is approximately 3 GeV wide and 1\% of the template integral.}
996981
\label{tab:SigSS1S}
997-
\end{table}
998-
\end{landscape}
982+
\end{sidewaystable}
999983

1000984
From these studies, we conclude that, in the presence of underlying features that we wish to preserve, the bias depends on both the size and the shape of the expected feature. Features are blunted somewhat by the GP but are still present in the smoothed template; how much they are blunted depends on their shape and size, both absolute and relative to the template as a whole.
1001985

sections/couplings_chapter.tex

Lines changed: 12 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -19,11 +19,10 @@ \section{Categorization} \label{sec:Categorization}
1919

2020
The inputs to all BDTs are kinematic variables for the various objects in an event. In order to avoid sculpting of the shapes used in the statistical analysis, any variable found to be linearly correlated with $m_{\gamma \gamma}$ in the signal or background training samples by 5\% or more is removed from the list of inputs. The list of all variables used as input to both the multiclassifier BDT and the binary BDTs is given in Table \ref{tab:design:trainingvariables}.
2121

22-
\begin{landscape}
2322

24-
\begin{table}[]
23+
\begin{sidewaystable}[]
2524
\begin{center} \footnotesize
26-
\resizebox{1.25\textwidth}{!}{
25+
\resizebox{\textwidth}{!}{
2726
\begin{tabular}{|c|c|c|c|c|}
2827
\hline
2928
STXS regions & Multi-class BDT & STXS regions & Binary BDT \\ \hline
@@ -91,8 +90,7 @@ \section{Categorization} \label{sec:Categorization}
9190
\label{tab:design:trainingvariables}
9291

9392
\end{center}
94-
\end{table}
95-
\end{landscape}
93+
\end{sidewaystable}
9694

9795
To train the multiclassifier BDT, all signal samples are merged ($ggF$, $VH$, $VBF$, $ttH$, $tH$). A weight is then applied to each event such that all processes have equal yields in the training sample (that is, so processes such as $tH$ with a small cross-section are not underrepresented). The output of the multiclassifier BDT is a 44-dimensional vector discriminant with an index $y_{i}$ for each truth bin; these indices are then converted into class probabilities $z_{i}$ using a softmax function: $z_{i} = e^{y_{i}}/{\Sigma_{j}e^{y_{j}}}$. The BDT is trained by minimizing the cross-entropy of the softmax $z_{i}$ using the LightGBM package \cite{LightGBM}.
9896

@@ -243,47 +241,45 @@ \section{Categorization} \label{sec:Categorization}
243241

244242
\begin{figure}[h]
245243
\centering
246-
\includegraphics[width=1.09\textwidth]{figures/couplings_chapter/purity_2D_subplots_1}
244+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_1}
247245
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ggtoH\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
248246
\label{fig:yields_1}
249247
\end{figure}
250248

251249

252250
\begin{figure}[h]
253251
\centering
254-
\includegraphics[width=1.09\textwidth]{figures/couplings_chapter/purity_2D_subplots_2}
252+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_2}
255253
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHqq\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
256254
\label{fig:yields_2}
257255
\end{figure}
258256

259257

260258
\begin{figure}[h]
261259
\centering
262-
\includegraphics[width=1.09\textwidth]{figures/couplings_chapter/purity_2D_subplots_3}
260+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_3}
263261
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHll\ and \qqtoHln\ categories and truth bins. Entries with a value below $1\%$ are omitted.}
264262
\label{fig:yields_3}
265263
\end{figure}
266264

267265

268266
\begin{figure}[h]
269267
\centering
270-
\includegraphics[width=1.09\textwidth]{figures/couplings_chapter/purity_2D_subplots_4}
268+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_4}
271269
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ttH\, $tWH$, and $tHjb$ categories and truth bins. Entries with a value below $1\%$ are omitted.}
272270
\label{fig:yields_4}
273271
\end{figure}
274272

275-
\begin{landscape}
276-
\begin{figure}[h]
273+
\begin{sidewaysfigure}[h]
277274
\centering
278-
\includegraphics[width=1.5\textwidth]{figures/couplings_chapter/purity_2D_subplots_5}
275+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_5}
279276
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \qqtoHqq\ STXS truth bins and \ggtoH\ analysis categories. Entries with a value below $1\%$ are omitted.}
280277
\label{fig:yields_5}
281-
\end{figure}
282-
\end{landscape}
278+
\end{sidewaysfigure}
283279

284280
\begin{figure}[h]
285281
\centering
286-
\includegraphics[width=1.09\textwidth]{figures/couplings_chapter/purity_2D_subplots_6}
282+
\includegraphics[width=\textwidth]{figures/couplings_chapter/purity_2D_subplots_6}
287283
\caption{The correspondence between analysis category and STXS truth bins, in terms of the percentage contribution of a given STXS truth bin (y-axis) to the Higgs signal yield in a given analysis category (x-axis) for \ggtoH\ STXS truth bins and \qqtoHqq\ analysis categories. Entries with a value below $1\%$ are omitted.}
288284
\label{fig:yields_6}
289285
\end{figure}
@@ -292,7 +288,7 @@ \section{Signal and Background Modelling} \label{sec:SignalBackground}
292288

293289
As in the CP analysis, a profile likelihood ratio fit is conducted simultaneously in all categories and a signal strength parameter is extracted.
294290

295-
Signal in each category is modelled using a Double-Sided Crystal Ball function, fit to Higgs-signal Monte Carlo. The Higgs mass is fixed to the run-1 measured value of $125.09$ GeV $\pm 0.21 $GeV(stat) $\pm 0.1 $GeV(syst) \cite{Higgsmass}.
291+
Signal in each category is modelled using a Double-Sided Crystal Ball function, fit to Higgs-signal Monte Carlo. The Higgs mass is fixed to the run-1 measured value of $125.09$ GeV $\pm 0.21 $GeV (stat) $\pm 0.1 $GeV (syst) \cite{Higgsmass}.
296292

297293
Similarly, background is modelled using the spurious signal test. As detailed in Chapter \ref{chap:sigbkgparam}, in the $ggH$ and $qq \rightarrow Hqq'$ categories, the templates for the spurious signal study are constructed from Sherpa diphoton samples reweighted to model the proportional contributions of $\gamma \gamma$, $\gamma j$ and $jj$ events consisting of both true and fake photons in each category. In the leptonic $VH$ and $ttH+tH$ regions, however, the $\gamma j$ and $jj$ contributions are small enough to be neglected, so $V\gamma\gamma$ and $tt\gamma\gamma$ Monte Carlo respectively are used for the templates. In the low-stat categories, a Wald test is used to select the functional form. The spurious signal values and the choice of function are given in Tables \ref{tab:spurious_sig} and \ref{tab:spurious_sig2}.
298294
