
\section{Pseudoexperiments}\label{sec:toys}

In order to validate the \fcncfitter framework and its settings, dedicated tests on pseudoexperiments are performed. These tests are designed to verify the estimation of the fitted values as well as their associated statistical uncertainties. The tests are necessary as the \fcncfitter framework is a rather complex tool. Moreover, the limited statistical power of the sample calls for many constraints on the fitted parameters. These constraints also have to be thoroughly validated.

Therefore, dedicated sets of \emph{pseudoexperiments} are created. A \emph{pseudoexperiment} is a randomly generated set of \emph{pseudoevents}, that is, events generated according to a simplified model. Such a simplified model, or \emph{toy} model, allows one to study specific configurations of the framework as well as the influence of specific parameter values on the final fit result. Moreover, these studies can be done on arbitrarily large samples, minimizing the influence of limited statistical power.

The goal of the pseudoexperiment studies is to validate and, where necessary, correct the functionality of the \fcncfitter framework and thereby to obtain a bias-free result with good coverage of the statistical uncertainty. The focus of these studies is on the angular \pwave parameters; however, the coverage of the statistical uncertainty and potential biases are also studied for all other free parameters. The events are generated following the distributions of \angleDist and \qsq without the angular acceptance weights applied. In order to study the influence of the acceptance weights, they are applied during the fit.

The pseudoexperiments are validated by studying the \emph{pull distributions} of the parameters that are free in the fit. Such distributions represent the difference between the measured value, $x$, and the generated value, $x_0$, divided by the uncertainty of the measurement $\sigma$,
%
\begin{equation}\label{eq:toy-pull}
	p(x, \sigma) = \frac{x-x_0}{\sigma}\,.
\end{equation}
%
According to the central limit theorem, the pull of an unbiased measurement with a correctly estimated uncertainty follows, for a sufficiently large sample, a Gaussian distribution with a mean of zero and a width of one. Any shift of the mean from zero indicates a bias of the measured value in units of the standard deviation. A width larger than one indicates that the uncertainty is underestimated (undercoverage), while a width smaller than one signals that it is overestimated (overcoverage). As an example, a width of 0.25 means that the actual statistical uncertainty is four times smaller than the reported one. There are also many reasons why a pull distribution may not be described by a Gaussian distribution, for example when the free parameter is at its limit or when there is a technical problem with the minimization.
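
The reading of the pull mean and width can be made explicit with a short sketch. The symbols $\mu_p$, $w$, $b$ and $\sigma_{\text{true}}$ are introduced here only for illustration, and the sketch assumes that the reported uncertainty $\sigma$ is approximately the same in every pseudoexperiment. If the fitted values scatter around $x_0 + b$ with a true spread $\sigma_{\text{true}}$, the pull defined in \refEq{toy-pull} has the mean $\mu_p$ and the width $w$ given by
%
\begin{equation}
	\mu_p \simeq \frac{b}{\sigma}\,, \qquad w \simeq \frac{\sigma_{\text{true}}}{\sigma} \qquad\Rightarrow\qquad b \simeq \mu_p\,\sigma\,, \qquad \sigma_{\text{true}} \simeq w\,\sigma\,.
\end{equation}
%
For instance, $w = 0.5$ corresponds to $\sigma_{\text{true}} \simeq 0.5\,\sigma$, that is, to a reported uncertainty that is too large by a factor of two.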


\subsection{Generation of pseudoexperiments}\label{sec:toys-gen}

For this analysis, the fitted variables \mBu, \mKstarp, \angleDist are directly generated using a random number generator. For the generation, the \root class \texttt{TRandom3} is used. This class generates uniformly distributed pseudorandom floating-point numbers in the interval $(0,1\rangle$ via the Mersenne Twister algorithm~\cite{FIT-MersenneTwister}. This algorithm was developed in the late 1990s and is widely used, as it is relatively fast while passing most statistical tests designed to measure the quality of a random number generator\footnote{Among the standardized tests are \eg DieHard~\cite{FIT-DieHard} or the U01 tests~\cite{FIT-testU01}. The Mersenne Twister algorithm passes all of the DieHard tests and the vast majority of the U01 tests.}.


The \texttt{TRandom3} class provides uniformly distributed numbers. However, the desired distributions are non-uniform. There are several methods to convert uniformly distributed random numbers into a non-uniform distribution, the simplest being a direct transformation of the uniform variable. However, this is possible only for distributions whose integral can be calculated and inverted analytically, which is rarely the case. Therefore, another simple method, the \emph{rejection method}\,\cite{FIT-NR}, is used in this analysis. The method is very similar to the numerical computation of integrals.
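
As an illustration of the transformation method mentioned above (a textbook sketch, not a part of the \fcncfitter framework; the symbols $F$, $u$, $x_{\min}$ and $\lambda$ are introduced only for this example), a uniform deviate $u\in(0,1\rangle$ is mapped to $x$ through the inverse of the cumulative distribution function $F$ of the desired distribution $p(x)$:
%
\begin{equation}
	F(x) = \int_{x_{\min}}^{x} p(x')\,\mathrm{d}x'\,, \qquad x = F^{-1}(u)\,,
\end{equation}
%
where $x_{\min}$ denotes the lower edge of the domain of $p(x)$. For an exponential distribution $p(x) = \lambda e^{-\lambda x}$ with $x\geq0$, one obtains $F(x) = 1 - e^{-\lambda x}$ and hence $x = -\ln(1-u)/\lambda$. As $1-u$ is uniformly distributed as well, $x = -\ln(u)/\lambda$ can be used equivalently. For a generic angular or mass distribution no such closed form exists.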


\subsubsection{Rejection method}\label{sec:toys-genMethod}
%
\begin{wrapfigure}[12]{r}{0.58\textwidth}
	\centering\vspace{-62pt}
	\includegraphics[width=0.61\textwidth, angle=0]{FCNC/Toys/rejection.jpg}
	\captionof{figure}[Rejection method illustration.]{Rejection method illustration.
		The desired distribution is denoted $p(x)$, the \emph{comparison function} $f(x)$. The comparison function lies above the function $p(x)$ everywhere. A first random deviate $x_0$ is drawn according to $f$; a second, uniform random deviate is then used to decide whether to accept or reject the point at $x_0$. If it is rejected, a new random deviate of $f$ is found. Taken from~Ref.\,\cite{FIT-NR}.} \label{fig:FIT-reject}
\end{wrapfigure}


The goal is to generate a sample of random numbers following a non-uniform distribution function $p(x)$, where $p(x)$
is defined and non-zero for $x$ in a certain range $(A,B\rangle$. The first step is to select a \emph{comparison function} $f(x)$. The comparison function must not be smaller than $p(x)$ for any $x\in(A,B\rangle$:\vspace{-0.25\baselineskip}
%
\begin{equation}
	f(x) \geq p(x)\,\quad \text{for all~} x\in(A,B\rangle\,.\vspace{0.25\baselineskip}
\end{equation}
%
Then, the area below the comparison function $f(x)$ is populated uniformly with random points denoted $[u_x,u_y]$, where $u_x$ is drawn according to $f(x)$ and $u_y$ uniformly from $(0,1\rangle$. For each point, the values $f(u_x)$ and $p(u_x)$ are calculated. If
$ u_y > \sfrac{p(u_x)}{f(u_x)}$, the point $[u_x,u_y]$ is \emph{rejected} and a new point is generated until the condition for acceptance is satisfied. The ratio of $\sfrac{\text{rejected}}{\text{accepted}}$ points is then, on average, equal to the ratio of the area between $f(x)$ and $p(x)$ to the area under $p(x)$. Hence, the accepted values $u_x$ follow the distribution $p(x)$. An illustration of this procedure is shown in \refFig{FIT-reject}.

The main advantage of this method is its versatility and simplicity. As the distribution function $p(x)$ is non-negative, continuous and normalized to one over a finite domain, it has a finite maximum for all distributions considered here. Therefore, it is always possible to construct the comparison function as a ``rectangle'' above the desired distribution, $f(x) = \max_{x\in(A,B\rangle} \{ p(x) \}$. On the other hand, this leads to the main disadvantage of the method: when the area below the comparison function $f(x)$ is much larger than the area below $p(x)$, the number of rejected points is very large, leading to a long computing time. A good comparison function is therefore crucial for an efficient generation of non-uniform distributions.
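
The cost of a poor comparison function can be quantified with a short sketch. The acceptance efficiency $\varepsilon$, the shorthand $p_{\max}$ and the example distributions below are introduced only for illustration and are not taken from the \fcncfitter configuration. The expected fraction of accepted points is the ratio of the two areas,
%
\begin{equation}
	\varepsilon = \frac{\int_A^B p(x)\,\mathrm{d}x}{\int_A^B f(x)\,\mathrm{d}x}\,,
\end{equation}
%
so that on average $1/\varepsilon$ points have to be generated per accepted pseudoevent. For a normalized $p(x)$ and a constant comparison function $f(x) = p_{\max}$, this gives $\varepsilon = 1/[(B-A)\,p_{\max}]$. As an example, for $p(x) = \tfrac{3}{8}(1+x^2)$ on $(-1,1\rangle$ one finds $p_{\max} = \tfrac{3}{4}$ and $\varepsilon = \tfrac{2}{3}$, whereas a sharply peaked distribution with $p_{\max} = 50$ on the same interval yields $\varepsilon = 1\%$.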


\subsection{Validation of the generation of the pseudoexperiments}\label{sec:toys-valid}

The \fcncfitter framework also provides the tools to generate the pseudoexperiments, exploiting the rejection method. In order to validate this functionality of the framework, pseudoexperiments with \BuToKstJpsi decays are generated.

The first test is to generate only events with the signal component. The events are generated following the distributions explained in \refSec{parMeas-sig}. The mass parameters and the parameter $F_S$ are taken from the fit to the reference decay \BuToKstJpsi, the angular components are generated with values based on the study of the \BuToKstJpsi decay done for~Ref.\,\cite{ANA-LHCb-angular4}.
%
\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/6/ctk_toyfit__6__JpsiFit_OnlySignal_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/6/ctl_toyfit__6__JpsiFit_OnlySignal_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/6/phi_toyfit__6__JpsiFit_OnlySignal_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}\\
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/6/m_toyfit__6__JpsiFit_OnlySignal_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/6/mkpi_toyfit__6__JpsiFit_OnlySignal_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\captionof{figure}[Fit to a signal-only pseudoexperiment.]{Fit to 52\,000 pseudoevents with only the signal component. The projections in \angleDist, \Bu meson mass, and \Kstarp meson mass are shown. The black points represent the generated pseudoevents, the black line is the fitted model. The blue space represents the signal component. The green dashed line shows only the \pwave component, the orange dotted line represents the \swave components and the pink dot-and-dash line depicts the interference between the \pwave and the \swave. }\label{fig:toy-sigOnly-Ref}
\end{figure}
%
According to the mass fit to the reference channel, there are about 52\,000 signal candidates in the data sample. Therefore, for this validation 52\,000 pseudoevents are generated. The fit to this pseudoexperiment is given in  \refFig{toy-sigOnly-Ref}. The \fcncfitter generates the desired distributions and also successfully fits them.

The next step is the validation of the folding technique. As the \fcncfitter was initially designed without the cut on \ctk in mind, this is a crucial check, especially for folding four (see \refEq{foldings}). The fit to the pseudoevents with the folding applied is shown in \refFig{toy-sigOnly-Ref-fold} in Appendix~\ref{app:toy-valid}. The generated pseudoevents and the fit to the pseudoevents fulfill the expectations and agree with each other.

Similarly, a pseudoexperiment with only the background component is performed. The pseudoevents are generated according to the distributions given in \refSec{parMeas-bkg} and are shown with their corresponding fit in \refFig{toy-bkgOnly-Ref}. Following the mass fit of the \BuToKstJpsi decay, the expected background yield is 13\,000 events. Hence, 13\,000 events are generated in each pseudoexperiment. The parameters used in the generation are taken from the fit of the background data sample described in \refSec{parMeas-bkg}. In the case of the background component, the validation of the folding technique is even more important than in the signal-only case, due to the complicated shape of the \ctk background not previously implemented in the \fcncfitter. As shown in \refFig{toy-bkgOnly-Ref-fold} in the appendix, the background is successfully generated, folded and fitted. Once again, 500 pseudoexperiments are created and the pull distributions of the free parameters are investigated. The pull distributions are normalized and centered at zero.

\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Bkg/7/ctk_toyfit__7__JpsiFit_OnlyBackground_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Bkg/7/ctl_toyfit__7__JpsiFit_OnlyBackground_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Bkg/7/phi_toyfit__7__JpsiFit_OnlyBackground_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}\\
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Bkg/7/m_toyfit__7__JpsiFit_OnlyBackground_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Bkg/7/mkpi_toyfit__7__JpsiFit_OnlyBackground_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
	\captionof{figure}[Fit to a background-only pseudoexperiment.]{Fit to 13\,000 pseudoevents with only the background component. The black points represent the generated pseudoevents, the black line represents the fitted model. The red area represents the background component. }\label{fig:toy-bkgOnly-Ref}

\end{figure}


\subsection[Large scale pseudoexperiments]{Large scale pseudoexperiments}\label{sec:toy-ref}

Once the pseudoexperiment generation, folding and fitting are validated, the next step is to validate the fit technique using both the signal and the background components. To avoid potential biases of the minimization introduced by the low statistical power of the sample, a pseudoexperiment with 65\,000 events is generated. The fraction of signal events, the mass parameters and the $F_S$ parameters are generated with values obtained by the mass fit to the reference channel \BuToKstJpsi described in \refSec{parMeas-FS}, the background parameters are generated with values obtained by the fit to the background sample, described in \refSec{parMeas-bkg}. The angular parameters are initialized based on the study of the \BuToKstJpsi decay done for~Ref.\,\cite{ANA-LHCb-angular4}. In order to study the pull distributions of these pseudoexperiments, five hundred pseudoexperiments are generated for each folding.

The pull distributions are shown in \refFig{toys-Ref-pull-P} and \refFig{toys-Ref-pull-S} and summarized in \refTab{toys-Ref-pull}. There are no large biases present: the largest ones are 6\% and 7\% of the statistical uncertainty, in the case of the $F_S$ and $S_{S5}$ parameters, respectively. However, it is clear that the uncertainties are significantly overestimated: the width of the pull distribution is of the order of 0.2, instead of one, for all parameters. This is caused by only an approximate estimation of the uncertainty in the fitter. The standard statistical uncertainties from the maximum-likelihood fit are obtained using the \hesse determination, which derives a symmetric statistical uncertainty by inverting the matrix of second derivatives of the negative log-likelihood function at the best-fit value~\cite{FIT-hesse}. For any real physical problem, the covariance matrix has to be positive-definite. However, in the presence of strongly correlated free parameters, the numerically evaluated matrix can fail the positive-definite requirement. In this case, \hesse forms a positive-definite approximation~\cite{FIT-TMINUIT}. Due to the modeling of the complicated structure of the \ctk background by a Chebyshev polynomial of order five, discussed in \refSec{parMeas-bkg}, the parameters describing the \ctk background are highly correlated and therefore only an approximation of the statistical uncertainty is available.
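
The \hesse estimate described above can be written compactly. The notation below ($\theta_i$ for the free parameters, $\hat\theta$ for their best-fit values, $\mathcal{L}$ for the likelihood function, $H$ for the Hessian matrix and $V$ for the covariance matrix) is introduced here for illustration only:
%
\begin{equation}
	H_{ij} = \left.\frac{\partial^2 \left(-\ln\mathcal{L}\right)}{\partial\theta_i\,\partial\theta_j}\right|_{\theta=\hat\theta}\,, \qquad V = H^{-1}\,, \qquad \sigma_i = \sqrt{V_{ii}}\,.
\end{equation}
%
The inversion yields a valid covariance matrix only if $H$ is positive-definite. When two parameters are almost fully correlated, $H$ becomes nearly singular and small numerical inaccuracies in its elements can translate into large changes of $V$.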

Moreover, the presence of large acceptance correction weights prevents the simplification of the full covariance matrix expression to the inverse Hessian. Hence, the \hesse uncertainty determination can no longer guarantee correct coverage.
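
One commonly used form of the full covariance matrix for a likelihood with per-event weights is sketched below. It is quoted as a general result and is not necessarily the expression implemented in the \fcncfitter; the symbols $\omega_e$ (weight of event $e$), $P$ (probability density function), $H$ and $C$ are assumptions of this sketch, and the weights are taken to be independent of the fit parameters $\theta$:
%
\begin{equation}
	V = H^{-1}\,C\,H^{-1}\,, \qquad H_{ij} = -\sum_{e} \omega_e\,\frac{\partial^2 \ln P(x_e;\theta)}{\partial\theta_i\,\partial\theta_j}\,, \qquad C_{ij} = \sum_{e} \omega_e^2\,\frac{\partial \ln P(x_e;\theta)}{\partial\theta_i}\,\frac{\partial \ln P(x_e;\theta)}{\partial\theta_j}\,,
\end{equation}
%
with all derivatives evaluated at the best-fit values. For unit weights, $\omega_e = 1$, the matrices $H$ and $C$ coincide asymptotically and $V$ reduces to the inverse Hessian $H^{-1}$. For weights that differ significantly from one, this simplification no longer holds.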

This can be improved by using \minos~\cite{FIT-TMINUIT}, which takes into account the parameter correlations and non-linearities. \minos varies each parameter in turn, minimizing the fit function with respect to the other parameters. This procedure, however, requires a good initial error estimate and is computationally very intensive: running several hundred pseudoexperiments would require a significant amount of CPU time. As the shape of the pull distributions obtained with the \hesse approximation is Gaussian, the widths of the pull distributions can be used instead to correct the statistical uncertainty in the fit to the data. \vspace{\baselineskip}

\begin{table}[hbt!] \footnotesize \centering
	\begin{tabular}{|l|c|c|}\hline
		\textbf{parameter} & \textbf{mean} & \textbf{width}\\
		\hline\hline
		$F_{L}$          &  $-0.008 \pm 0.009$ 			 & $\phantom{-}0.116 \pm 0.006$\\
		$S_{3}$          &  $\phantom{-}0.013 \pm 0.018$ & $\phantom{-}0.234 \pm 0.013$\\
		$S_{4}$          &  $-0.004 \pm 0.014$ 			 & $\phantom{-}0.185 \pm 0.010$\\
		$S_{5}$          &  $\phantom{-}0.003 \pm 0.017$ & $\phantom{-}0.229 \pm 0.012$\\
		$A_{FB}$         &  $\phantom{-}0.008 \pm 0.014$ & $\phantom{-}0.178 \pm 0.010$\\
		$S_{7}$          &  $\phantom{-}0.010 \pm 0.017$ & $\phantom{-}0.221 \pm 0.012$\\
		$S_{8}$          &  $-0.009 \pm 0.016$ 			 & $\phantom{-}0.214 \pm 0.011$\\
		$S_{9}$          &  $\phantom{-}0.038 \pm 0.017$ & $\phantom{-}0.218 \pm 0.012$\\\hline
		$F_{S}$          &  $-0.060 \pm 0.006$ 			 & $\phantom{-}0.074 \pm 0.004$\\
		$S_{S1}$         &  $\phantom{-}0.053 \pm 0.009$ & $\phantom{-}0.121 \pm 0.007$\\
		$S_{S2}$         &  $-0.038 \pm 0.015$ 			 & $\phantom{-}0.204 \pm 0.011$\\
		$S_{S3}$         &  $\phantom{-}0.022 \pm 0.016$ & $\phantom{-}0.211 \pm 0.011$\\
		$S_{S4}$         &  $\phantom{-}0.021 \pm 0.017$ & $\phantom{-}0.220 \pm 0.012$\\
		$S_{S5}$         &  $-0.071 \pm 0.015$ 			 & $\phantom{-}0.199 \pm 0.011$\\
		\hline
	\end{tabular}
	\captionof{table}[The angular moments pull distribution properties in reference-like pseudoexperiments.]{The widths and the means of the pull distributions of the angular moments in reference-like pseudoexperiments. 500 pseudoexperiments have been generated, mimicking the reference \BuToKstJpsi decay. Each pseudoexperiment consists of 65\,000 pseudoevents.} \label{tab:toys-Ref-pull}
\end{table}


Another five sets of pseudoexperiments are created in order to validate the pull distributions of fits exploiting the folding technique. The corresponding pull distributions are presented in \refApp{toys-ref}. The effect of the complicated background structure in \ctk can be clearly seen in \refFig{app-toys-ref-fld4}: the odd orders of the polynomial cancel out, leaving only the orders two and four. The correlation between the coefficients is therefore weaker and the width of the pull distribution is close to one.
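
The cancellation of the odd orders follows from the parity of the Chebyshev polynomials $T_n$. The sketch below assumes that the fold mirrors the distribution in \ctk around zero; the coefficients $c_n$ and the shorthand $x$ for \ctk are illustrative and not taken from the \fcncfitter code. Since $T_n(-x) = (-1)^n\,T_n(x)$, adding the mirrored distribution to the original one gives
%
\begin{equation}
	\sum_{n=0}^{5} c_n\,T_n(x) + \sum_{n=0}^{5} c_n\,T_n(-x) = 2\left[c_0\,T_0(x) + c_2\,T_2(x) + c_4\,T_4(x)\right]\,,
\end{equation}
%
with $T_0(x)=1$, $T_2(x)=2x^2-1$ and $T_4(x)=8x^4-8x^2+1$. The three odd coefficients $c_1$, $c_3$ and $c_5$ drop out, so fewer correlated background parameters remain in the fit.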


% An example of such generated and fitted sample is in \refFig{toy-Ref}.
%
%\begin{figure}[hbt!]
%	\centering
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/ctk_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/ctl_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/phi_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}\\
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/m_log_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/m_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}
%	\includegraphics[width=0.32\textwidth]{FCNC/Toys/Ref/5/mkpi_toyfit__5__JpsiFit_1BIN_bin0_Run12_SimultaneousFit_HighBmass_AllPDFs.eps}\\
%	\captionof{figure}[Fit to reference channel like pseudoexperiments.]{Fit to 65\,000 pseudoexperiments generated to mimic the \BuToKstJpsi decay.  The black points represent the generated pseudoexperiments, the black line represents the fitted model. The red area represents the background component, the blue space represents the signal component. The green dashed line shows only the \pwave component, the orange dotted line represents the \swave components and the pink dot-and-dash line depicts the interference between the \pwave and the \swave. }\label{fig:toy-Ref}
%
%\end{figure}
%%---------------
%\todo[inline]{Copy the toys plots to local folder!}
\input{Chapters/Toys/jobs/614}


\subsection[Realistic scale pseudoexperiments]{Realistic scale pseudoexperiments}\label{sec:toy-sig}

Lastly, the statistical properties of the fit are investigated by creating a set of 500 pseudoexperiments designed to resemble the \BuToKstmm decay.
The fraction of signal pseudoevents and the mass parameters are generated with values obtained by the mass fit to the \BuToKstmm channel described in \refSec{parMeas-FS}. The fractions of signal pseudoevents for each \qsq bin and each Run are listed in \refTab{toys-fsig}. The background parameters are generated with values obtained by the fit to the background sample, described in \refSec{parMeas-bkg}. The background is generated according to \refEq{bkg-angular}: the order of the Chebyshev polynomial in the \ctk dimension is five.
%
\begin{table}[hbt!] \footnotesize \centering
	\begin{tabular}{l|c c c c c}
		$f_{sig}$	&[0.25--4.00]	&[4.00--8.00]	&[11.00--12.50]	&[15.00--18.00]	&[1.10--6.00]\\ \hline
		\runI 	& 0.33 	& 0.35 & 0.51 & 0.53 & 0.25 \\
		\runII 	& 0.36	& 0.28 & 0.43 & 0.61 & 0.32\\

	\end{tabular}
	\captionof{table}[Fraction of signal pseudoevents in the pseudoexperiments.]{Fraction of signal pseudoevents in the pseudoexperiments for each of the \qsq bins and for each Run. The fraction is based on the mass fit to the data sample of the \BuToKstmm decay candidates.} \label{tab:toys-fsig}
\end{table}
%
The \pwave angular parameters are initialized to the Standard Model values obtained using the \flavio package~\cite{ANA-flavio}. The parameter $F_S$ is initialized to 0.25, based on the \emph{maximal} value of $F_S$ observed in~Ref.\,\cite{ANA-LHCb-angular4}. The interference angular parameters are initialized according to the values measured in~Ref.\,\cite{ANA-LHCb-angular4}. For each pseudoexperiment, 871 pseudoevents are generated. This is motivated by the number of selected signal candidates in \refTab{sel-selection_yields_rare}. In the fit, the free parameters are the \pwave angular parameters, the angular background parameters, the fraction of signal pseudoevents to all pseudoevents, and the exponential parameter describing the \Bu mass background.


Due to the low number of available candidates in the sample, it is not possible to use the complicated model of the \ctk background shown in \refSec{parMeas-bkg}. Hence, a dedicated test has been done to show that a Chebyshev polynomial of order two is sufficient to describe the \ctk background shape. In this test, although the generated distribution follows a Chebyshev polynomial of order five, only the parameters up to order two are left floating and the rest are fixed to zero. The pull distributions created from 500 pseudoexperiments are summarized in \refTab{toys-Sig-pull-643}. The simplified background description does not introduce any significant bias in the floating angular parameters\footnote{This does not hold for samples with more statistical power: a dedicated test is done using pseudoexperiments with 10 times more pseudoevents than currently present in the data. The simplified background description using the Chebyshev polynomial of order two in \ctk no longer describes the background well enough and the pulls of the angular parameters exhibit biases of up to 55\% of the standard statistical uncertainty.}. Furthermore, by lowering the order of the Chebyshev polynomial, the overestimation of the statistical uncertainty is strongly reduced: the widths of the pull distributions increase to $\sim0.9$.

\input{Chapters/Toys/jobs/pull_table_643_col}

The largest deviation of the mean of the pull distribution from zero is present for the parameter \FL: the bias reaches up to 24\% of the statistical uncertainty in the bin $\qsq\in[11.0,12.5]\gevgev$. The folding technique is also tested and the results are shown in \refTab{toys-Sig-pull-644}. The bias in \FL can be reduced by using the folding technique, but it does not disappear. The parameter \FL is very sensitive to \ctk and therefore a small bias caused by the imperfect description of the background shape is expected. Another prominent bias in the full angular fit is present in the case of \AFB in $\qsq\in[4.0,8.0]\gevgev$. However, this bias disappears when folding 0 is used, see \refTab{toys-Sig-pull-644}.

Also in the case of the folded fit, the width of the pulls is $\approx0.9$. The only exceptions are $S_3$ and $S_8$ with folding four applied: in folding four, \ctk is also folded. In this case, a parabolic shape does not describe the folded background well enough, hence the fourth-order Chebyshev polynomial parameter is added. This leads to a correlation in the background description, resulting in a smaller width of the pulls. Contrary to this, the width of the \FL pulls is up to 1.34. However, using this folding, the measurement of the parameter \FL is also biased by up to 45\% of the statistical uncertainty. This indicates the large sensitivity of \FL to a proper background description.

\input{Chapters/Toys/jobs/pull_table_644_fld_col}
%\input{Chapters/Toys/jobs/pull_table_645}
%\input{Chapters/Toys/jobs/pull_table_646}
%\input{Chapters/Toys/jobs/pull_table_647}
%\input{Chapters/Toys/jobs/pull_table_648}


It was already discussed in \refSec{parMeas-FS} that obtaining the $F_S$ parameter from the fit to the reconstructed \Kstarp mass is not possible due to the low statistical power of the sample. Hence, a dedicated test is done: the pseudoevents are generated with $F_S = 0.25$ and with non-zero interference terms, but in the fit model both $F_S$ and the interference terms are fixed to zero. The background is treated the same way as in the previous case: in the pseudoevents the generated shape in \ctk follows a Chebyshev polynomial of order five, but in the fit only order two is allowed. The means and widths of the pulls obtained by studying 500 pseudoexperiments are listed in \refTab{toys-Sig-pull-631}. The parameters $S_3$, $S_4$, $S_7$, $S_8$ and $S_9$ show little or no bias and their statistical uncertainty is estimated well. On the other hand, the parameters \FL, $S_5$ and \AFB show rather large biases, especially in the bins between and above the charmonium resonances. According to \refTab{toys-fsig}, those two bins also have the highest signal fraction. In these bins, the parameters show a tendency to run into the given parameter limit despite their value at generation being very far from it.

\input{Chapters/Toys/jobs/pull_table_631_col}


The folding technique is also applied in this pseudoexperimental setup. The results are given in \refTab{toys-Sig-pull-632}. The folding technique significantly reduces the bias in the \AFB parameter, and the bias in the parameter \FL can also be reduced. Unfortunately, the folding technique decreases the bias in the parameter $S_5$ only minimally and $S_5$ remains the only problematic parameter in the \qsq region below the \jpsi resonance. As the worst bias emerging from setting the \swave contribution to zero is up to 35\% of the statistical uncertainty, the \swave parameters can be omitted in the fit to data. It is worth noting that this assumes $F_S=0.25$ for each pseudoexperiment, which is the maximal value observed in any bin in~Ref.\,\cite{ANA-LHCb-angular4}: the actual value of $F_S$ may well be below this value.

%\input{Chapters/Toys/jobs/pull_table_632_fld}
\input{Chapters/Toys/jobs/pull_table_632_fld_col}
%\input{Chapters/Toys/jobs/pull_table_633}
%\input{Chapters/Toys/jobs/pull_table_634}
%\input{Chapters/Toys/jobs/pull_table_635}
%\input{Chapters/Toys/jobs/pull_table_636}

Using the fits to the pseudoexperiments, the statistical uncertainty expected in the fit to the data in the rare \BuToKstmm channel is estimated. For each parameter, a Gaussian function is fitted to the distribution of the statistical uncertainty obtained from 500 pseudoexperiments; the expected statistical uncertainty is the mean of this Gaussian function. In the case of \FL and \AFB, the uncertainty distribution deviates from a pure Gaussian distribution, as the parameters run into the limit, and a right tail is present. As the deviation is not large, the Gaussian function is also used to describe the statistical uncertainty of these two parameters. Given this behavior, together with the very narrow pull distributions caused by the complicated shape of the background in \ctk in the sample with large statistical power (see \refTab{toys-Ref-pull}), the Feldman-Cousins technique~\cite{FIT-Feldman} will be necessary to ensure the correct coverage of the angular parameters in future measurements of this channel.\vspace{\baselineskip}

The expected statistical uncertainties of the angular parameters \allAng are given in \refTab{toys-error-estimate}. In order to put these absolute uncertainties into perspective, the Standard Model value for each parameter is given together with the expected uncertainty. The Standard Model values are obtained using the \flavio package. The standard statistical uncertainty is obtained from fits to pseudoexperiments using the folding technique. For the parameter \FL folding 1 is used, for the parameter $S_3$ folding 3 is used. This choice of the folding is based on the results in \refTab{toys-Sig-pull-632}: these foldings have the smallest bias. For the remaining parameters, the folding sensitive to the given parameter is used. The pull distributions are shown in \refApp{toys-ref}. Comparing the uncertainty to the \BuToKstKspimm measurement of the angular moments~\cite{ANA-LHCb-angular4}, the only two comparable \qsq intervals are $\qsq\in[1.1,6.0]\gevgev$ and $\qsq\in[15.0,18.0]\gevgev$. In the interval below the \jpsi resonance, the expected statistical uncertainties in this work are up to two times larger; in the high \qsq interval they are up to three times larger than in the \BuToKstKspimm measurement. Note that the statistical uncertainty in~Ref.\,\cite{ANA-LHCb-angular4} is estimated with the Feldman-Cousins technique, which returned larger statistical uncertainties than \hesse.

Due to the large uncertainty of the parameter \FL and the potential bias of this parameter, the fit to the $\Pprime{i}$ basis, defined in \refEq{P'_definition}, has not been performed.

\begin{table}[hbt!]\small \centering
	\begin{tabular}{l|c c c c c}
	\textbf{parameter}	&[0.25--4.00]	&[4.00--8.00]	&[11.00--12.50]	&[15.00--18.00]	&[1.10--6.00]\\
	\hline
$\FL$	 & $\phantom{-}0.67 \pm 0.15$  & $\phantom{-}0.66 \pm 0.12$ & $\phantom{-}0.43 \pm 0.16$ & $\phantom{-}0.34 \pm 0.14$ & $\phantom{-}0.75 \pm 0.13$\\
$S_3$	 & $\phantom{-}0.00 \pm 0.15$  & $-0.03 \pm 0.16$ & $-0.09 \pm 0.23$ & $-0.19 \pm 0.19$ & $-0.01 \pm 0.15$\\
$S_4$	 & $-0.03 \pm 0.21$ & $-0.24 \pm 0.19$ & $-0.28 \pm 0.25$ & $-0.30 \pm 0.21$ & $-0.15 \pm 0.19$\\
$S_5$	 & $\phantom{-}0.04 \pm 0.19$  & $-0.37 \pm 0.19$ & $-0.41 \pm 0.26$ & $-0.30 \pm 0.19$ & $-0.19 \pm 0.19$\\
$\AFB$   & $-0.09 \pm 0.14$ & $\phantom{-}0.19 \pm 0.12$ & $\phantom{-}0.39 \pm 0.18$ & $\phantom{-}0.39 \pm 0.16$ & $\phantom{-}0.01 \pm 0.12$\\
$S_7$	 & $-0.02 \pm 0.19$ & $-0.01 \pm 0.19$ & $-0.00 \pm 0.26$ & $-0.00 \pm 0.22$ & $-0.02 \pm 0.19$\\
$S_8$	 & $-0.01 \pm 0.23$ & $-0.00 \pm 0.22$ & $\phantom{-}0.00 \pm 0.29$ & $\phantom{-}0.00 \pm 0.24$ & $-0.01 \pm 0.21$\\
$S_9$	 & $-0.00 \pm 0.15$ & $-0.00 \pm 0.16$ & $\phantom{-}0.00 \pm 0.23$ & $\phantom{-}0.00 \pm 0.19$ & $-0.00 \pm 0.15$\\
	\end{tabular}\captionof{table}[Expected standard statistical uncertainty in the fit to data.]{The Standard Model values of the angular parameters with their expected standard statistical uncertainty from the fit to the data. The Standard Model values are obtained using the \flavio package~\cite{ANA-flavio}. The standard statistical uncertainty is obtained from fits to pseudoexperiments using the folding technique. For the parameter \FL folding 1 is used, for the parameter $S_3$ folding 3 is used. For the rest of the parameters, the folding sensitive to the parameter is used. }\label{tab:toys-error-estimate}
\end{table}


%No folding
%$Fl$	 & $0.67 \pm 0.15$ & $0.66 \pm 0.12$ & $0.43 \pm 0.21$ & $0.34 \pm 0.17$ & $0.75 \pm 0.14$\\
%$S3$	 & $0.00 \pm 0.16$ & $-0.03 \pm 0.17$ & $-0.09 \pm 0.26$ & $-0.19 \pm 0.21$ & $-0.01 \pm 0.16$\\
%$S4$	 & $-0.03 \pm 0.21$ & $-0.24 \pm 0.20$ & $-0.28 \pm 0.30$ & $-0.30 \pm 0.25$ & $-0.15 \pm 0.20$\\
%$S5$	 & $0.04 \pm 0.19$ & $-0.37 \pm 0.20$ & $-0.41 \pm 0.29$ & $-0.30 \pm 0.23$ & $-0.19 \pm 0.19$\\
%$Afb$	 & $-0.09 \pm 0.15$ & $0.19 \pm 0.12$ & $0.39 \pm 0.20$ & $0.39 \pm 0.17$ & $0.01 \pm 0.13$\\
%$S7$	 & $-0.02 \pm 0.19$ & $-0.01 \pm 0.20$ & $-0.00 \pm 0.29$ & $-0.00 \pm 0.24$ & $-0.02 \pm 0.19$\\
%$S8$	 & $-0.01 \pm 0.22$ & $-0.00 \pm 0.20$ & $0.00 \pm 0.30$ & $0.00 \pm 0.25$ & $-0.01 \pm 0.20$\\
%$S9$	 & $-0.00 \pm 0.16$ & $-0.00 \pm 0.17$ & $0.00 \pm 0.26$ & $0.00 \pm 0.21$ & $-0.00 \pm 0.16$\\
%%%Folding 0
%$Fl$	 & $0.67 \pm 0.15$ & $0.66 \pm 0.12$ & $0.43 \pm 0.19$ & $0.34 \pm 0.16$ & $0.75 \pm 0.13$\\
%$S3$	 & $0.00 \pm 0.15$ & $-0.03 \pm 0.16$ & $-0.09 \pm 0.23$ & $-0.19 \pm 0.19$ & $-0.01 \pm 0.15$\\
%$Afb$	 & $-0.09 \pm 0.14$ & $0.19 \pm 0.12$ & $0.39 \pm 0.18$ & $0.39 \pm 0.16$ & $0.01 \pm 0.12$\\
%$S9$	 & $-0.00 \pm 0.15$ & $-0.00 \pm 0.16$ & $0.00 \pm 0.23$ & $0.00 \pm 0.19$ & $-0.00 \pm 0.15$\\
%
%
%
%%%Folding 1
%$Fl$	 & $0.67 \pm 0.15$ & $0.66 \pm 0.12$ & $0.43 \pm 0.16$ & $0.34 \pm 0.14$ & $0.75 \pm 0.13$\\
%$S3$	 & $0.00 \pm 0.15$ & $-0.03 \pm 0.16$ & $-0.09 \pm 0.23$ & $-0.19 \pm 0.19$ & $-0.01 \pm 0.15$\\
%$S4$	 & $-0.03 \pm 0.21$ & $-0.24 \pm 0.19$ & $-0.28 \pm 0.25$ & $-0.30 \pm 0.21$ & $-0.15 \pm 0.19$\\
%
%
%%%Folding 2
%$Fl$	 & $0.67 \pm 0.15$ & $0.66 \pm 0.12$ & $0.43 \pm 0.17$ & $0.34 \pm 0.14$ & $0.75 \pm 0.13$\\
%$S3$	 & $0.00 \pm 0.16$ & $-0.03 \pm 0.16$ & $-0.09 \pm 0.24$ & $-0.19 \pm 0.20$ & $-0.01 \pm 0.15$\\
%$S5$	 & $0.04 \pm 0.19$ & $-0.37 \pm 0.19$ & $-0.41 \pm 0.26$ & $-0.30 \pm 0.19$ & $-0.19 \pm 0.19$\\
%
%
%%%Folding 3
%$Fl$	 & $0.67 \pm 0.14$ & $0.66 \pm 0.12$ & $0.43 \pm 0.17$ & $0.34 \pm 0.14$ & $0.75 \pm 0.13$\\
%$S3$	 & $0.00 \pm 0.15$ & $-0.03 \pm 0.16$ & $-0.09 \pm 0.23$ & $-0.19 \pm 0.19$ & $-0.01 \pm 0.15$\\
%$S7$	 & $-0.02 \pm 0.19$ & $-0.01 \pm 0.19$ & $-0.00 \pm 0.26$ & $-0.00 \pm 0.22$ & $-0.02 \pm 0.19$\\
%
%
%%%Folding 4
%$Fl$	 & $0.67 \pm 0.15$ & $0.66 \pm 0.12$ & $0.43 \pm 0.17$ & $0.34 \pm 0.15$ & $0.75 \pm 0.14$\\
%$S3$	 & $0.00 \pm 0.14$ & $-0.03 \pm 0.15$ & $-0.09 \pm 0.23$ & $-0.19 \pm 0.19$ & $-0.01 \pm 0.13$\\
%$S8$	 & $-0.01 \pm 0.23$ & $-0.00 \pm 0.22$ & $0.00 \pm 0.29$ & $0.00 \pm 0.24$ & $-0.01 \pm 0.21$\\

The estimated statistical uncertainty can be used to obtain the expected sensitivity of the measurement to the real part of the vector coupling, Re(\C9 ).
A likelihood scan as a function of Re(\C9 ) is performed using the \flavio package and is shown in \refFig{toys-C9}. A pseudomeasurement is generated using the predictions for a New Physics model with Re(\C9 )$\,=-2$ as the central value,
which is approximately the value preferred by the fit in~Ref.\,\cite{ANA-LHCb-angular4}.
The \qsq bins used are $[1.1,6.0]$\gevgev and $[15.0,18.0]$\gevgev, as the predictions close to the \jpsi resonance are affected by the \ccbar loops (see \refSec{SM_bsll}). One unit on the $y$-axis, $[-2\Delta\log\mathrm{L}]$, corresponds to one standard deviation squared. Assuming the value of Re(\C9 )$\,=-2$, the expected deviation from the Standard Model value is $\approx2.4$ standard deviations. It is important to stress that this estimate is based only on the expected \emph{statistical} uncertainty of the parameters. Systematic studies, especially those related to the background shape, can increase this uncertainty. Moreover, performing a Feldman-Cousins scan is necessary, as the pull distributions show that the uncertainty estimation is volatile. The uncertainties obtained from the Feldman-Cousins scan can be larger than the ones presented here.
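
The conversion between the $y$-axis of the scan and the number of standard deviations can be made explicit. The sketch assumes the Gaussian approximation for a single parameter of interest; the symbol $n_\sigma$ is introduced here for illustration, and $\Delta\log\mathrm{L}$ denotes the difference between the log-likelihood at the Standard Model point and at the best-fit point:
%
\begin{equation}
	n_\sigma = \sqrt{-2\Delta\log\mathrm{L}}\,, \qquad n_\sigma \approx 2.4 \quad\Leftrightarrow\quad -2\Delta\log\mathrm{L} \approx 5.8\,.
\end{equation}
%
In this approximation, the one standard deviation interval on Re(\C9 ) is bounded by the points where $-2\Delta\log\mathrm{L} = 1$.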

\begin{figure}[hbt!] \vspace{-10pt}
	\centering
	\includegraphics[width=0.75\textwidth]{./FCNC/C9_sensitivity.pdf}
	\captionof{figure}[Likelihood scan of the shift of Re(\C9 ) assuming NP value.]{Likelihood scan of the shift of Re(\C9 ) from its SM value. The expected likelihood scan is represented by the dashed green line. The prediction is compared to the measured likelihood scan in the decay \BuToKstmmKSFull, given by the solid blue line.  The predictions are taken from the \flavio package~\cite{ANA-flavio}. }\label{fig:toys-C9}
\end{figure}


\clearpage