\section{Parameter measurement}\label{sec:parMeas}

The angular moments \allAng described in \refEq{decay_rate_final} are extracted from the selected events using the \cpp-based fitter framework \fcncfitter. It performs maximum-likelihood fits using the \tminuit~\cite{FIT-TMINUIT} minimization class of the \root framework~\cite{ROOT}\footnote{The apparent contradiction between maximizing a likelihood and using a minimization class is explained in the following section.}. This framework has been developed within the \lhcb collaboration and successfully used in the previous analyses described in~Ref.\,\cite{ANA-LHCb-angular3,ANA-LHCb-angular4,ANA-LHCb-angular2}. The framework is developed further to accommodate the needs of this particular analysis and to improve the user experience. 

\subsection{Maximum likelihood}\label{sec:parMeas-maximumLikelihood}

Maximum likelihood is a method of estimating the parameters of a probability distribution. As the name suggests, the method maximizes the likelihood function, so that the observed data are most probable under the assumed statistical model. 

In order to fully understand the concept of maximum likelihood in a multi-dimensional fit, it is useful to start with a simple example of a one-parameter fit and then expand it to the multi-dimensional space.

%Let $\{x_1,\dots,x_N\}$ be a set of random phenomenon and $x$ a random variable $x\in\{x_1,\dots,x_N\}$ 

Let $h$ represent a hypothesis and $\{D\}=\{x_1,x_2,\dots,x_n\}$ the measured data. Then, using Bayes' theorem~\cite{FIT-bayes}, the probability density function (PDF) of the hypothesis being valid given the data $\{D\}$ can be written as
%
\begin{equation}\label{eq:likelihood-simple}
	\text{PDF}(h\,|\,\{D\}) = \frac
		{\text{PDF}( \{D\}\,|\,h)\,\text{PDF}(h)}
		{\text{PDF}(\{D\})}\,.
\end{equation}
 
The first factor in the numerator, $\text{PDF}( \{D\}\,|\,h)$, is referred to as the \emph{likelihood}, the second factor, $\text{PDF}(h)$, as the \emph{prior}. The prior is assumed to be a uniform distribution. The denominator, $\text{PDF}(\{D\})$, represents the probability of the data averaged over all parameter values. As it is a constant, it is not relevant for maximizing the probability that the hypothesis is true. 

As the goal is to maximize the probability $\text{PDF}(h\,|\,\{D\})$, the first-order derivative at the estimate $\mu_0$ has to be zero and the second-order derivative negative:
%
\begin{equation}\label{eq:max-deriv}
\left.\frac{\partial \text{PDF}}{\partial h}\right|_{\mu_0} = 0\,, \qquad
\left.\frac{\partial^2 \text{PDF}}{\partial h^2}\right|_{\mu_0} < 0\,.
\end{equation}
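
As a simple illustration, consider a sketch under the assumption of $n$ independent measurements $x_i$ drawn from a Gaussian distribution with unknown mean $\mu$ and known width $\sigma$ (symbols introduced here for illustration only). As the logarithm is a monotonic function, the likelihood $\mathcal{L}$ and its logarithm have their maximum at the same position, and it is convenient to work with the latter:
%
\begin{displaymath}
\begin{aligned}
\ln\mathcal{L}(\mu) &= -\sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2} + \text{const}\,,\\
\frac{\partial \ln\mathcal{L}}{\partial\mu} &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu) = 0
\quad\Rightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n}x_i\,,\\
\frac{\partial^2 \ln\mathcal{L}}{\partial\mu^2} &= -\frac{n}{\sigma^2} < 0\,.
\end{aligned}
\end{displaymath}
%
The estimate $\hat{\mu}$ is the arithmetic mean of the measurements, and the negative second derivative confirms that the extremum is a maximum.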

%To illustrate this principle, in the case of a Gaussian distribution, the maximum likelihood estimators for the mean $\mu$ and the variance $\sigma$ then are $\hat{\mu}$ and $\hat{\sigma}$
%
%\begin{displaymath}
%	\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i	\,, \qquad
%	\hat{\sigma} = \frac{1}{n}\sum_{i=1}^n (x_i-\hat{\mu})^2\,,
%\end{displaymath}
%
%as the likelihood maximization requires the first order likelihood derivatives to be zero:
%	
%\begin{displaymath}
%	\frac{\partial \mathcal{L} }{\partial \mu} = 0  =  \frac{1}{\sigma^2}\sum_{i=1}^n (x_i-\hat{\mu})\,,\\	
%\end{displaymath} 
%%
%\begin{displaymath}
%	\frac{\partial \mathcal{L} }{\partial \sigma^2} = 0 
%	= \frac{1}{2\sigma^2}
%	\left[
%			\frac{1}{\sigma^2}\sum_{i=1}^n (x_i-\hat{\mu})^2-n
%\right]\,.	
%\end{displaymath}


In this work, an unbinned maximum-likelihood fit is performed. Extending the previous example to a multidimensional space, let $N$ be the number of events, each assigned a weight $w_n$. Let $\vec{\mu}$ be the parameter vector and $\{\vec{D}\} = \{\vec{D_1},\vec{D_2},\dots,\vec{D_N}\}$ the measured data. The likelihood takes the form
%
\begin{equation}\label{eq:likelihood}
	\mathcal{L}(\{\vec{D}\}|\vec{\mu}) = \text{PDF}(\{\vec{D}\}|\vec{\mu}) =   \prod_{n=1}^{N} \text{PDF}(\vec{D_n}|\vec{\mu})^{w_n}\,,	
\end{equation}
%
where $\text{PDF}(\vec{D_n}|\vec{\mu})$ is the normalized probability density function according to which the data is distributed. %In the unbinned fit, the weights for each event $w_n$ are often not normalized. A factor of $\sfrac{N}{\sum_n w_n}$ then has to be added to the weights in order to extract an unbiased overall uncertainty.

The maximization problem is often reduced to a much simpler problem. Instead of maximizing the likelihood (for simplicity denoted $\mathcal{L}$) itself, it is possible to \emph{minimize} the negative logarithm of the likelihood, $-\ln\mathcal{L}$. Looking at \refEq{likelihood}, the minimization problem becomes:
%
\begin{equation}
-\ln\left(\mathcal{L}(\{\vec{D}\}|\vec{\mu})\right) \propto  -\sum_{n=1}^N w_n \ln \left(\text{PDF}(\vec{D_n}|\vec{\mu})\right)\,.
\end{equation}
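%
The step from the product to the sum follows directly from the properties of the logarithm, shown here explicitly for clarity:
%
\begin{displaymath}
\ln \prod_{n=1}^{N} \text{PDF}(\vec{D_n}|\vec{\mu})^{w_n}
= \sum_{n=1}^{N} w_n \ln \left(\text{PDF}(\vec{D_n}|\vec{\mu})\right)\,.
\end{displaymath}
%
As the logarithm is strictly monotonic, the parameter values at which the extremum occurs are unchanged. The sum is also numerically more stable than a product of many small numbers.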
%

In the unbinned fit, the event weights $w_n$ are often not normalized. The logarithm of the likelihood then has to be multiplied by a factor of $\sfrac{N}{\sum_n w_n}$ in order to extract an unbiased uncertainty.

Some of the parameters can be constrained to a previously known value $v_i$ with an uncertainty $u_i$. For every such parameter $p_i$, an additional Gaussian constraint term is added to the logarithm of the likelihood:
%
\begin{equation}
\ln \mathcal{L}_{\text{constr}} = -\frac{1}{2}\sum_i \left( \frac{p_i - v_i}{u_i} \right)^2\,.
\end{equation}


%
%In this particular analysis, the data $\vec{D_n}$ corresponds to the \Bu meson mass and the deay angles, $\vec{X} = (\mass{\Bu},$\angles$)$. The parameter vector is then $\vec{\mu} = \vec{\mu_S},\vec{\mu_P},\vec{\mu_{n}}$, $\mu_n$ being a nuisance parameter

The biggest advantage of minimizing the negative logarithm of the likelihood instead of maximizing the likelihood directly is that the logarithm can be expanded in a Taylor series~\cite{FIT-taylor} around the maximum-likelihood estimate $\vec{\mu_0}$:
%
\begin{displaymath}
	\ln\mathcal{L}(\vec{\mu}) = \ln\mathcal{L}(\vec{\mu_0}) 
	+ \left.\frac{\partial \ln\mathcal{L} }{\partial \vec{\mu}}\right|_{\vec{\mu_0}} (\vec{\mu}-\vec{\mu_0})
	+ \left.\frac{\partial^2 \ln\mathcal{L} }{\partial \vec{\mu}^2}\right|_{\vec{\mu_0}}
	\frac{(\vec{\mu}-\vec{\mu_0})^2}{2} + \omega_3\,,
\end{displaymath}
%
where $\omega_3$ denotes the higher-order contributions, which are \emph{typically} negligible.

The first term in the expansion is a constant and therefore irrelevant for the minimization. The second term vanishes at the maximum, in analogy to \refEq{max-deriv}. Therefore, the function that has to be minimized, denoted for simplicity $\mathcal{P}$, becomes
%
\begin{equation}
    \mathcal{P} = -\left.\frac{\partial^2 \ln\mathcal{L} }{\partial \vec{\mu}^2}\right|_{\vec{\mu_0}}
    \frac{(\vec{\mu}-\vec{\mu_0})^2}{2}\,.
\end{equation}
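
The practical use of this quadratic form can be sketched for a single parameter $\mu$. In the Gaussian approximation, and with $\sigma_\mu$ introduced here only for illustration as the uncertainty of the estimate $\mu_0$, the curvature at the minimum is related to the variance of the estimate:
%
\begin{displaymath}
\sigma_\mu^2 = \left(-\left.\frac{\partial^2 \ln\mathcal{L}}{\partial \mu^2}\right|_{\mu_0}\right)^{-1}
\quad\Rightarrow\quad
-\ln\mathcal{L}(\mu_0\pm\sigma_\mu) = -\ln\mathcal{L}(\mu_0) + \frac{1}{2}\,.
\end{displaymath}
%
Under this assumption, the one-standard-deviation interval corresponds to the points where the negative logarithm of the likelihood rises by one half above its minimum.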


\subsection{Fit model}\label{sec:parMeas-fitModel}

The \fcncfitter framework offers a wide variety of fit models. Despite its versatility, further improvements are made to adapt it to this analysis, especially due to the limited \ctk availability and the presence of a complicated background component. The fit model used to extract the parameters \allAng consists of two main components, the signal and background probability density functions $P_{sig}$ and $P_{bkg}$. The total PDF can then be written as\vspace{-0.25\baselineskip}
%
\begin{equation}
\text{PDF} = f_{sig} \prod_{i=1}^{D} \text{P}_{sig}^i + (1- f_{sig}) \prod_{i=1}^{D} \text{P}_{bkg}^i\,,
\end{equation}
where $D$ is the number of fit dimensions and $f_{sig}$ is the fraction of signal candidates among all candidates in the dataset.
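
As a short consistency check, assume that each one-dimensional component $\text{P}^i$ is normalized to unity in its own variable $x_i$ (a symbol introduced here for illustration). The total PDF is then normalized for any value of $f_{sig}$:
%
\begin{displaymath}
\int \text{PDF}\,\prod_{i=1}^{D}{\rm d}x_i
= f_{sig} \prod_{i=1}^{D} \int \text{P}_{sig}^i\,{\rm d}x_i
+ (1- f_{sig}) \prod_{i=1}^{D} \int \text{P}_{bkg}^i\,{\rm d}x_i
= f_{sig} + (1 - f_{sig}) = 1\,.
\end{displaymath}
%
This is why $f_{sig}$ can be interpreted directly as the signal fraction without any additional normalization factor.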

The fit is performed in the four dimensions of the \Bu meson mass, \ctl, \ctk and $\phi$. In addition, in the case of the \BuToKstJpsi decay, another fit is performed in the two dimensions of the \Bu meson mass and the \Kstar mass in order to extract the $F_S$ parameter (see \refEq{decay_rate_final}). 

%In the case of the \BuToKstmm decay, the $F_S$ fraction is fixed to zero as the fit to the \Bu meson mass and \Kstar mass prefers a zero value. It is shown that due to the low amount of signal candidates in the sample, this assumption does not significantly bias the \pwave angular parameters. In both of these fits, the \Bu meson mass is used to determine the fraction of signal events $f_{sig}$. 
 
As both the collision and the detector conditions differ between \runI and \runII, the datasets are treated separately. However, all angular observables noted in \refEq{decay_rate_final} are independent of those conditions. Hence these parameters are shared between the two datasets in the fit. Moreover, to further stabilize the fit, the angular background parameters are also shared between the two datasets.


\subsubsection{Signal component}\label{sec:parMeas-sig}

The reconstructed \Bu meson mass is described by a double-sided Crystal Ball function defined in \refApp{CrystalBall}. The parameters of the Crystal Ball function $\alpha_{1,2}$ and $n_{1,2}$ are fixed to the \Bu meson  mass shape in the simulation of the reference channel decay \BuToKstJpsi. This is due to the shape of the tails of the Crystal Ball function: even when fitting the simulated \Bu meson mass sample of the rare channel decay \BuToKstmm, the parameters $\alpha_{1,2}$ and $n_{1,2}$ show large uncertainties and the fit becomes unstable. On top of this, the mean of the \Bu meson mass peak in the rare \BuToKstmm decay is fixed to the one obtained by fitting the reference decay channel data.  

Due to low statistics of the signal sample, the width of the Crystal Ball function $\sigma_{\text{rare}}^{Data}$ is fixed to the width in the reference decay channel data fit $\sigma_{\text{ref}}^{Data}$ multiplied by a scaling factor obtained by fitting the simulated \Bu meson mass in the rare and reference decay channels, $\sigma_{\text{rare}}^{MC}$ and $\sigma_{\text{ref}}^{MC}$
%
\begin{equation}
	\sigma_{\text{rare}}^{Data} = \sigma_{\text{ref}}^{Data} \frac{\sigma_{\text{rare}}^{MC}}{\sigma_{\text{ref}}^{MC}}\,.
\end{equation}


For the fit of the \Kstar mass, the \pwave amplitude is described by the Breit-Wigner model~\cite{FIT-BreitWigner}:
%
\begin{equation}\label{eq:BW-Pwave-basic}
	\begin{aligned}
	\mathcal{A}_{\rm P}\left(\mKpPiz \right) = 	
		&\sqrt{kp}\times B^{\prime}_{L_\PB}(k, k_0, d)
			\left(\frac{k}{\mBu}\right)^{L_\PB}\times 			B^{\prime}_{L_\Kstarp}(p, p_0,d)
			\left(\frac{p}{\mKstarp}\right)^{L_\Kstarp}\\
		&\times\frac{1}{\mKstarp^2 - \left(\mKpPiz \right)^2 - \textit{i}\mKstarp\Gamma\left(\mKpPiz\right)}\,.
	\end{aligned}
\end{equation}
%
The momentum of the $\Kstarp$ in the rest frame of the \Bu meson is denoted $k$ with a mean peak value of $k_0$, and the momentum of the $\Kp$ in the rest frame of the \Kstarp is denoted $p$ with a mean peak value of $p_0$. $L$ denotes the angular momentum of the corresponding meson. The factors ${B_L}^\prime$ are the so-called Blatt-Weisskopf form factors~\cite{FIT-BlattWeisskopf}
%
\begin{eqnarray}
\begin{aligned}
B_{0}^{\prime} \left(p, p_{0}, d\right) = 
&1\,,\\
B_{1}^{\prime} \left(p, p_{0}, d\right) = 
&\sqrt{\frac{1 + \left(p_{0}d \right)^{2}} { 1 + \left(p~d \right)^{2}} }\,,
\label{eq:blatt-weisskopf}
\end{aligned}
\end{eqnarray}
where $d$ represents the size of the decaying particle. This parameter is reported in~Ref.\,\cite{FIT-dSize} to be $d = 1.6 \gev^{-1}$ (or 0.3\fm). This is consistent with a $\Bu\to\jpsi\rhop$ branching fraction measurement, where the fit favored $d=1.64\gev^{-1}$~\cite{FIT-JpsiRho}. However, a recent \lhcb study of the $Z(4430)$ favored $d\sim0$~\cite{FIT-Z}. As the determination of the $d$ parameter is not possible in this analysis, its value is fixed to $d = 1.6 \gev^{-1}$, in line with the previous \Bd decay analyses~\cite{ANA-LHCb-angular3,ANA-LHCb-angular2} and the \BuToKstKspimm decay analysis~\cite{ANA-LHCb-angular4}. 

As the angular momenta for the \pwave are $L_\PB = 0$ and $L_\Kstarp = 1$, \refEq{BW-Pwave-basic} becomes
%
\begin{equation}\label{eq:BW-Pwave}
\mathcal{A}_{\rm P}(\mKpPiz) = \sqrt{kp}\times \sqrt{\frac{1 + \left(p_{0}d \right)^{2}} { 1 + \left(pd \right)^{2}} } \times\frac{p}{\mKstarp}\times\frac{1}{\mKstarp^2 - \mKpPiz^2 - \textit{i}\mKstarp\Gamma(\mKpPiz)}\,.
\end{equation}


For the description of the \swave in the \mKpPiz, the \lassc parametrization~\cite{FIT-LASS} is used \vspace{-0.5\baselineskip}
%
\begin{eqnarray}\label{eq:BW-Swave-basic}
\begin{aligned}
	\mathcal{A}_{\rm S}(\mKpPiz) = &\sqrt{kp}\times B^{\prime}_{L_\PB}(k, k_0, d)\left(\frac{k}{\mBu}\right)^{L_\PB}\times B^{\prime}_{L_\Kstarp}(p, p_0, d)\left(\frac{p}{\mKstarp}\right)^{L_\Kstarp}\\
		&\qquad\times\left(\frac{1}{\cot \delta_\PB - \textit{i}}+\textit{e}^{2\textit{i}\delta_\PB}\frac{1}{\cot \delta_R - \textit{i}}\right)\,,\\
	\cot \delta_\PB = &\frac{1}{ap}+\frac{1}{2}rp\,,\\
	\cot \delta_R = &\frac{\mKstarp^2-\mKpPiz^2}{\mKstarp\Gamma(\mKpPiz)}\,.
\end{aligned}
\end{eqnarray}
%
The parameter $a$ represents the scattering length and $r$ is the effective range parameter. Their values, $a = 1.95$ and $r = 1.78$, are taken from~Ref.\,\cite{FIT-Dunwoodie}. The influence of these parameters is studied in~Ref.\,\cite{ANA-LHCb-angular3}: the impact of varying them on the angular observables is negligible. 
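
The structure of the bracket in \refEq{BW-Swave-basic} becomes more transparent when each term is rewritten using a standard trigonometric identity, valid for any real phase $\delta$ and shown here for illustration only:
%
\begin{displaymath}
\frac{1}{\cot\delta - \textit{i}} = \frac{\sin\delta}{\cos\delta - \textit{i}\sin\delta} = \sin\delta\,\textit{e}^{\textit{i}\delta}\,.
\end{displaymath}
%
The bracket then reads
%
\begin{displaymath}
\sin\delta_\PB\,\textit{e}^{\textit{i}\delta_\PB} + \textit{e}^{2\textit{i}\delta_\PB}\,\sin\delta_R\,\textit{e}^{\textit{i}\delta_R}\,,
\end{displaymath}
%
which can be read as the sum of an effective-range term with phase $\delta_\PB$ and a resonant term with phase $\delta_R$, the latter shifted by twice the effective-range phase.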

For the \swave, the angular momenta are $L_\PB = 1$ and $L_\Kstarp = 0$. This simplifies \refEq{BW-Swave-basic} to
%
\begin{equation}\label{eq:BW-Swave}
\begin{aligned}
	\mathcal{A}_{\rm S}(\mKpPiz) = &\sqrt{kp}\times 
	  \sqrt{\frac{1 + \left(k_{0}d \right)^{2}} { 1 + \left(k~d \right)^{2}} } \times 
	  \frac{k}{\mBu} \times \left(\frac{1}{\cot \delta_\PB - \textit{i}} +
	      \textit{e}^{2\textit{i}\delta_\PB}\frac{1}{\cot \delta_R - \textit{i}}\right)\,.
\end{aligned}
\end{equation}
%
The final distribution in the \mKpPiz dimension is then a combination of the squared normalized \pwave and \swave amplitudes, weighted by the \swave fraction $F_S$
%
\begin{equation}
	\left. \frac{{\rm d}\Gamma}{{\rm d}\mKpPiz} \right |_{\rm S+P}  = 
	(1-{F_S})\left\vert \mathcal{A}^{\prime}_{\rm P}(\mKpPiz)\right\vert^2+F_S \left\vert \mathcal{A}^{\prime}_{\rm S}(\mKpPiz)\right\vert^2\,.
\end{equation}
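
The primes denote the normalized amplitudes. A minimal sketch of this normalization, assuming a fit range $[m_{\min}, m_{\max}]$ in \mKpPiz (the symbols $m$, $m_{\min}$ and $m_{\max}$ are introduced here for illustration only), reads
%
\begin{displaymath}
\left\vert \mathcal{A}^{\prime}_{X}(m)\right\vert^2 =
\frac{\left\vert \mathcal{A}_{X}(m)\right\vert^2}
{\int_{m_{\min}}^{m_{\max}} \left\vert \mathcal{A}_{X}(m^\prime)\right\vert^2 {\rm d}m^\prime}\,,
\qquad X \in \{{\rm P}, {\rm S}\}\,.
\end{displaymath}
%
With this convention, the distribution integrates to $(1-F_S) + F_S = 1$ over the fit range, so that $F_S$ is directly the fraction of the \swave contribution in the considered mass window.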


\subsubsection{Background component}\label{sec:parMeas-bkg}


As the background contribution is high especially at large \ctk, as discussed in \refSec{Accept-parametrizaiton}, a dedicated study on a predominantly background data sample is done. This sample consists of all events passing the selection described in \refSec{sel-EventSelection} in the resonant \jpsi dimuon invariant mass squared interval with the reconstructed \Bu meson mass higher than $5629\mev$, corresponding to the mass of a \Bu meson $+350\mev$. This rather strict cut is applied in order to make sure the signal tail does not significantly contribute to the background sample. 

In the \Bu meson mass dimension, the background mostly consists of random accidental track combinations and is described by an exponential with one free parameter. Similarly, in the case of \mKpPiz, a linear model describes this combinatorial background well. The fit projections of the \Bu and \Kstar reconstructed mass distributions are depicted in \refFig{FIT-bkgMass}.

\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.47\textwidth]{./FCNC/BkgFit/m_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass.eps}
	\includegraphics[width=0.47\textwidth]{./FCNC/BkgFit/mkpi_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass.eps} 
	\captionof{figure}[Fit to the background sample in the \Bu meson and \Kstar mass dimensions.]{Fit to the background sample in the \Bu meson and \Kstar mass dimensions.  The black markers represent the data, the red area represents the background fit model. The reconstructed \Bu meson mass distribution (left) is fitted with an exponential function, the \Kstar meson mass distribution (right) with a linear function. } \label{fig:FIT-bkgMass}
\end{figure}


In the dimension of the decay angles \angleDist, the background is parameterized using Chebyshev polynomials $T_i$~\cite{FIT-Chebyshev}. The background in each dimension is described by a dedicated Chebyshev polynomial. This factorization is possible, as a study of the background sample shows no correlation between the angles, as shown in \refFig{FIT-bkgCorr}.


The \ctl angular background is modeled with a polynomial of order two, \ctk is modeled with a polynomial of order five in the reference channel and order of two in the signal channel (this is explained in the next paragraph), and $\phi$ angular background is flat. The angular background is then described by \refEq{bkg-angular}:
\begin{equation}\label{eq:bkg-angular}
\left.\frac{\deriv(\Gamma+\bar{\Gamma})}{\deriv\ctk\,\deriv\ctl\,\deriv\phi}\right\vert_{\mathrm{BKG}}= 
\left( \sum_{i = 0}^{5(2)} c^{\thetak}_iT_i(\ctk) \right) \times
\left( \sum_{j = 0}^{2} c^{\thetal}_jT_j(\ctl) \right) \times
\left( c^{\phi}_0T_0(\phi) \right)\,.
\end{equation}
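
For reference, and assuming the standard convention of Chebyshev polynomials of the first kind defined on the interval $[-1,1]$ (with $x$ a generic argument), the polynomials up to order five read
%
\begin{displaymath}
\begin{aligned}
T_0(x) &= 1\,, & T_1(x) &= x\,, & T_2(x) &= 2x^2-1\,,\\
T_3(x) &= 4x^3-3x\,, & T_4(x) &= 8x^4-8x^2+1\,, & T_5(x) &= 16x^5-20x^3+5x\,.
\end{aligned}
\end{displaymath}
%
Higher orders follow from the recurrence $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$. As $T_0 = 1$, the $\phi$ term in \refEq{bkg-angular} reduces to the constant $c^{\phi}_0$.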

\begin{figure}[hbt!]\vspace{0.5\baselineskip}
	\centering
	\includegraphics[width=0.33\textwidth]{./FCNC/BkgFit/Background_Correlation_ctl_ctk_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass_Kpi.eps} \hspace{-5pt}	
	\includegraphics[width=0.33\textwidth]{./FCNC/BkgFit/Background_Correlation_ctk_phi_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass_Kpi.eps} \hspace{-5pt}	
	\includegraphics[width=0.33\textwidth]{./FCNC/BkgFit/Background_Correlation_ctl_phi_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass_Kpi.eps} \hspace{-5pt}	
	\captionof{figure}[Background factorization.]{Correlation between the decay angles in the predominantly background sample. The correlation between the angular coefficients is of the order of a few percent, indicating that there is no significant correlation. Hence three independent Chebyshev polynomials can be used for the description of the angular background.} \label{fig:FIT-bkgCorr}
\end{figure}


The fit projections to the decay angles are shown in \refFig{FIT-bkgAngles}. The key feature of the description is the peaking structure at high \ctk values: correct modeling of this peak is crucial for extracting the angular moments. The lowest order of the Chebyshev polynomial that describes the \ctk background well is five. However, even with the large statistical sample of the \BuToKstJpsi decay, the five free parameters tend to overfit the data: the \chisq of the fit to the background sample in \ctk is only 0.346. This can be avoided either by cutting even harder on high \ctk or by describing the background with a lower-order Chebyshev polynomial. Cutting away more events with high \ctk leads to a lower sensitivity to the angular parameters, especially the parameter \FL. A lower-order polynomial does not describe the shape of the background well, especially in the regions at $\ctk\approx-1$ and $\ctk\approx0.8$. This problem disappears when considering the low statistical power of the rare channel, where a Chebyshev polynomial of order two is sufficient to describe the background contribution. The overfitting is present only in the reference \BuToKstJpsi decay and manifests itself in the third-order polynomial parameter running into the boundary of its allowed range\footnote{Of course, this can be avoided by enlarging the range of this free parameter. However, this leads to other parameters running into their boundaries.}. This parameter controls the shape of the plateau at $\ctk\approx-0.4$. As this is a nuisance parameter and a wide range of values describes the background well, this parameter is left floating in the fit to the reference \BuToKstJpsi decay.
   

\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.32\textwidth]{./FCNC/BkgFit/ctk_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass.eps} \hspace{-5pt}	
	\includegraphics[width=0.32\textwidth]{./FCNC/BkgFit/ctl_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass.eps} \hspace{-5pt}	
	\includegraphics[width=0.32\textwidth]{./FCNC/BkgFit/phi_Bckgnd_JpsiFit_1BIN_bin0_Run12_HighBmass.eps} 	
	\captionof{figure}[Angular fit of the background sample.]{Angular fit of the background sample. The sample is obtained from the \BuToKstJpsi data taken during both \runI and \runII. The black markers represent the data, the red area represents the background model described by \refEq{bkg-angular}. On the left, the \ctk distribution is presented, in the middle the \ctl distribution, and on the right $\phi$ distribution is shown.} \label{fig:FIT-bkgAngles}
\end{figure}

\clearpage

\subsection[Extraction of the \texorpdfstring{${F_S}$}{Fs} parameter]{Extraction of the \texorpdfstring{$\mathbf{F_S}$}{Fs} parameter}\label{sec:parMeas-FS}

As mentioned in \refSec{ANA_Theo_SWave}, it is impossible to distinguish the contributions of the \pwave and the \swave in the $\Kp\piz$ system at the selection level. However, using the reconstructed mass of the \Kstar meson, a statistical separation is possible. This can be done by performing a two-dimensional fit in the \Kstar meson mass and the \Bu meson mass. First, the 2D fit is performed in the reference \BuToKstJpsi channel, as it is much more abundant than the rare \BuToKstmm channel. The projections in the \mBu and \mKpPiz dimensions are shown in \refFig{Reference-mass-fit}. 

\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/m_massfit_JpsiFit_1BIN_bin0_Run1.eps}
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/mkpi_massfit_JpsiFit_1BIN_bin0_Run1.eps}\\
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/m_massfit_JpsiFit_1BIN_bin0_Run2.eps}
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/mkpi_massfit_JpsiFit_1BIN_bin0_Run2.eps}\\
	\captionof{figure}[Fit projections to the reference decay channel in \Bu and \Kstar masses.]{Fit projections to the reference decay channel in reconstructed \Bu meson (left) and \Kstar meson (right) masses. The top row represents \runI sample, the bottom row \runII sample. The black markers represent the data, the blue space represents the signal. Red surface represents the background contribution. The green dashed line represents the \pwave, the green dotted line, present under the background area, represents the \swave contribution.}\label{fig:Reference-mass-fit}
\end{figure}
 
 \clearpage
 
The study of \BuToKstmm is performed in multiple \qsq bins. According to \refEq{BW-Pwave} and \refEq{BW-Swave}, the \mKpPiz distributions are \qsq independent. Despite the effort to fit $F_S$ in the \BuToKstmm channel using the full \qsq range (excluding the resonance regions), the statistical power of the sample is not large enough to find a contribution of the \swave in the $\Kp\piz$ system and the fit prefers $F_S=0$. The fit projections in the \mBu and \mKpPiz dimensions are given in \refFig{Signal-mass-fit}.
 
A dedicated study of pseudoexperiments is done in order to establish the sensitivity of the fit to the $F_S$ parameter in the rare \BuToKstmm channel. It is shown in \refSec{toy-sig} that fixing $F_S$ and the interference terms $S_{Si}$ to zero does not introduce a bias in the measured \pwave angular parameters. 


\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/m_massfit_SignalFit_1BIN_bin0_Run1.eps}
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/mkpi_massfit_SignalFit_1BIN_bin0_Run1.eps}\\
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/m_massfit_SignalFit_1BIN_bin0_Run2.eps}
	\includegraphics[width=0.48\textwidth]{FCNC/MassFit/mkpi_massfit_SignalFit_1BIN_bin0_Run2.eps}\\
	\captionof{figure}[Fit projections to the rare channel in \Bu and \Kstar masses.]{Fit projections to the signal channel in reconstructed \Bu meson (left) and \Kstar meson (right) masses. The top row represents the \runI sample, the bottom row the \runII sample. The black markers represent the data, the blue space represents the signal. Red surface represents the background contribution. As the fit prefers $F_S=0$, the \swave contribution is not plotted. The green dashed line represents the \pwave contribution to the \Kstar mass.}\label{fig:Signal-mass-fit}
\end{figure}


\clearpage