
%\subsection{Track reconstruction at \lhcb}\label{sec:trackReco}

%In order to understand the procedure of estimating the track reconstruction efficiencies, iti is important to first understand the process of reconstructing tracks at the \lhcb detector. Tracks are reconstructed from \emph{charged} particles transversing the detector material. As they travel through the detector material and interact with it, they deposit a fraction of their energy in the detector. This energy is measured and referred to as a \emph{hit}. Hits represent the spatial and timing information of the trajectory of the particle. Exploiting this information, dedicated algorithms can reconstruct the trajectory of the particle: a \emph{track}.
%
%Any hit in the detector located at a given $z$-position is described by its time and position in the x-y plane and by the particle charge $q$ relative to its momentum $p$~\cite{LHCb-Performance}:
%%
%\begin{equation}\label{eq:track_vec}
%	\vec{x}\left(z\right) = \begin{pmatrix} x \\ y \\ t_x \\ t_y \\ q/p \end{pmatrix}\,,
%\end{equation}
%%
%and its the covariance matrix in order to reflect the correlation of uncertainties.
%
%Each hit position can be extrapolated to the next detector layer using a \emph{track-propagation function} $f$ as follows:
%%
%\begin{equation}\label{eq:track_propagation}
%\vec{x}\left(z_1\right) = f_{z_1\rightarrow z_2}\vec{x}\left(z_2\right)\,.	
%\end{equation}
%
%In the case of no magnetic field, $f$ takes the form of a simple propagation matrix
%%
%\begin{equation}\label{eq:track_propagation}
%f_{z_1\rightarrow z_2} =\begin{pmatrix} 
%1 & 0 & z_2-z_1 & 0 & 0 \\ 
%0 & 1 & 0 & z_2-z_1 & 0 \\
%0 & 0 & 1 & 0 & 0 \\
%0 & 0 & 0 & 1 & 0 \\ 
%0 & 0 & 0 & 0 & 1 
%\end{pmatrix}\,.
%\end{equation}
%
%However, the particles are passing a strong magnetic field and the function $f$ becomes not-trivial: it depends on $z_1$ and $z_2$ as the magnetic field is also present no just in the dipole magnet area, but also near the \ttracker and \Tstations. Moreover, multiple-scattering effects need to be taken into account by enlarging the uncertainties in the covariance matrix of each hit.
%
%\subsubsection{Kalman filters}\label{sec:trackReco-kalman}
%
%As the propagation matrix is non-linear, even with an exact propagation matrix available, calculating the particle's trajectory would require significant computational resources. Therefore, \emph{Kalman filter}\footnote{Also known as linear quadratic estimation.} is used~\cite{Kalman}. This technique is widely used in high-energy physics experiments as it eliminates the computation of matrix inversion, speeding up the computation immensely, while accounting for multiple scattering and other possible energetic losses.
%
%The Kalman filter is an iterative algorithm. It exploits the fact we know the propagation matrix: measured point $x_{k}$ and its error covariance matrix $P_k$ is propagated to $\hat{x}_{k+1}$ and $P_{k+1}$. If the particle interacts with the material, the search window is enlarged accordingly. The point at $x_{k+1}$ is also measured. This information is used to update $\hat{x}_{k}$ using \emph{filter equations}. These equations remove extreme outliers, speeding up the pattern recognition process. This is repeated until all hit information from all detectors is used. Then, the process is reversed: starting from the last updated point, the previous measurements are updated. This ensures full exploitation of the available information in a timely manner. Kalman filter then provides an estimate of the final momentum and also the $\chisq$ for given track, describing the track fit quality. 
%
%%http://bilgin.esme.org/BitsAndBytes/KalmanFilterforDummies


\subsection{Track reconstruction algorithms in \lhcb}\label{sec:trackReco-trackTypes}

As shown in \refSec{det_tracking_vertexing}, the \lhcb detector is designed with large gaps between the tracking detectors. Particles can escape or be created at any point in the active detector area, leaving hits only in some of the detectors. Hence, there are many possible track types to be reconstructed. A sketch of the most commonly used types at the \lhcb experiment is given in \refFig{trEff-track_types}. The tracks are reconstructed using algorithms that correspond to the different track types. These algorithms are independent, and therefore a particle crossing the detector is typically reconstructed by several of them: for example, a particle bent out of the \lhcb acceptance by the magnet can be reconstructed by the \emph{upstream} tracking algorithm as well as by the \velo tracking algorithm.

\begin{figure}[htbp]
	\begin{center}
		\subfloat[]{\includegraphics[width=0.5\textwidth]{TrackEff/trackTypesRunIAndII_Blue.pdf}}\\
		\captionof{figure}[The most common track types used at the \lhcb experiment.]{The most common track types used at the \lhcb experiment. The most valuable track type for \lhcb physics analyses is the \emph{long track}, as it has the best momentum resolution. Long tracks have hits in all \lhcb tracking detectors.}
		\label{fig:trEff-track_types}
	\end{center}
\end{figure}


\begin{itemize}[leftmargin=*]
	\setlength{\itemindent}{0em}
	\item \textbf{Long track:} A track that originates in the \velo and also traverses the \Tstations. Hit information from the \ttracker \emph{can} be added, but it is not required. This is the most common track type in \lhcb studies as it has the best momentum resolution.
	There are two independent algorithms used to obtain long tracks: \emph{forward tracking} and \emph{matching}.
	The \emph{forward tracking} algorithm propagates the trajectory of a \velo track to the \Tstations, taking into account the bending of the trajectory by the magnet. In the \emph{matching} algorithm, standalone T tracks are created and combined with \velo tracks, also taking the bending into account. The results of the two algorithms can be compared, and a combined set of best long tracks is obtained. \ttracker hits are added only after a track candidate has been found from the \velo and \Tstation hits.
	
	\item \textbf{Velo track:} A track that consists only of hits in the \velo detector. These tracks are reconstructed independently of the forward tracking and are used for the reconstruction of primary vertices.
	
	\item \textbf{Upstream track:} A track reconstructed using \velo and \ttracker hits. As these tracks carry only poor momentum information, they are rarely used in analyses. If no other algorithm reconstructs the track, it corresponds to a particle with momentum low enough to be bent out of the \lhcb acceptance by the magnet.
	
	\item \textbf{Downstream track:} A track reconstructed using \ttracker and \Tstation hits. As there is no \velo information, the momentum resolution is worse than for long tracks. Downstream tracks are typically left by the decay products of long-lived particles.
	
	\item \textbf{T track:} A track reconstructed using only hits in the \Tstations. As with \velo tracks, there is no momentum information. If no other algorithm reconstructs the track, it typically represents a decay product of a very long-lived particle.\\
\end{itemize}

The algorithms that search for hits in different detectors and combine them can sometimes combine hits that do not originate from the same particle. Such tracks, called \emph{ghost} tracks, contribute to the background. Most ghost tracks can be rejected by requiring a good track fit quality, quantified by the track \chisq. However, this can also lead to the rejection of real particle tracks, reducing the track reconstruction efficiency. To resolve this issue, a dedicated neural network is trained in the \lhcb reconstruction software. This neural network is designed to remove most ghost tracks while minimizing the impact on real tracks. It returns a value between 0 and 1, the ``ghost probability'', which is typically required to be below 0.4; this corresponds to the removal of more than 70\% of reconstructed ghost tracks with hardly any loss in efficiency.


\subsection{Determination of the track reconstruction efficiency}\label{sec:trackMeas}

In most analyses carried out by the \lhcb collaboration, the track reconstruction efficiency is estimated using a Monte Carlo simulation. While the simulation is a very good representation of the real data, it is not perfect. The main discrepancy between the real data and the simulation lies in the detector occupancy distributions (see \refSec{sel-SimulationCorrection}), but there can be imperfections in the kinematic variables as well. The track reconstruction efficiency depends mainly on the kinematic properties of the track (momentum, direction, and position in the detector) and on the occupancy of the detector. While the discrepancies between the data and the simulation in the kinematic and occupancy quantities can be corrected for, the track reconstruction efficiency also depends on other effects, such as the placement and number of dead channels and the inactive material. These effects are very hard to simulate or correct for in the Monte Carlo simulation. Hence, the track reconstruction efficiency obtained purely from a simulation sample is a good approximation, but it does not meet the required precision. A measurement using a data-driven technique is necessary.

The track reconstruction efficiency is measured using a data-driven tag-and-probe method exploiting the decay \decay{\jpsi}{\mumu}. The tag-and-probe technique is widely used in high-energy physics~\cite{tagAndProbe_CMS_1,tagAndProbe_CMS_2,tagAndProbe_ATLAS} to measure the efficiency of various processes, typically reconstruction or selection. The method exploits two-body decays of a well-known resonance. One of the decay products, the \Tag, is a well-identified track, while the other, the \Probe, is an unbiased track. The \Probe track then either passes or fails the reconstruction or selection criteria for which the efficiency is to be measured. The ratio of the number of tracks passing these criteria to the number of all reconstructed unbiased tracks is the reconstruction or selection efficiency $\varepsilon$:
\begin{equation}\label{eq:tag-and-probe}
\varepsilon = \frac{N_{\text{passing criteria}}}{N_{\text{all unbiased}}}\,.
\end{equation}

The method used at the \lhcb experiment was developed during \runI and further advanced during \runII~\cite{TrackEffRun1}. It exploits the decay of \jpsi mesons to a muon pair. Recently, a new technique exploiting the decay of \decay{\jpsi}{\en\ep} has been developed~\cite{TrackEffElectrons}. However, the focus in this work is on the \jpsi\to\mup\mun decay only. In the following section, the tag-and-probe method used to measure the track reconstruction efficiency in \lhcb is explained in detail.

\subsubsection[Tag-and-probe technique using  \texorpdfstring{\JpsiTomm}{Jpsi to mu mu} decays]{Tag-and-probe technique using \texorpdfstring{\JpsiTommBF}{Jpsi to mu mu} decays}\label{sec:trackMeas-tag-and-probe}

The muonic decay of the \jpsi meson is used for the determination of the track reconstruction efficiency because muons traverse the whole \lhcb detector and also leave hits in the muon stations. Moreover, they do not interact hadronically.

The \Tag track is a muon track reconstructed using the standard long track reconstruction algorithm and passing a tight selection, such as a momentum requirement, to make sure it is a decay product of a \jpsi resonance. The \Probe track is reconstructed using one of three dedicated algorithms, each designed to probe one (or two) of the three tracking detectors of \lhcb: the \velo, the \Tstations and the \ttracker. These algorithms are very loose in order to minimize the potential bias imposed by any selection on the final result.

The criterion that determines whether the \Probe track is efficiently reconstructed is the existence of a long track that can be associated with the \Probe track. The matching is performed by checking the number of common hits between the \Probe and the long track in the tracking detectors. The \emph{overlap fraction} is used as the association criterion. The overlap fraction is the number of common hits $N_{\rm common}$ divided by the minimum number of hits in the subdetector required by the long track reconstruction algorithm, $N_{\rm required}$:
\begin{equation}\label{eq:overlap}
{\rm overlap ~fraction} = \frac{N_{\rm common}}{N_{\rm required}}\,.
\end{equation}
%
Using the overlap fraction as the association criterion, \refEq{tag-and-probe} then becomes
\begin{equation}\label{eq:tag-and-probe-real}
\varepsilon_{\rm tr} = \frac{N_{\rm assoc}(\jpsi\to\mup\mun)}{N_{\rm all}(\jpsi\to\mup\mun)}\,,
\end{equation}
%In the measurement of the efficiency combinatorial background is also present, therefore only \Probe tracks corresponding to the \JpsiTomm decay are considered. 
where $N_{\rm assoc}$ denotes the number of \Probe tracks passing the association criterion and $N_{\rm all}$ denotes the number of all unbiased \Probe tracks.

Depending on the algorithm used to reconstruct the \Probe track, there are three methods to obtain the track reconstruction efficiency. The methods, illustrated in \refFig{trEff-probe_reco}, are:


\begin{itemize}[leftmargin=*]
	\setlength{\itemindent}{0em}
	\item \textbf{Long method}~~The \Probe track is first reconstructed using muon station hits to create a standalone muon track. This track is then matched to the hits in the \ttracker. Note that the long track reconstruction algorithm described in \refSec{trackReco-trackTypes} requires hits neither in the muon stations nor in the \ttracker. Hence, this method directly probes the track reconstruction efficiency of long tracks. The \Probe track is considered efficient when the overlap fraction is at least 70\% for the muon stations and 60\% for the \ttracker. %The \ttracker hits are added by searching for them along the trajectory after the long track reconstruction.
				
	\item \textbf{\velo method}~~The \Probe track is reconstructed as a downstream track with added muon station hits in order to identify the particle as a muon. This method probes the \velo track reconstruction efficiency. The \Probe track is considered efficient when the overlap fraction in the \Tstations is at least 50\%.
	
	\item \textbf{\Tstation method}~~The \Probe track is reconstructed by a dedicated algorithm from hits in the \velo and the muon stations. This method probes the \Tstation track reconstruction efficiency. The \Probe track is considered efficient when the associated long track shares at least two hits in the muon stations and the same \velo segment with the \Probe track.
\end{itemize}
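
As an illustration of the association criterion in \refEq{overlap}, consider purely hypothetical hit counts, chosen for arithmetic convenience and not taken from the actual \lhcb requirements. With $N_{\rm required}=10$ and $N_{\rm common}=8$, the overlap fraction is
\begin{equation}
{\rm overlap ~fraction} = \frac{8}{10} = 0.8\,,
\end{equation}
so the \Probe track would be considered efficient under a 70\% threshold. With only $N_{\rm common}=6$ shared hits, the overlap fraction drops to $0.6$, and the same \Probe track would enter the denominator of \refEq{tag-and-probe-real} but not the numerator.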

\begin{figure}[htbp]\vspace{-1cm}
	\begin{center}
		\subfloat[]{\includegraphics[width=10cm]{TrackEff/trackEffMuonTT_tag}}\\
		\subfloat[]{\includegraphics[width=10cm]{TrackEff/trackEffDownstream_tag}}\\
		\subfloat[]{\includegraphics[width=10cm]{TrackEff/trackEffVeloMuon_tag}}\\
		\captionof{figure}[Illustration of the probe track reconstruction algorithms.]{Illustration of the probe track reconstruction algorithms: (a) Long method, (b) \velo method, and (c) \Tstation method~\cite{TrackEffRun1}.
		Red dots indicate the hit information used by each algorithm to select the \Probe track. 
		Solid blue lines represent the trajectories of the \Tag (upper line) and \Probe (lower line) tracks. 
		The blue dotted line represents the sub-detector region which is probed by the respective method. 
		The dashed vertical line shows the bending plane of the magnet.
	}
		\label{fig:trEff-probe_reco}
	\end{center}
\end{figure}

Different methods probe different track reconstruction algorithms. Taken together, the \velo and \Tstation methods probe the same quantity as the Long method, namely the long track reconstruction efficiency. This is exploited by two further methods:

\begin{itemize}[leftmargin=*]
	\setlength{\itemindent}{0em}
	\item \textbf{Combined method}~~This method represents the combination of the \velo and \Tstation efficiencies. The efficiencies from these two methods are uncorrelated, apart from their common dependence on track kinematics and detector occupancy. The combined method efficiency is then simply
	\begin{equation}\label{eq:TrEff-combined}
		\varepsilon_{\rm Comb} = \varepsilon_{\rm \velo} \times \varepsilon_{\rm T~station} \,.
	\end{equation}
	
	\item \textbf{Final method}~~The Final method is the weighted average of the Long and Combined methods. The weights are the inverse squared uncertainties of each method, $w_{\rm Comb}=\sfrac{1}{\sigma^2_{\rm Comb}}$ and $w_{\rm Long}=\sfrac{1}{\sigma^2_{\rm Long}}$. The weighted average is then
	\begin{equation}\label{eq:TrEff-Final}
			\varepsilon_{\rm Final} = \frac{w_{\rm Comb} \varepsilon_{\rm Comb} + w_{\rm Long} \varepsilon_{\rm Long}}{w_{\rm Comb}+w_{\rm Long}} \,.
	\end{equation}
	The statistical uncertainty of the average is 
	\begin{equation}\label{eq:TrEff-Final-error}
		\sigma_{\rm Final} = \frac{1}{\sqrt{w_{\rm Comb}+ w_{\rm Long}}}\,.
	 \end{equation}
	This method provides the most precise estimate of the track reconstruction efficiency at \lhcb, as it exploits the information from all three available methods.
\end{itemize}
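
As a sketch of how \refEq{TrEff-combined}, \refEq{TrEff-Final} and \refEq{TrEff-Final-error} fit together, assume that the statistical uncertainties of the \velo and \Tstation efficiencies are uncorrelated and small, so that first-order error propagation applies. Under this assumption, which is made here for illustration only, the uncertainty of the Combined method would be
\begin{equation}
\sigma_{\rm Comb} = \varepsilon_{\rm Comb}\sqrt{\left(\frac{\sigma_{\rm \velo}}{\varepsilon_{\rm \velo}}\right)^2 + \left(\frac{\sigma_{\rm T~station}}{\varepsilon_{\rm T~station}}\right)^2}\,.
\end{equation}
Taking purely illustrative values (not measured results) $\varepsilon_{\rm Comb} = 0.950 \pm 0.003$ and $\varepsilon_{\rm Long} = 0.955 \pm 0.004$, the weights stand in the ratio $w_{\rm Comb}:w_{\rm Long} = 16:9$, and
\begin{equation}
\varepsilon_{\rm Final} = \frac{16\times 0.950 + 9\times 0.955}{25} = 0.9518\,, \qquad
\sigma_{\rm Final} = \frac{0.003\times 0.004}{\sqrt{0.003^2+0.004^2}} = 0.0024\,.
\end{equation}
The uncertainty of the average is smaller than either input uncertainty, and the central value is pulled towards the more precise of the two inputs.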

It is worth noting that using muon tracks as the \Tag and \Probe does not take into account the effect of absorption in the material on the track reconstruction efficiency. The track reconstruction efficiency presented here represents the probability that a particle crossing the full active detector area is reconstructed. However, hadronic interactions can be taken into account in the form of a systematic uncertainty. This uncertainty is equal to the fraction of hadrons that cannot be reconstructed due to hadronic interactions multiplied by the material budget uncertainty. As the cross-section of the hadronic interaction depends on the particle species, it has to be evaluated separately for each hadron. Moreover, it also depends on the momentum of the hadron: the fraction of hadrons that cannot be reconstructed due to hadronic interactions can be estimated for each process using the simulation.

\subsubsection{Efficiency evaluation}\label{trackMeas-eval}

When selecting the \jpsi candidates, there is a contribution from random combinations of real or even fake muon tracks. The number of signal \jpsi\to\mup\mun events $N_{\rm all}$, as well as the number of associated events $N_{\rm assoc}$ in \refEq{tag-and-probe-real}, therefore need to be extracted from a fit to the mass distribution of the \jpsi candidates. The mass of a \jpsi candidate is calculated as
\begin{equation}
	m_{\rm rec} = \sqrt{ \left( E_{\rm tag} + E_{\rm probe}\right)^2 - 
	\left(  \vec{p}_{\rm tag} + \vec{p}_{\rm probe}\right)^2
	}\,,
\end{equation}
where $E$ is the energy of the \Tag or \Probe track and $\vec{p}$ is its momentum.
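
The energies entering this expression are not measured directly. A minimal sketch, assuming that both tracks are assigned the muon mass hypothesis (natural for a \decay{\jpsi}{\mumu} candidate, but an assumption of this illustration) and natural units with $c=1$, is
\begin{equation}
E_{\rm tag} = \sqrt{|\vec{p}_{\rm tag}|^2 + m_{\mu}^2}\,, \qquad
E_{\rm probe} = \sqrt{|\vec{p}_{\rm probe}|^2 + m_{\mu}^2}\,,
\end{equation}
where $m_{\mu}$ denotes the known muon mass. Under this assumption, $m_{\rm rec}$ depends only on the two reconstructed momenta.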

The reconstructed \jpsi candidates are split into two sets: \emph{matched} candidates fulfill the association criteria, whereas \emph{failed} candidates do not. These two sets are fitted simultaneously with a sum of two Crystal Ball functions with a shared mean value as the signal component and an exponential function as the background component. The two sets share all signal shape parameters. The yields and the background shapes in the \matched and \failed sets are independent. Following \refEq{tag-and-probe-real}, the track reconstruction efficiency can be expressed as a function of the signal yields of the \matched, $N_{\rm sig}^{\rm Match}$, and \failed, $N_{\rm sig}^{\rm Fail}$, samples:
\begin{equation}\label{eq:tag-and-probe-eff}
	\varepsilon_{\rm tr} = \frac{N_{\rm sig}^{\rm Match}}{N_{\rm sig}^{\rm Match}+N_{\rm sig}^{\rm Fail}}\,.
\end{equation}
It is important to consider only the \emph{signal} yields, as combinatorial background is present in the measurement. An example of the mass distributions for the matched and failed candidates is given in \refFig{trEff-mass}.

In this approach, the efficiency is treated as a fit parameter\footnote{In the fit to the \jpsi mass, the floated parameters are then the track reconstruction efficiency and the total signal yield $N_{\rm sig}^{\rm all}=N_{\rm sig}^{\rm Match}+N_{\rm sig}^{\rm Fail}$.}. This scheme guarantees that the correlations between the parameters are properly treated in the calculation of the statistical uncertainties.
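
Written out, this parametrization replaces the two signal yields in \refEq{tag-and-probe-eff} by
\begin{equation}
N_{\rm sig}^{\rm Match} = \varepsilon_{\rm tr}\, N_{\rm sig}^{\rm all}\,, \qquad
N_{\rm sig}^{\rm Fail} = \left(1-\varepsilon_{\rm tr}\right) N_{\rm sig}^{\rm all}\,,
\end{equation}
where $N_{\rm sig}^{\rm all}$ is the total signal yield. Inserting these two expressions into \refEq{tag-and-probe-eff} returns $\varepsilon_{\rm tr}$ identically. For example, with the purely illustrative values $N_{\rm sig}^{\rm all}=10\,000$ and $\varepsilon_{\rm tr}=0.95$, the fit model would contain $9\,500$ signal candidates in the \matched sample and $500$ in the \failed sample.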


\begin{figure}[hbt!]
	\centering
	\includegraphics[width=0.45\textwidth]{TrackEff/2018/Full_bin0_Long_Matched.eps}
	\includegraphics[width=0.45\textwidth]{TrackEff/2018/Full_bin0_Long_Failed.eps}
	
	\captionof{figure}[Reconstructed invariant-mass distribution of \jpsi candidates.]{Reconstructed invariant-mass distribution of \jpsi candidates for the Long method. The matched candidates are shown on the left, the failed candidates on the right. The black points represent a subset of the data collected in 2018. The black line represents the full fit model. The red dashed line represents the signal component.}\label{fig:trEff-mass} 
\end{figure}

As discussed in \refSec{trackMeas}, the efficiencies obtained from the simulation are a good approximation of the actual efficiencies, but they are not perfect. On the other hand, the tag-and-probe measurement of the efficiencies is rather lengthy, and performing this study for every analysis is not feasible. Therefore, a correction factor $R$ is calculated:
\begin{equation}\label{eq:trackEff-R}
R=\frac{\varepsilon_{\rm data}}{\varepsilon_{\rm sim}}\,,
\end{equation}
where $\varepsilon_{\rm data}$ represents the tag-and-probe efficiency in data and $\varepsilon_{\rm sim}$ represents the tag-and-probe efficiency in simulation. This ratio $R$ can be used independently by many analyses to `correct' the efficiency obtained directly from simulation. Moreover, first-order uncertainties cancel in the ratio. Therefore, the ratio $R$ is the ultimate goal of the tracking efficiency measurement.
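
As a numerical sketch of \refEq{trackEff-R}, take the purely illustrative (not measured) efficiencies $\varepsilon_{\rm data}=0.950$ and $\varepsilon_{\rm sim}=0.960$ for one class of tracks:
\begin{equation}
R = \frac{0.950}{0.960} \approx 0.990\,.
\end{equation}
An analysis that obtains a track reconstruction efficiency from its own simulation sample would then multiply it by $R$ for tracks of this class. If the statistical uncertainties of the data and simulation efficiencies, $\sigma_{\rm data}$ and $\sigma_{\rm sim}$, are treated as independent, an assumption made here for illustration, first-order error propagation gives
\begin{equation}
\sigma_{R} = R\,\sqrt{\left(\frac{\sigma_{\rm data}}{\varepsilon_{\rm data}}\right)^2 + \left(\frac{\sigma_{\rm sim}}{\varepsilon_{\rm sim}}\right)^2}\,.
\end{equation}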

In order to accommodate the requirements of most \lhcb analyses, the track reconstruction efficiency is measured in bins of pseudorapidity, $\eta$, momentum, $p$, number of hits in the \spd detector, $N_{\spd \rm hits}$, and number of primary vertices in the event, $N_{\rm PV}$. The ratio $R$ is measured in two dimensions, in bins of pseudorapidity and momentum. This is referred to later as the \emph{correction table}.