\chapter{Commissioning of the SciFi Front-End Electronics}
\label{ch:commissioning}
The commissioning of the SciFi front-end electronics is performed along with the assembly and testing of the \cframes{} (see \cref{sec:c-frames}).
It began in 2019 and, after multiple interruptions in the course of the \mbox{COVID-19} pandemic, is planned to be concluded by spring 2022.
The assembly takes place in building 3852 at CERN LHC Point 8, which is the site housing the LHCb experiment and its infrastructure.
The location of the building, in the following referred to as (SciFi) assembly hall, is shown on the satellite image of the site in \vref{fig:point8}.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Point8}
\caption{
Location of the SciFi assembly hall, highlighted in blue, at CERN LHC Point 8.
The orange path indicates the transport route of the fully assembled and commissioned \cframes{} to the access shaft in order to be lowered into the LHCb cavern.
The street art on the left shows a projection of the LHCb experiment located \qty{100}{\meter} underground.
Satellite image taken and modified from Ref.~\autocite{mapcern}.
}
\label{fig:point8}
\end{figure}
The assembly and commissioning of the individual \cframes{} requires a close collaboration of physicists, engineers and technicians from different institutes.
From start to finish, the process for an individual \cframe{} takes about four months and consists of the following steps:
% Steps taken from https://social.cern.ch/me/bleverin/Documents/Shared%20with%20Everyone/SciFi/C-Frames_GPshort.pdf
\begin{itemize}
\setlength\itemsep{0em}
\item Assembly of the C-shaped mechanical support frame.
\item Installation of water cooling pipes and blocks.
\item Installation and testing of low and high voltage cables.
\item Installation of Novec and dry gas lines.
\item Laying of optical fibres.
\item Mounting of fibre modules.
\item Connection of cold boxes to dry gas and Novec bellows.
\item Winding of heating wires around the cold boxes.
\item Test of vacuum and Novec cooling system.
\item Mounting of Readout Boxes (ROBs).
\item Connection of power and SiPM flex cables to the ROBs.
\item Inspecting, cleaning, and connecting optical fibres to the ROBs.
\item \textbf{Commissioning of ROBs.}
\end{itemize}
Many of the steps can be performed in parallel on up to four \cframes{}.
This is made possible by four separate assembly and commissioning slots inside the SciFi assembly hall, as shown in \vref{fig:assembly-hall}.
Each slot, also referred to as C-Cage, is surrounded by a scaffolding that allows work to be conducted on three different levels.
However, the available infrastructure only allows for the commissioning of one \cframe{} at a time.
Due to the limited lengths of cables and supply lines, it is performed in the two leftmost slots depicted in \cref{fig:assembly-hall}.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Assembly_Hall}
\caption{
Photograph of the four \cframe{} assembly and commissioning slots inside the SciFi assembly hall at night.
}
\label{fig:assembly-hall}
\end{figure}
Six further \cframes{} can be stored in a dedicated storage cage.
Because it is also used for transporting the detector elements to the LHCb cavern (see route in \vref{fig:point8}), it is referred to as the transport and storage cage (TS-Cage).
In addition, the TS-Cage allows for flexibly swapping \cframes{} between the different slots.
This is for instance required for moving an assembled \cframe{} to one of the two commissioning slots.
\section{Complexity of Readout System}
\label{sec:complexity}
Due to the complexity of the system, the operation of the SciFi front-end electronics is challenging.
As illustrated in \vref{fig:complexity}, the complete detector consists of many thousand individual components that are installed in a total of \num{256} ROBs.
These include the Master, Cluster and PACIFIC Boards, as well as the various ASICs, FPGAs and other electronic parts that are located on these boards.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Complexity}
\caption{
Illustration of the complexity of the SciFi front-end electronics.
The given values correspond to the number of components installed in the complete detector.
}
\label{fig:complexity}
\end{figure}
The task of the electronics commissioning is the initial operation of this system in conjunction with the surrounding infrastructure and the DAQ system.
As mentioned previously, this is performed at the level of individual \cframes{}, i.e. in units of 1/12 of the complete detector.
Thereby, it must be ensured that all components are functioning properly and, if not, that the necessary calibration and tuning measures are carried out.
In the scope of this thesis, a detailed commissioning procedure has been defined, developed and implemented.
In a series of tests, the correct functioning of the front-end electronics is ensured.
In addition, the electronics properties are validated at the system level while being installed on the \cframes{}.
The complete commissioning procedure is presented in \cref{sec:commissioning-procedure} along with the discussion of the results throughout the course of this chapter.
A key aspect of it is the verification of a stable data transmission over the total of 4096 data links.
Establishing the \qty{40}{\mega\hertz} readout required for that purpose was a major part in the framework of this work.
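The quoted totals can be cross-checked with a short calculation.
As a sketch, it assumes the ROB counts per \cframe{} listed in \cref{tab:commissioning}, i.e. four \cframes{} with \num{24} ROBs (T3) and eight with \num{20} ROBs (T1 and T2).
The number of data links per ROB is not quoted above and is inferred here from the two totals:
\begin{align*}
N_\text{ROB} &= 4 \times 24 + 8 \times 20 = 256, \\
N_\text{link} / N_\text{ROB} &= 4096 / 256 = 16.
\end{align*}
A \cframe{} of station T3 therefore carries $24 \times 16 = 384$ data links, while those of T1 and T2 carry $20 \times 16 = 320$.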
\section{Front-End Tester}
\label{sec:fe-tester}
Before being mounted on a \cframe{}, the proper functioning of each individual ROB is verified by a dedicated test stand located in an electronics lab on the LHCb site.
The so-called SciFi Front-End Tester (FE-Tester) comprises a signal injection system that was specifically designed for the project.
A FE-Tester with a mounted ROB is shown in the photograph in \vref{fig:fe-tester}, along with a schematic drawing of the complete setup.
The FE-Tester itself consists of one control board and eight injector modules with \num{256} channels each in order to cover the \num{2048} channels of one ROB under testing.
Two of these systems are in operation for the quality control of the SciFi front-end electronics~\autocite{fe-tester}.
\begin{figure}
\captionsetup[subfigure]{aboveskip=1pt, belowskip=12pt} %Tune space between captions and subfigures
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/FE-Tester}
\end{subfigure}
\hfill
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/FE-Tester_Drawing}
\end{subfigure}
\caption{
Photograph of the SciFi Front-End Tester with a mounted ROB under testing (left).
A schematic drawing of the complete setup including the DAQ system and power supplies is shown on the right.
Two of these test systems are used to verify the proper functioning of the ROBs before being installed on the \cframes{}.
}
\label{fig:fe-tester}
\end{figure}
The connection to the DAQ server is established via a bidirectional optical link.
Like the \MB{}, the control board features a Master GBTX (see \cref{sec:master-gbtx}) and SCA (\cref{sec:master-sca}) to enable the communication.
It shares the same back-end electronics as the ROB under testing, as well as an extended version of the control software used in the commissioning, which has been contributed as part of this thesis.
Further details about the DAQ system and software are given in the following sections.
The main functionality of the FE-Tester is the injection of calibrated charge pulses into the \num{2048} PACIFIC channels of the examined ROB.
It is thus possible to verify the functioning of each channel by comparing the detected signal amplitude with the expected value, as well as evaluating the noise level.
In addition, the FE-Tester allows for the independent charge injection into selected channels, thereby enabling the detection of short-circuited channels.
The connection to the PACIFIC channels is established via the same 80-pin, \qty{0.5}{\milli\meter} pitch connectors used for connecting the SiPM arrays (see \cref{sec:sipm}).
However, defects of these high density connectors are to be expected after about \num{50} mating cycles.
Therefore, an interconnected adapter board (connector saver) is utilised to allow for the testing of a larger number of ROBs.
Besides the main test sequence involving the charge injection from the FE-Tester, each ROB is subject to a series of further checks while being mounted on the test stand.
These include the reading of hardware IDs of the individual components like GBTX and SCA ASICs.
By comparing them with the expected values as stored in a database during the assembly of the ROB, it is assured that only intended boards and components are used.
Furthermore, the ROBs are prepared for the following commissioning on the C-Frame by programming the FPGAs with later firmware versions, while also testing the programming functionality itself.
The commissioning preparations also include a (pre-)calibration of the temperature and voltage monitoring circuits.
Further details can be found in \cref{sec:temperatures,sec:hv-calibration}.
After a ROB has passed the complete test procedure, it undergoes an optical inspection and cleaning of the VTRx and VTTx modules before being cleared for the commissioning.
\Vref{fig:robs-on-cframe} shows a photograph of six ROBs that have been mounted on a \cframe{} after passing the tests on the FE-Tester.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/ROBs_on_Cframe}
\caption{
Photograph of six ROBs mounted on the bottom half of a \cframe{} and ready to be commissioned.
}
\label{fig:robs-on-cframe}
\end{figure}
\section{Commissioning Setup}
\label{sec:commissioning-setup}
As discussed previously, the commissioning of the SciFi front-end electronics is performed directly following the assembly of each \cframe{}.
It is carried out in one of two available commissioning slots in the back part of the SciFi assembly hall, close to the infrastructure needed for that purpose.
Depending on the tracking station, each \cframe{} comprises up to \num{24} ROBs that are commissioned at once\footnote{Some commissioning steps cannot be done in parallel for all ROBs, but are performed in two iterations as discussed in the following sections.}.
All the ROBs have been examined using the two FE-Tester setups prior to being mounted.
The aim of the commissioning is to ensure the functionality of the ROBs on the \cframe{} within their final configuration of connected power cables, optical fibres, LIS and SiPMs.
The connection of the SiPM arrays is an especially delicate operation due to the high density connectors.
In the assembly slots, easy access to the different components is provided in case of detected defects during the commissioning.
Afterwards, the \cframes{} are moved to the TS-Cage and transported as a whole to the LHCb cavern, which has been found to be a gentle process in previous (test) transports.
During the commissioning, the ROBs are already referenced by their location within the final detector.
The location identifier (ID) is composed of the station number $\text{T} = [1,2,3]$, layer within the station $\text{L} = [0,1,2,3]$, quadrant within the layer $\text{Q} = [0,1,2,3]$, and module within the quadrant $\text{M} = [0,1,\dots,5]$.
Note that the module number only ranges from 0 to 4 for T1 and T2.
The naming convention, as well as the position of an exemplary ROB with location ID T3L2Q3M2, is illustrated in \vref{fig:naming}.
During the commissioning, the location ID of the ROB is often appended with additional information to further localise a component or a problem.
As outlined in \cref{sec:c-frames}, each \cframe{} comprises two layers with two quadrants each.
Thus, it is common to also refer to the \cframes{} in terms of T, L and Q (e.g. T3L23Q13).
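The convention can be summarised compactly; the symbols $t$, $l$, $q$, $m$ and $m_\text{max}$ are introduced here for illustration only:
\begin{gather*}
\text{T}t\,\text{L}l\,\text{Q}q\,\text{M}m, \qquad t \in \{1,2,3\}, \quad l, q \in \{0,1,2,3\}, \quad m \in \{0, \dots, m_\text{max}(t)\}, \\
m_\text{max}(t) =
\begin{cases}
4 & \text{for } t \in \{1,2\}, \\
5 & \text{for } t = 3.
\end{cases}
\end{gather*}
Counting all valid combinations gives $4 \times 4 \times (5 + 5 + 6) = 256$ ROBs, in agreement with the total quoted in \cref{sec:complexity}.
A \cframe{} identifier such as T3L23Q13 fixes the station and selects two layers and two quadrants, so that it covers $2 \times 2 \times 6 = 24$ location IDs in T3 and $2 \times 2 \times 5 = 20$ in T1 and T2.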
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/SciFi_Naming}
\caption{
Schematic view of the \scifitracker{} with the naming convention overlaid.
An example ROB and its location identifier are highlighted in red.
Distances between the layers (L) and stations (T) are exaggerated for better visibility.
The inner layers per station (L1 \& L2) are tilted by \qty{\pm 5}{\degree} to obtain the $y$-coordinate of detected hits.
}
\label{fig:naming}
\end{figure}
\subsection{Infrastructure}
\label{sec:commissioning-infrastructure}
The infrastructure used for the commissioning represents a subset of the final system.
It consists of a MARATON power supply that delivers the low voltage for the front-end boards.
With 12 output channels set to \qty{8}{\volt}, two ROBs are powered by one channel.
The SiPM channels are biased by three CAEN modules of type A1539BP inside a SY4527 crate (see \cref{sec:hv}).
Each of the 96 available HV channels powers four neighbouring SiPM arrays corresponding to one fibre mat.
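These channel counts can be checked against a fully equipped \cframe{} with \num{24} ROBs.
The number of SiPM arrays per ROB is not stated in this section and is derived here from the HV channel count purely as a consistency check:
\begin{align*}
\text{LV:}\quad & 12~\text{channels} \times 2~\text{ROBs} = 24~\text{ROBs}, \\
\text{HV:}\quad & 96~\text{channels} \times 4~\text{SiPM arrays} = 384~\text{SiPM arrays} = 24~\text{ROBs} \times 16~\text{SiPM arrays}.
\end{align*}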
The front-end electronics are cooled by a portable water chiller that provides a water flow of \qty{2.5}{\liter/\minute} at \qty{20}{\celsius} to each quadrant.
A Novec cooling plant is located next to the LV and HV power supplies on the mezzanine floor in the back part of the assembly hall.
During the electronics commissioning of the first \cframes{}, the SiPMs were cooled to \qty{-40}{\celsius}.
However, due to time constraints, the vacuum and Novec cooling tests were separated and performed prior to the electronics commissioning for the later \cframes{}.
A floor plan of the assembly hall including the commissioning setup and infrastructure is shown in \vref{fig:floorplan}.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{figures/commissioning/Assembly_Hall_Floorplan}
\caption{
Floor plan of the SciFi assembly hall.
}
\label{fig:floorplan}
\end{figure}
\subsection{DAQ System}
\label{sec:commissioning-daq}
The DAQ system used for readout and control of the front-end electronics is located next to the C-Cages.
It comprises a 4U dual processor server\footnote{SuperServer 4029GP-TRT by Super Micro Computer, Inc.} equipped with two \mbox{10-core} Intel Xeon Silver 4114 processors.
The main components are nine PCI Express cards that act as the interface to the front-end electronics.
Since they are specially designed for the trigger-less readout of the LHCb experiment at \qty{40}{\mega\hertz}, they are referred to as PCIe40 cards~\autocite{pcie40}.
The PCIe40 card embeds a powerful FPGA\footnote{Arria 10 10AX115 by Altera (now Intel Corporation)} with \num{1.15} million logic cells.
Its purpose is to enable the flexible implementation of the GBT architecture at the back-end side.
By programming the FPGA with different firmwares, it can fulfil three different roles:
\begin{itemize}
\item The \textbf{TELL40} cards receive the data from the front-end electronics and are responsible for merging events belonging to a single bunch crossing.
\item The \textbf{SOL40s} are the direct interfaces for the communication and control of the front-end boards.
\item The readout supervisor \textbf{SODIN} is responsible for distributing the \qty{40}{\mega\hertz} LHC clock and fast commands to the other cards.
\end{itemize}
Each card supports up to \num{48} bidirectional optical links by means of 4+4 12-channel fibre optics transmitter and receiver modules, called MiniPODs\footnote{MiniPOD AFBR-811FN3Z (Tx) and AFBR-821FN3Z (Rx) by Avago Technologies (now Broadcom Inc.)}.
Eight MPO (Multi-fiber Push On) connectors are located on the board to allow for the connection of the optical fibres going to the front-end electronics.
\Vref{fig:daq} illustrates the interplay between the SODIN, SOL40 and TELL40 at the back-end, as well as the front-end electronics.
For the SciFi commissioning, eight PCIe40 cards are configured as TELL40s while the remaining one is programmed with a combined SODIN/SOL40 firmware.
The latter distributes the clock and fast commands to the TELL40 cards via SFP+ (enhanced small form-factor pluggable) transceivers and a PON (passive optical network) splitter.
\begin{figure}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/DAQ}
\caption{
Interplay of the back- and front-end electronics in the upgraded LHCb DAQ system as used for the commissioning.
During the commissioning, the functionality of the SODIN and SOL40 is combined within a single FPGA.
}
\label{fig:daq}
\end{figure}
Depending on the tracking station, each \cframe{} comprises up to \num{384} data and \num{48} control links.
Thus, the available PCIe40 cards are just sufficient for the commissioning of one \cframe{} at a time.
However, at the time of the commissioning, no TELL40 firmware was available that allowed for the simultaneous operation of all \num{48} links.
Therefore, two separate TELL40 firmwares are in use, each with \num{24} instantiated data links.
The active links are logically split into the lower (quadrants Q0/Q1) and upper (Q2/Q3) half of the \cframe{}.
As a result, the commissioning steps that require the data taking capabilities need to be performed separately for both halves.
Conversely, all ROBs on the \cframe{} can be controlled with a single SODIN/SOL40 firmware.
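The link budget behind this splitting can be made explicit.
The following sketch uses only the numbers quoted above and considers a fully equipped \cframe{}:
\begin{align*}
\text{available data links:}\quad & 8~\text{TELL40s} \times 48 = 384, \\
\text{usable per firmware:}\quad & 8~\text{TELL40s} \times 24 = 192 = 384 / 2, \\
\text{control links:}\quad & 1~\text{SODIN/SOL40} \times 48 = 48.
\end{align*}
Each of the two TELL40 firmwares thus covers exactly one half of the \cframe{}, whereas the single control card serves all \num{48} control links at once.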
In general, the firmwares for the PCIe40 cards were still experimental and under heavy development especially in the early phase of the commissioning.
Therefore, the debugging of the DAQ system was conducted in parallel and several attempts were required in order to obtain a (mostly) stable system.
Once a set of functioning firmwares was found, their versions were kept and used throughout the complete commissioning campaign.
As a consequence, the full feature set was not available and some workarounds were required in order to achieve the desired results.
For compatibility reasons, this also includes the firmwares of the Housekeeping and Cluster FPGAs on the front-end electronics itself.
\subsection{Control Software}
The software to control the SciFi front- and back-end electronics has been developed within the SCADA (Supervisory Control and Data Acquisition) system SIMATIC \winccoa{}~\autocite{wincc-oa}.
A \winccoa{} system, also referred to as project, follows a distributed architecture consisting of multiple autonomous processes that are called managers.
These can be control managers that execute program code written in the \winccoa{} CTRL language, whose syntax shows strong similarities with that of the C programming language.
In addition, an API (Application Programming Interface) exists that allows external \CC{} programs to be executed in the context of the project.
Other examples are user interface managers that are responsible for the visual representation of control panels and the handling of user input.
Screenshots of \winccoa{} control panels can be found in the following sections.
\begin{figure}
\centering
\includegraphics[width=0.7\textwidth]{figures/commissioning/WinCC-OA}
\caption{
Structure of a \winccoa{} project.
Image taken from Ref.~\autocite{wincc-clara}.
}
\label{fig:wincc-oa}
\end{figure}
The typical structure of a \winccoa{} project is depicted in \vref{fig:wincc-oa}.
While multiple control (Ctrl) and user interface managers (UIM) can be present, a single event manager (EVM) forms the core of the project.
Via an intermediate database manager (DBM), it has access to the project's database (DB).
All communication between the various processes, including the drivers (D) that establish the connection to the physical devices, goes through the event manager and the interaction with the database.
The objects stored in the database are called datapoints and can follow different structures.
These are defined by the datapoint types that can be thought of as (nested) structs within the C programming language and its derivatives.
Typically, physical devices like the \MB{} with its components are represented as datapoint types.
Besides the inherently distributed architecture, it is even possible to connect different \winccoa{} projects with each other.
This is enabled by the distribution managers in each project that allow for the combination of hundreds of systems.
Due to its good scalability, \winccoa{} is used for controlling all LHC experiments at CERN~\autocite{lhcb-scada}.
However, \winccoa{} cannot be understood as a ready-to-use control system; rather, it provides the underlying tools to build such a system depending on the individual needs.
\subsubsection{JCOP Framework}
For that reason, a joint controls project (JCOP) was initiated by the four LHC experiments~\autocite{jcop}.
Due to similar requirements, it aims at avoiding the duplication of work during the development of new control systems.
In addition to including a set of packages with common control solutions, it also provides a framework that allows for the creation and distribution of further JCOP components.
Two of these additional packages are the fwGbt and fwHw components~\autocite{lhcb-scada}.
The fwGbt component integrates the GBT architecture into \winccoa{} by providing an interface to the various protocols that the GBT chipset supports (see \cref{sec:cluster-sca,sec:master-gbtx}).
It is used in close conjunction with the GbtServ that performs the low-level communication with the PCIe40 cards and their firmwares.
As such, it forms the bridge between the control system and the hardware level.
The fwHw component is designed to model the structure of the front-end electronics in a user-friendly way by hiding the complexity of the underlying \winccoa{} datapoint structures.
Combined, the fwGbt and fwHw packages allow for the interaction with the different hardware devices without the need to specify parameters like the protocol or address, as these are already defined in the model.
On a macroscopic level, the devices are organised in a hierarchical structure.
The tree-like structure consists of interconnected nodes, where each node contains one or multiple children.
On the lowest level, representing the leaves of the tree, there are the physical devices.
Conversely, the nodes describe logical accumulations of the actual hardware.
\subsubsection{Finite State Machine (FSM)}
The creation of the hierarchical structure is enabled by the fwFSM component provided in the JCOP framework~\autocite{jcop}.
Within the implementation of the package, a node is referred to as control unit (CU) and a leaf as device unit (DU).
The naming is inspired by the term FSM (Finite State Machine), since each unit (CU or DU) is assigned a fixed number of states.
Transitions between the different states are possible by issuing a command to the unit, while the set of available commands depends on the current state.
Since a CU is not a physical device in itself but a logical accumulation, its state depends on the combined states of its children.
As a result, when any child is in an erroneous state, the parent CU is in an erroneous state as well.
In turn, issuing a command to a CU results in it being forwarded to its child units.
Summarising the above, it can be said that state changes are propagated up the hierarchy (from leaves to the root of the tree) while commands are moving down the hierarchy (root to leaves).
For the operator, this allows for a clear overview when controlling and monitoring the detector.
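These two rules can be written down compactly.
In the notation below, which is introduced here for illustration and is not part of the fwFSM component, $s(u)$ denotes the state of a unit $u$, $C(u)$ the set of its children, and ERROR stands for any erroneous state:
\begin{align*}
\text{states (upwards):}\quad & \exists\, c \in C(u) : s(c) = \text{ERROR} \;\Longrightarrow\; s(u) = \text{ERROR}, \\
\text{commands (downwards):}\quad & \text{issue}(k, u) \;\Longrightarrow\; \text{issue}(k, c) \quad \text{for all } c \in C(u).
\end{align*}
Only the error rule is spelled out above; how the remaining child states are combined into the state of a CU is not covered by this sketch.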
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/FSM_Tree}
\caption{
Hierarchical structure of the SciFi front-end electronics (FEE) as implemented in the control software.
It is usually referred to as FSM tree.
Image inspired by Ref.~\autocite{scifi-controls}.
}
\label{fig:fsm-tree}
\end{figure}
The so-called FSM tree of the SciFi front-end electronics as implemented in the control software is displayed in \vref{fig:fsm-tree}.
As can be seen, the hierarchy is strongly oriented towards the naming convention of the ROBs introduced in \cref{sec:commissioning-setup}.
Hence, the tracking stations, layers, quadrants and modules each form a new layer of CUs.
On the lowest level are the \HalfROBs{} that represent the DUs in the tree.
Since each \HalfROB{} is controlled by a separate (bidirectional) control link, it is the logical choice for the smallest unit within the hierarchy.
\Vref{fig:den} shows a screenshot of the Device Editor Navigator (DEN) that is used in \winccoa{} to create the tree-like structure.
\begin{figure}
\centering
\begin{minipage}{0.48\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/DEN}
\caption{Screenshot of the Device Editor Navigator (DEN) used to create and control the hierarchical structure of the SciFi front-end electronics.}
\label{fig:den}
\end{minipage}
\hfill
\begin{minipage}{0.48\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/FSM}
\caption{
Finite State Machine (FSM) inside the device unit (DU) of the SciFi front-end electronics project.
Each DU represents one \HalfROB{}.
}
\label{fig:fsm}
\end{minipage}
\end{figure}
The possible states and transitions within the FSM of the \HalfROBs{} are illustrated in \vref{fig:fsm}.
On start-up, issuing a Reset command brings the DU into the NOT\_READY state in which all the components on the front-end boards are powered.
However, a subsequent Configure command is required to get the \HalfROB{} READY for taking data.
During a data taking run, the DUs are in the RUNNING state.
Any failures during the execution of the different commands result in the transition to ERROR from any of these states.
Further details about the different states and commands are given in the following sections.
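The transitions named above can be summarised by a transition function $\delta$ acting on the current state $s$.
The notation is illustrative only, and the commands that start and stop a run are omitted since they are not named here:
\begin{align*}
\delta(s, \text{Reset}) &= \text{NOT\_READY} \quad \text{for any state } s, \\
\delta(\text{NOT\_READY}, \text{Configure}) &= \text{READY}, \\
\delta(s, k) &= \text{ERROR} \quad \text{if the execution of command } k \text{ fails}.
\end{align*}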
Based on the aforementioned building blocks, the control system for the \cframe{} commissioning has been implemented.
Two \winccoa{} projects are running on the DAQ server itself and are responsible for the control of the SciFi front- and back-end electronics.
Additional projects are installed on supplementary machines in order to regulate the devices providing the infrastructure (see \cref{sec:commissioning-infrastructure}).
The different projects are integrated into the LHCb network and connected to each other via their distribution managers.
Although the system has been developed with the \cframe{} commissioning in mind, it also forms the basis for the control system used for the final operation of the detector in the LHCb cavern.
\subsection{Commissioning Procedure}
\label{sec:commissioning-procedure}
The procedure of the SciFi front-end electronics commissioning consists of a number of successive steps.
In order to ensure the correct functioning of the ROBs, the test sequence is split into three stages as displayed in \vref{fig:commissioning-procedure}.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Procedure}
\caption{
Test sequence during the SciFi front-end electronics commissioning.
}
\label{fig:commissioning-procedure}
\end{figure}
In the first stage, the electronics are tested for their basic functionality.
This includes, among other things, checking the communication to the Master GBTX ASICs via the bidirectional optical links, as well as testing the internal communication protocols and registers while configuring the various components within each ROB.
The second stage is about the examination of the readings of the temperature and voltage sensors located on the front-end boards.
In case of the HV biasing the SiPM arrays, this includes a recalibration of the monitoring circuit parameters.
The third and last stage consists of extensive tests that involve the data taking functionalities through the complete data chain.
These range from injecting test patterns to recording SiPM signals generated by the LIS (see \cref{sec:lis}).
Given the limitation of the available DAQ system and TELL40 firmwares, which allow only half of the data links to be read out (see \cref{sec:commissioning-daq}), these steps need to be performed twice for each \cframe{}.
The complete procedure takes about two working weeks per \cframe{}, or longer if defects are found that require a rework or even a replacement of ROBs.
However, due to the initial development of tools and the varying situations within the \mbox{COVID-19} pandemic, the commissioning of the first four \cframes{} took significantly more time.
Within the scope of this thesis, the test procedure was developed and implemented in software.
After overcoming initial challenges and performing the measurements on the first \cframes{}, the responsibility for carrying out the different test steps was gradually transferred to other members of the SciFi collaboration.
A list of all \cframes{} and the time periods in which they were commissioned is shown in \vref{tab:commissioning}.
They are numbered on both sides of the beam pipe from station T3 to T1.
Depending on the state of the assembly, the order of commissioning roughly followed this numbering.
The six \cframes{} facing the cryogenic facility (C-side) were commissioned first in order to be installed in the LHCb cavern prior to the beam pipe.
Afterwards, the six \cframes{} towards the access point (A-side) followed.
The different test steps are fully described in the following sections, along with the results of the 9 out of 12 \cframes{} that were fully commissioned at the time of writing.
\begin{table}
\centering
\begin{threeparttable}
\caption{Time periods in which the commissioning of the front-end electronics for the 12 \cframes{} forming the \scifitracker{} took place.}
\label{tab:commissioning}
\begin{tabular}{@{}llllll@{}}
\toprule
% Header
\multirow{2}{*}{\bfseries C-Frame} &
\multirow{2}{*}{\bfseries Side} &
\multirow{2}{*}{\bfseries Location ID} &
\multirow{2}{*}{\bfseries ROBs} &
\multicolumn{2}{c}{\bfseries Commissioning}\\ &&&& Started & Completed \\ \midrule
%---------
C-Frame 1 & C & T3L23Q02 & 24 & 25.11.2019 & 24.02.2020 \\
C-Frame 2 & C & T3L01Q02 & 24 & 06.11.2020 & 10.02.2021 \\
C-Frame 3 & C & T2L23Q02 & 20 & 28.02.2020 & 21.07.2020 \\
C-Frame 4 & C & T2L01Q02 & 20 & 04.08.2020 & 02.11.2020 \\
C-Frame 5 & C & T1L23Q02 & 20 & 28.05.2021 & 08.06.2021 \\
C-Frame 6 & C & T1L01Q02 & 20 & 16.06.2021 & 01.07.2021 \\
C-Frame 7 & A & T3L23Q13 & 24 & 13.10.2021 & 28.10.2021 \\
C-Frame 8 & A & T3L01Q13 & 24 & 23.09.2021 & 06.10.2021 \\
C-Frame 9 & A & T2L23Q13 & 20 & 03.11.2021 & 18.11.2021 \\
C-Frame 10 & A & T2L01Q13 & 20 & \multicolumn{2}{c}{Q1 2022*} \\
C-Frame 11 & A & T1L23Q13 & 20 & \multicolumn{2}{c}{Q1 2022*} \\
C-Frame 12 & A & T1L01Q13 & 20 & \multicolumn{2}{c}{Q1 2022*} \\ \bottomrule
\end{tabular}
\begin{tablenotes}[flushleft]
\small
\item * Expected
\end{tablenotes}
\end{threeparttable}
\end{table}
\section{Communication to Master GBTXs}
\label{sec:master-gbtx-communication}
The first step after powering up the ROBs by the MARATON power supply is to check the communication between the front- and back-end electronics via the bidirectional optical links.
As outlined in \cref{sec:master-gbtx}, the Master GBTX recovers the clock from the incoming data stream and replies in turn with GBT frames consisting of 120 bits each.
As depicted in \vref{fig:gbt_dataformat}, a 4-bit header marks the beginning of each frame and is required for synchronisation on the receiving side.
Once the header is recognised \num{15} times in a row, the data stream is considered as locked.
Conversely, missing the header more than \num{4} times without detecting \num{8} consecutive valid headers in-between results in a loss of the acquired frame-lock.
%== Values read from a Master GBTX on the C-Frame ==
%rxValidHeaders=15
%rxMaxInvalidHeaders=4
%rxMinValidHeaders=8
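One possible reading of this hysteresis is sketched below.
The counters $v$ (consecutive valid headers) and $i$ (invalid headers since the last reset of $i$) are introduced here for illustration only, and the actual implementation in the GBTX may differ in detail.
The thresholds are those quoted above.
\begin{itemize}
\setlength\itemsep{0em}
\item \textbf{Not locked:} a valid header increments $v$, an invalid header resets $v$ to zero; the link becomes locked once $v = 15$, at which point both counters are reset to zero.
\item \textbf{Locked:} an invalid header increments $i$ and resets $v$ to zero, a valid header increments $v$; reaching $v = 8$ resets $i$ to zero, and the lock is lost once $i > 4$.
\end{itemize}
Assuming that one 120-bit frame is transmitted per \qty{40}{\mega\hertz} clock cycle, acquiring the lock takes at least $15 \times \qty{25}{\nano\second} = \qty{375}{\nano\second}$.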
For each link of the SOL40 card, there is a register that contains the counted number of clock cycles without frame-lock.
When stable connections to the Master GBTX ASICs exist, after powering up the ROBs and resetting the counters, the values should stay at zero.
For that purpose, the counters of the 48 SOL40 links are monitored in a dedicated \winccoa{} control panel.
During the commissioning of the first 9 \cframes{}, corresponding to \num{392} \HalfROBs{}, it occurred only once that the connection to a Master GBTX could not be established.
As a consequence, the affected ROB T3L3Q3M4 on \cframe{} 7 was replaced with a spare so that it could be repaired.
\section{Configurability of ROBs}
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/HalfROB_Panel}
\caption{
\winccoa{} control panel showing the readout values of the main registers of one \HalfROB{}.
Intended for the use by experts for debugging purposes.
}
\label{fig:halfrob-panel}
\end{figure}
After verifying the connection to the Master GBTXs, the next step is to test the internal communication protocols and control registers of the various components inside the ROBs.
These tests are carried out in the process of fully configuring the boards such that they are ready for data taking.
As depicted in \vref{fig:fsm}, after powering the \HalfROBs{}, getting them to the READY state requires a Reset followed by a Configure command.
The constituent actions of these two FSM commands are presented in the following.
\subsubsection{Reset Command}
The Reset command can be considered as the minimal set of actions that are required to provide power to all components of the ROB.
As shown in \cref{fig:fsm}, it can be issued from any state and is used to put the electronics in a well-defined state.
The complete procedure consists of the following consecutive steps:
\begin{enumerate}
%==Master Board==
\item[] \textbf{Reset \MB{} (MB)}
\begin{enumerate}[label=\arabic*.]
%\item Check Master GBTX lock
\item Configure the Master GBTX (366 read/write control registers) to set up the E-Links and the Data GBTX reference clocks (see \cref{sec:master-gbtx})
\item Reset the Master SCA, enable the required control buses and configure directions of the GPIO pins (see \cref{sec:cluster-sca,sec:master-sca})
\item Provide power to all components by successively enabling the three \DCDC{} converter sections through the corresponding GPIO output lines (see \cref{sec:master-sca})
\begin{enumerate}[label=\alph*)]
\item Left half of the MB including CB0, CB1 and PB0, PB1, as well as the HK FPGA
\item Right half of the MB including CB2, CB3 and PB2, PB3
\item Cluster FPGAs on all four CBs
\end{enumerate}
\item Reset and configure the Data GBTXs (366 read/write control registers) to provide the external clocks to the Cluster FPGAs and PACIFICs (see \cref{sec:data-gbtx})
\end{enumerate}
%==Cluster Boards==
\item[] \textbf{Reset \CB{}s (CBs)}
\begin{enumerate}[label=\arabic*.]
\item Reset the Cluster SCAs, enable the required control buses and configure directions of the GPIO pins (see \cref{sec:cluster-sca})
\item Perform device resets of the Cluster FPGAs through the corresponding GPIO output lines (see \cref{sec:cluster-sca,sec:cluster-fpga})
\end{enumerate}
%==PACIFIC Boards==
\item[] \textbf{Reset \PB{}s (PBs)}
\begin{enumerate}[label=\arabic*.]
\item Reset the four PACIFICs on each PB (see \cref{sec:pacific_regs})
\begin{enumerate}[label=\alph*)]
\item Perform a global digital reset (GPIO output signal: RESET)
\item Load the initial register values (GPIO output signal: LDINIT)
\end{enumerate}
\end{enumerate}
\end{enumerate}
Each operation is verified by reading back the values from the corresponding control registers.
Only if they match the previously written values does the procedure continue with the next step.
In case of a mismatch, an error message is raised and the \HalfROB{} DU changes to the ERROR state.
Otherwise, once all steps have been completed successfully, the FSM state of the DU switches to NOT\_READY.
Due to the implementation of the E-Links, all four Cluster and PACIFIC Boards can be handled in parallel.
The control software has been developed with that functionality in mind and addresses all boards simultaneously wherever possible.
Similarly, on the level of the FSM tree, a command issued to a CU is forwarded to the DUs below and processed by them concurrently.
In principle, this allows all \HalfROBs{} on a \cframe{} to be reset in the same time as an individual one.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Reset_Procedure}
\caption{
Time course of the input voltage and current consumption of one \HalfROB{} during the processing of the Reset command as measured with an oscilloscope.
The slight changes in the input voltage are due to voltage drops in the power cable itself.
}
\label{fig:reset-procedure}
\end{figure}
\Vref{fig:reset-procedure} shows the change of the current consumption and input voltage of one \HalfROB{} during the Reset procedure as measured with an oscilloscope, starting from powering the left half.
As can be seen, the complete process takes a few seconds.
The largest increases in the power consumption are related to enabling the three \DCDC{} converter sections in the beginning, as well as the configuration of the Data GBTXs.
The measurement was performed within the commissioning setup as described in \cref{sec:commissioning-setup}.
Hence, the probed \HalfROB{} was powered by a MARATON channel operating at \qty{8}{\volt}.
The slight reduction in the input voltage results from increasing voltage drops in the \qty{30}{\meter} LV cable going from the power supply to the \cframe{}.
During the commissioning of the first \cframe{}, stability issues were encountered that even led to communication losses to the \HalfROBs{} in some cases when issuing the Reset command.
The cause was traced back to drops of the input voltage, which could fall below the level of \qty{6.5}{\volt} that is required for the operation of the FEASTMP \DCDC{} converters.
It was found that an additional capacitance is required to stabilise the input voltage and also to suppress ringing.
Therefore, specially designed \qty{10}{\milli\farad} capacitor boards consisting of multiple radiation hard ceramic capacitors were produced~\autocite{capacitor-boards}.
One such board is connected to each of the LV splitters on the \cframes{}, which distribute the power from one MARATON channel to four neighbouring \HalfROBs{}.
In this way, the voltage drops during the Reset procedure are reduced and the input voltage is kept above the limit of \qty{6.5}{\volt} as shown in \cref{fig:reset-procedure}.
\subsubsection{Configure Command}
As depicted in \cref{fig:fsm}, the Configure command can be issued to DUs within the NOT\_READY state in which they are found after a successful Reset.
Its purpose is to prepare the front-end electronics with the proper configuration parameters needed for taking high-quality data.
In contrast to the Reset command that performs the same actions and writes the identical values to the control registers of all \HalfROBs{}, settings individual for each \HalfROB{} are loaded during the Configure command.
The values are determined during the production QA tests of the individual components and saved in the SciFi production database (see \cref{sec:qa}).
This applies to the parameters $I_\text{bias}$ and $I_\text{mod}$ of the LIS, as well as the \PB{}s.
After the QA procedure, a configuration file is uploaded to the database for every \PB{}.
Each file contains the operational values of the \num{336} read/write control registers of each of the four assembled PACIFIC ASICs.
At this stage, the reference voltage parameters $V_\text{REF}$ and $V_\text{refDCFB}$, as well as the trim DACs are specific to each chip, while the remaining values are common to all.
During the Configure command, the configurations are parsed from the files and applied to the devices.
Besides the control register values, a \PB{} configuration file also contains the serial number of the board.
Before applying the configuration, the serial number is read from the board and compared to the value from the file.
This is particularly important in the commissioning as it makes it possible to verify that the optical control links are connected to the intended \HalfROBs{}.
To confirm the mapping of the data links, an additional step is required that includes the start of a data taking run.
In preparation for this, the 10-bit location ID of the \HalfROB{} is written to a register of the HK FPGA during the configuration.
Further details are given in \cref{sec:fibre-mapping}.
In the last step of the Configure command, the Master and Cluster SCAs are prepared for the monitoring of the temperatures and voltages.
This includes the activation of the ADCs, as well as the switchable \qty{100}{\micro\ampere} current sources required for the operation of resistance thermometers.
If all steps of the Configure command are completed successfully, the \HalfROB{} changes to the READY state, otherwise it goes to ERROR.
During the commissioning of the first 9 \cframes{}, the configuration of a single \HalfROB{} failed because the communication to the LIS GBLDs could not be established.
The issue could only be solved by replacing the affected ROB T2L3Q3M0 on \cframe{} 9 with a spare.
\section{Optical Fibre Mapping}
\label{sec:fibre-mapping}
While the correct cabling of the control links is confirmed in the previous commissioning step when configuring the \HalfROBs{}, further actions are required to verify the mapping of the data links.
The process is based on sending a unique identifier (ID) on each link to the TELL40 cards.
The structure of the so-called (data) link ID consisting of 13 bits is illustrated in \vref{fig:link-id}.
It comprises the 10-bit \HalfROB{} location ID followed by a 3-bit identifier to describe the data link within the \HalfROB{}.
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{figures/commissioning/LinkID}
\caption{
Structure of the 13-bit link ID used to uniquely identify the \num{4096} data links of the \scifitracker{}.
}
\label{fig:link-id}
\end{figure}
As shown in \vref{fig:fee-datapath}, there are eight data links per \HalfROB{}, each one corresponding to one Data GBTX and Cluster FPGA.
The 13-bit link ID is constructed in the Cluster FPGA based on the input from 13 individual pins.
The 10-bit location ID of the \HalfROB{} is routed from the HK FPGA to all Cluster FPGAs.
Its value is determined by the content of the HK FPGA instruction register that is assigned with the correct ID during the configuration of the \HalfROB{}.
The remaining 3 bits are composed as follows:
Each FMC connector between the Master and Cluster Boards has two dedicated pins that are connected individually to ground or \qty{1.5}{\volt} on the \MB{}, thereby encoding the location of the \CB{} within the \HalfROB{}.
As already introduced in \cref{sec:cluster-fpga}, the same principle is applied on the \CB{}s themselves to determine the position of the Cluster FPGA on the board.
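As a sketch of the resulting bit layout, the link ID can be written in terms of three indices.
The symbols $n_\text{loc}$, $n_\text{CB}$ and $n_\text{FPGA}$ are introduced here for illustration only, and the ordering assumes that the location ID occupies the most significant bits, followed by the two \CB{} bits and the single Cluster FPGA bit:
\begin{equation}
\text{ID}_\text{link} = 2^3 \cdot n_\text{loc} + 2 \cdot n_\text{CB} + n_\text{FPGA}
\quad \text{with} \quad n_\text{loc} \in \{0,\dots,1023\}\,,\; n_\text{CB} \in \{0,\dots,3\}\,,\; n_\text{FPGA} \in \{0,1\}\,.
\end{equation}
Under this assumption, the two Cluster FPGAs on \CB{} 3 correspond to the data links $2 \cdot 3 + 0 = 6$ and $2 \cdot 3 + 1 = 7$ within the \HalfROB{}, and the \num{4096} data links occupy half of the $2^{13} = \num{8192}$ available codes.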
In order to transfer the information to the TELL40s, a data taking run has to be initiated.
However, in the general data format shown in \vref{tab:dataformat-ff}, there is not enough bandwidth available for the additional transmission of the 13-bit link ID.
Instead, the information is embedded into special GBT frames that are leading every run.
The original purpose of these so-called SYNC\footnote{Not to be confused with the SYNC pulse that is sent to the PACIFICs for synchronisation. Coincidentally, the same name has been used even though the two are not directly related.} frames is the re-synchronisation of the bunch crossing IDs in the TELL40s~\autocite{fe-dataformat}.
In addition, they are required for determining the position of the first header in case of the FV data format (see \cref{sec:dataformat}).
The generation of the SYNC frames by the Cluster FPGAs is triggered by sending the SYNC fast command to the \HalfROBs{}.
The content of each SYNC frame is the 12-bit bunch crossing ID followed by a 12-bit programmable SYNC pattern that is set to 0xB4C.
The 13 least significant bits (LSBs) are occupied by the link ID.
In order to allow the TELL40s to recognise the SYNC pattern, several consecutive SYNC frames are generated by issuing multiple (typically ten) SYNC commands in a row.
By default, this process is performed at the start of every data taking run.
Within the TELL40 firmware, the link ID is captured from the SYNC frames and stored in a dedicated register that can be read out by the control system.
By reading out these registers for every TELL40 link after initiating a short run, the Cluster FPGAs on the other end of the optical fibres can be determined and compared to expectations.
In this way, the correct mapping of the optical data links could be confirmed on the first 9 commissioned \cframes{}.
However, one ROB on \cframe{} 8 needed to be replaced due to an issue encountered in this step.
Within T3L1Q1M2H1, the two neighbouring data links 6 and 7 were outputting incorrect link IDs.
The issue was traced back to a faulty connection between the corresponding \CB{} 3 and the \MB{} which caused the transmitted layer (L) bit from the HK FPGA to be stuck at 0.
%Location ID (from Wilco):
%The three bits are hardwired (Connected to ground or the power supply1v5). Each connector from MB to CB has two bits, hardwired on the MasterBoard. Location ID(2-1)= 00->CB0, 01->CB1 etc. on the CB each of the FPGA has one bit Hardwired LocationID(0) 0 for FPGA0 and 1 for FPGA1. The CB FPGA has a 13 bits input bus: LocationID, the remaining 10 bits are routed from Housekeeping FPGA to all CB FPGAs
\section{Optical Power}
\label{sec:optical-power}
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Fibre_Layout}
\caption{
Fibre system layout for the readout and control of the front-end electronics of the upgraded \lhcbexperiment{}.
The complete system will contain about \num{15000} individual active fibres~\autocite{rainer-links}.
With \num{4096} data and \numproduct{2 x 512} control fibres, the \scifitracker{} contributes more than \qty{30}{\percent} to that.
Image adapted from Ref.~\autocite{poster-rainer}.
}
\label{fig:fibre-system-layout}
\end{figure}
After verifying the correct fibre mapping in the previous commissioning step, a measurement of the optical power is performed.
Even though every individual fibre as well as the VTRx and VTTx modules are cleaned and optically inspected prior to connecting, this measurement has proven to be a useful tool to cross-check the quality of the optical links.
Good optical links are particularly important when operating the detector in the LHCb cavern.
As shown in \vref{fig:fibre-system-layout}, there are several hundred metres of optical fibres and a minimum of 3 breakpoints between the front-end electronics in the cavern and the readout boards in the data centre on the surface of the LHCb site~\autocite{poster-rainer}.
In contrast, the fibre system layout within the commissioning setup only consists of 1 breakpoint (2 for the control links) and $\qty{15}{\meter}+\qty{60}{\meter}$ of optical fibres.
\begin{figure}
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Optical_Power_Heatmap_C7}
\caption{Control links of \cframe{} 7}
\label{fig:optical-power-heatmap}
\end{subfigure}\\[1ex]
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Optical_Power_Hist_C7}
\caption{\cframe{} 7}
\label{fig:optical-power-c7}
\end{subfigure}
\hfill
\begin{subfigure}{0.49\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Optical_Power_Hist_All}
\caption{Accumulated: \cframes{} 1 to 9}
\label{fig:optical-power-acc}
\end{subfigure}
\caption{
Results of the optical power measurements as presented in the commissioning report of \cframe{} 7: Heatmap of the control links (a) and histogram of all links (b).
In addition, an accumulated histogram for the first 9 commissioned \cframes{} is shown (c).
}
\label{fig:optical-power}
\end{figure}
Within the commissioning, the received power from the optical transmitters of the ROBs is measured directly with the MiniPODs that are installed on the PCIe40 cards.
For that purpose, the MiniPOD receiver modules feature a Two Wire Serial (TWS) interface to monitor the optical input power, along with other diagnostic information like module temperatures and supply voltages~\autocite{minipods-datasheet}.
However, no official specifications regarding the tolerances of these quantities were available.
Therefore, a cross-check was performed by remeasuring the optical powers of \num{96} links with a calibrated handheld power meter\footnote{MultiFiber\textsuperscript{TM} Pro by Fluke Networks}.
Both sides of the fibre connections were thoroughly cleaned between measurements.
By doing so, differences up to \qty{10}{\percent} between the optical power measurements of the MiniPODs and the handheld power meter were found.
This is of the same order as the specified tolerance of \qty{\pm 0.35}{\decibel} (corresponding to about \qty{8}{\percent}) of the power meter itself~\autocite{power-meter}.
Therefore, it was concluded that the accuracy of the MiniPODs is sufficient for the given application.
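The relative deviation quoted for the power meter follows from converting the tolerance in decibels into a linear power ratio.
As a short worked example, using only the \qty{\pm 0.35}{\decibel} stated above:
\begin{equation}
10^{+0.35/10} - 1 \approx \qty{8.4}{\percent}\,, \qquad 1 - 10^{-0.35/10} \approx \qty{7.7}{\percent}\,.
\end{equation}
Both directions thus correspond to a deviation of about \qty{8}{\percent} in linear units.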
For each commissioned \cframe{}, an automated report on the optical power readings of the MiniPODs is generated.
It includes a histogram of the measured values, as well as heatmaps of the individual control and data fibres.
The latter allow for the identification of links that lie outside the acceptable range and require further actions.
An exemplary histogram and heatmap from the optical power commissioning report of \cframe{} 7 are displayed in \cref{fig:optical-power-heatmap,fig:optical-power-c7}.
%In addition, it shows the accumulated results from the first 9 commissioned \cframes{}.
The optical power must not only be strong enough for a stable transmission within the commissioning setup, but also for the final fibre system layout as shown in \cref{fig:fibre-system-layout}.
Therefore, a limit of \qty{224}{\micro\watt} (\qty{-6.5}{\dBm}) is applied to all links.
The unit decibel-milliwatt (dBm) states the optical power in decibels (dB) relative to \qty{1}{\milli\watt}:
\begin{equation}
P\,[\unit{\dBm}] = 10 \cdot \log_{10}\left( P\,[\unit{\milli\watt}] / \qty{1}{\milli\watt} \right)\,.
\end{equation}
It is often used because losses (in dB) can be incorporated through simple addition.
The limit is derived based on a previous study that evaluated the Versatile Link power budget in the context of the \lhcbexperiment{} assuming \qty{400}{\meter} of optical fibres and three breakpoints~\autocite{versatile-link-application-note}.
The relevant values from that study in the upstream direction, i.e. the transmission from the front- to the back-end electronics, are summarised in \vref{tab:power-budget}.
The resulting minimal transmission power of \qty{-6.5}{\dBm} (\qty{224}{\micro\watt}) is taken as the limit for the commissioning.
Note that this assumes a lossless transmission within the commissioning setup, thus providing an additional safety margin in the order of \qtyrange[range-phrase=--,range-units=single]{1}{2}{\decibel}.
\begin{table}
\centering
\caption{
Optical power budget of the final fibre system layout of the \lhcbexperiment{} in the upstream direction.
Values taken from Ref.~\autocite{versatile-link-application-note}.
}
\label{tab:power-budget}
\begin{tabular}{@{}llS@{}}
\toprule
Description & Unit & {Power (loss)} \\ \midrule
\multirow{2}{*}{Max. Rx sensitivity} & \unit{\micro\watt} & 78 \\
& \unit{\dBm} & -11.1 \\ \midrule
Fibre attenuation & \unit{\decibel} & 0.95 \\
Insertion loss & \unit{\decibel} & 2.25 \\
Mode dispersion & \unit{\decibel} & 1.3 \\
Fibre radiation & \unit{\decibel} & 0.1 \\ \midrule
\multirow{2}{*}{Min. Tx power} & \unit{\dBm} & -6.5 \\
& \unit{\micro\watt} & 224 \\ \bottomrule
\end{tabular}
\end{table}
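The entries of \cref{tab:power-budget} combine as follows, which also serves as a cross-check of the conversion between \unit{\dBm} and \unit{\micro\watt}.
The symbol $P_\text{Tx,min}$ is introduced here only as a shorthand for the minimal transmission power:
\begin{equation}
P_\text{Tx,min} = \qty{-11.1}{\dBm} + \left(0.95 + 2.25 + 1.3 + 0.1\right)\,\unit{\decibel} = \qty{-11.1}{\dBm} + \qty{4.6}{\decibel} = \qty{-6.5}{\dBm}\,,
\end{equation}
\begin{equation}
10^{-6.5/10} \cdot \qty{1}{\milli\watt} \approx \qty{224}{\micro\watt}\,.
\end{equation}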
In practice, a soft limit of \qty{300}{\micro\watt} is used during the commissioning.
For links below that value, an additional cleaning iteration is carried out.
In the process, the transmitter modules as well as the fibre ends are optically inspected and, if required, cleaned again.
Despite the additional effort, 9 control and 10 data links still had an optical power reading below \qty{300}{\micro\watt} within the commissioning of the first 9 \cframes{}, as shown in \cref{fig:optical-power-acc}.
Measured against a total of \num{3528} upstream links, this corresponds to a share of \qty{0.5}{\percent}.
However, the optical power was still above the critical limit of \qty{224}{\micro\watt} in all cases and the components were therefore released for operation.
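For orientation, the soft limit and the quoted share can be reproduced from the numbers given above.
The decibel values in this short calculation are derived here and are not taken from the cited study:
\begin{equation}
10 \cdot \log_{10}\left(\frac{\qty{300}{\micro\watt}}{\qty{1}{\milli\watt}}\right) \approx \qty{-5.2}{\dBm}\,, \qquad
10 \cdot \log_{10}\left(\frac{\qty{300}{\micro\watt}}{\qty{224}{\micro\watt}}\right) \approx \qty{1.3}{\decibel}\,.
\end{equation}
The soft limit therefore lies about \qty{1.3}{\decibel} above the critical limit.
With eight data links and one control link per \HalfROB{} in the upstream direction, the \num{392} \HalfROBs{} provide $392 \cdot 9 = \num{3528}$ links, of which $(9 + 10)/3528 \approx \qty{0.54}{\percent}$ remained below the soft limit.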
\section{Temperature Sensors}
\label{sec:temperatures}
As introduced in \cref{ch:fee}, the SciFi front-end electronics are equipped with various resistance thermometers that enable the temperature monitoring of the boards and components.
An overview of the temperature sensors in use is given in \vref{tab:temp-sensors}.
Ensuring the correct functioning of these sensors is the task of this commissioning step.
Moreover, monitoring the temperatures of the water cooled electronics has proven to be a useful tool to detect an incorrect mounting of the ROBs on the \cframe{}.
\begin{table}
\centering
\caption{Temperature sensors in use on the SciFi front-end electronics.}
\label{tab:temp-sensors}
\begin{tabular}{@{}llrrcc@{}}
\toprule
\multirow{2}{*}{\bfseries Part} & \multirow{2}{*}{\bfseries Type} & \multicolumn{2}{c}{\bfseries Amount per} & {\bfseries Typical} & \multirow{2}{*}{\bfseries Tolerance} \\
& & Part & ROB & {\bfseries Temperatures} & \\ \midrule
Master Board & NTC (\qty{4.7}{\kilo\ohm}) & 8 & 16 & \qtyrange{20}{35}{\celsius} & \qty{\pm3}{\celsius} \\
$\rightarrow$ Master SCA & SCA internal & 1 & 2 & \qtyrange{20}{35}{\celsius} & \qty{\pm2}{\celsius} \\
Cluster Board & Pt1000 & 3 & 24 & \qtyrange{20}{35}{\celsius} & \qty{\pm2}{\celsius} \\
$\rightarrow$ Cluster SCA & SCA internal & 1 & 8 & \qtyrange{20}{35}{\celsius} & \qty{\pm2}{\celsius} \\
PACIFIC Board & Pt1000 & 2 & 16 & \qtyrange{20}{35}{\celsius} & \qty{\pm2}{\celsius} \\
SiPM array & Pt1000 & 1 & 16 & \qty{-40}{\celsius} (to \qty{30}{\celsius}) & \qty{\pm2}{\celsius} \\ \bottomrule
\end{tabular}
\end{table}
\subsubsection{Tolerances}
During the commissioning of the first \cframe{}, variations by more than \qty{20}{\celsius} between neighbouring sensors were observed, which considerably exceeded the expected tolerances.
This resulted from the fact that variations in the \qty{100}{\micro\ampere} current, which is generated by the GBT-SCAs to operate the temperature sensors (see \cref{sec:cluster-sca}), had not been taken into account.
In order to increase the accuracy of the sensors, the current source of each SCA had to be measured accurately.
To achieve this, the connector saver boards of the FE-Tester were redesigned and now feature a high precision $\qty{4.7}{\kilo\ohm} \pm \qty{0.01}{\percent}$ resistor~\autocite{fe-tester}.
It connects to the same ADC input line that is normally used for the monitoring of the SiPM temperature via a Pt1000.
By knowing the resistance and measuring the voltage drop with the ADC, the current generated by the SCA can be calculated using Ohm's law as
\begin{equation}
I_\text{src} = \frac{V_\text{ADC}}{R} = \frac{V_\text{ADC}}{\qty{4.7}{\kilo\ohm} \pm \qty{0.01}{\percent}}\,.
\end{equation}
The uncertainty of the voltage measurement is given by the integral nonlinearity (INL) of the ADC that amounts to $\text{INL} < \qty{2}{LSB}$ in case of the GBT-SCA~\autocite{gbt-sca-manual}.
With the 12-bit resolution ADC covering the range between \qtyrange[range-phrase={ and }]{0}{1}{\volt} (see \cref{sec:cluster-sca}), this corresponds to an error $\Delta V_\text{ADC} = \qty{0.49}{\milli\volt}$.
Compared to the high precision resistor, it is the dominating uncertainty of the current source measurement resulting in an accuracy of $\Delta I_\text{src} = \qty{\pm 0.1}{\micro\ampere}$.
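The quoted uncertainties can be reproduced step by step from the ADC parameters given above, assuming a nominal current of \qty{100}{\micro\ampere} for the comparison with the resistor tolerance:
\begin{equation}
\Delta V_\text{ADC} = 2 \cdot \frac{\qty{1}{\volt}}{2^{12}} \approx \qty{0.49}{\milli\volt}\,, \qquad
\Delta I_\text{src} \approx \frac{\Delta V_\text{ADC}}{R} = \frac{\qty{0.49}{\milli\volt}}{\qty{4.7}{\kilo\ohm}} \approx \qty{0.10}{\micro\ampere}\,.
\end{equation}
For comparison, the \qty{0.01}{\percent} tolerance of the resistor changes the result by only about \qty{0.01}{\micro\ampere} at the nominal current.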
Using the described method, the current sources of all Cluster SCAs installed on the first 9 \cframes{} were determined.
This was done as part of the test procedure of the ROBs on the two FE-Tester setups.
The distribution of the determined values is shown in \vref{fig:currsrc}.
By performing a Gaussian fit to the data, the mean value and standard deviation are estimated to be about \qty{98}{\micro\ampere} and \qty{4}{\micro\ampere}, respectively.
%Note that only the current source of the Cluster SCAs can be measured in that way, since an ADC line is easily accessible via the SiPM connectors.
\begin{figure}
\centering
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/CurrSrc}
\caption{
Distribution of the currents as generated by the Cluster SCAs of the \num{196} ROBs installed on \cframes{} 1 to 9.
Measured as part of the FE-Tester test procedure.
A Gaussian fit is overlaid.
}
\label{fig:currsrc}
\end{minipage}
\hfill
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/Temperatures}
\caption{
Readings of the temperature sensors during the commissioning of the first 9 \cframes{}.
}
\label{fig:temperatures}
\end{minipage}
\end{figure}
With the precise knowledge of the current source, the temperature accuracy of the Pt1000 sensors located on the Cluster and PACIFIC Boards, as well as the SiPM arrays, is given by the INL of the ADCs used to measure the voltage drop.
It accounts for an uncertainty of \qty{\pm 1.3}{\celsius}, while the contribution from the current source amounts to \qty{\pm 0.3}{\celsius}.
The tolerance of the Class B Pt1000 sensors in use adds another \qty{\pm 0.3}{\celsius}, resulting in a total tolerance of about \qty{\pm 2}{\celsius} for the corresponding temperature readings.
Note that the uncertainties were simply added up linearly in that case since the underlying distributions are not known.
However, since the sources of errors are production tolerances and integral nonlinearities, the total uncertainty of \qty{\pm 2}{\celsius} can be understood as a tolerance as well, meaning that the true value is in any case within the given interval.
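The individual contributions can be estimated with a simplified model.
As assumptions made here for illustration, the Pt1000 characteristic is linearised as $R(T) \approx R_0 \left(1 + \alpha T\right)$ with $R_0 = \qty{1}{\kilo\ohm}$ and the standard platinum coefficient $\alpha \approx \qty{3.85e-3}{\per\celsius}$, and the nominal current $I_\text{src} = \qty{100}{\micro\ampere}$ is used:
\begin{equation}
\Delta T_\text{INL} \approx \frac{\Delta V_\text{ADC}}{I_\text{src} R_0 \alpha} = \frac{\qty{0.49}{\milli\volt}}{\qty{0.385}{\milli\volt\per\celsius}} \approx \qty{1.3}{\celsius}\,,
\end{equation}
\begin{equation}
\Delta T_\text{src} \approx \frac{\Delta I_\text{src}}{I_\text{src}} \cdot \frac{R(\qty{20}{\celsius})}{R_0 \alpha} \approx 10^{-3} \cdot \frac{\qty{1077}{\ohm}}{\qty{3.85}{\ohm\per\celsius}} \approx \qty{0.3}{\celsius}\,.
\end{equation}
Adding the sensor tolerance linearly gives $\left(1.3 + 0.3 + 0.3\right)\,\unit{\celsius} = \qty{1.9}{\celsius}$, i.e. about \qty{\pm 2}{\celsius}.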
In case of the NTC thermistors located on the Master Boards, the current source is still the limiting factor of the precision.
Since the current generated by the Master SCAs cannot be determined in the same way, one has to assume a spread of about \qty{\pm 10}{\micro\ampere} that covers \qty{99}{\percent} of the distribution shown in \vref{fig:currsrc}.
However, due to a much steeper characteristic curve compared to the Pt1000 sensors, the resulting uncertainty on the temperature measurement only amounts to about \qty{2.4}{\celsius}.
The INL can be neglected in this case, while another \qty{0.7}{\celsius} has to be added that takes the \qty{3}{\percent} tolerance of the used NTC thermistors into account.
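As a rough cross-check of these numbers, assuming that the interval of \qty{\pm 10}{\micro\ampere} is centred on the fitted mean and that the distribution is Gaussian with $\sigma \approx \qty{4}{\micro\ampere}$, the covered fraction is
\begin{equation}
\frac{\qty{10}{\micro\ampere}}{\qty{4}{\micro\ampere}} = 2.5 \quad \Rightarrow \quad \operatorname{erf}\left(\frac{2.5}{\sqrt{2}}\right) \approx \qty{98.8}{\percent}\,.
\end{equation}
Adding the two contributions for the NTC thermistors linearly, as done for the Pt1000 sensors, gives
\begin{equation}
\qty{2.4}{\celsius} + \qty{0.7}{\celsius} = \qty{3.1}{\celsius}\,,
\end{equation}
which corresponds to the tolerance of about \qty{\pm 3}{\celsius} listed in \cref{tab:temp-sensors}.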
No further details about the internal SCA temperature sensors were available.
However, a similar spread as for the Pt1000 sensors was observed.
Therefore, the same tolerance of \qty{\pm 2}{\celsius} is assigned to both the Master and Cluster SCA's internal sensors.
\subsubsection{Temperature Readings}
As part of the front-end electronics commissioning, the values of all 82 temperature sensors located on each ROB are recorded.
Similar to the measurement of the optical power, an automated report is generated that includes summary plots and highlights malfunctioning sensors.
A sensor is considered faulty if it returns an unrealistic reading, meaning a value below the cooling water temperature of \qty{20}{\celsius} or more than \qty{15}{\celsius} above.
During the commissioning of the first 9 \cframes{}, only 1 of the total of \num{16072} temperature sensors was found to be defective.
The sensor concerned is a Pt1000 located on the backside of an SiPM array that reads temperatures beyond \qty{80}{\celsius}.
Using a multimeter, it was found that the resistance deviated by about \qty{20}{\percent} from the nominal value.
It was decided to keep the affected SiPM array on the \cframe{} and instead mask the temperature readings from it within the control software.
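The sensor counts follow directly from \cref{tab:temp-sensors} and the \num{196} ROBs installed on the first 9 \cframes{}:
\begin{equation}
\left(16 + 2 + 24 + 8 + 16 + 16\right) \cdot 196 = 82 \cdot 196 = \num{16072}\,.
\end{equation}
The reading of the defective sensor is also consistent with the measured resistance.
As an estimate under assumptions made here, namely a linearised Pt1000 characteristic with $\alpha \approx \qty{3.85e-3}{\per\celsius}$ and a nominal value referring to a room temperature of about \qty{20}{\celsius}:
\begin{equation}
\frac{R(\qty{80}{\celsius})}{R(\qty{20}{\celsius})} \approx \frac{1 + 0.00385 \cdot 80}{1 + 0.00385 \cdot 20} \approx 1.21\,,
\end{equation}
i.e. a resistance about \qty{20}{\percent} above nominal is interpreted as a temperature of roughly \qty{80}{\celsius}.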
\Vref{fig:temperatures} shows the temperature readings of the remaining \num{16071} sensors, with the cooling water temperature set to \qty{20}{\celsius}.
As can be seen, the sensors on the \PB{}s return the largest values.
This is because they are cooled less effectively, being furthest away from the cooling block that is located just below the \MB{}s.
In addition, the PACIFICs draw a significant amount of power.
Note that the Novec cooling plant was not used in parallel to the electronics commissioning of the \cframes{}.
Therefore, the temperature of the SiPM arrays essentially follows the temperature in the assembly hall resulting in large variations throughout the year depending on the time of the commissioning.
Due to the water cooling, these seasonal effects are only faintly noticeable at the front-end electronics within the ROBs.
% 1°C per room temperature change of 5°C: Study by Jessy Daniel https://drive.google.com/file/d/1B6r5rLO81vgtlka82J5cNYD-bvWSPOTU/view
\subsubsection{Mounting Check}
In addition to the functional tests of the individual sensors, the monitoring and comparison of the temperature readings revealed several deficiencies in the mounting of the ROBs during the commissioning of the first 9 \cframes{}.
\Vref{fig:mounting-deficiency} shows an exemplary plot of the automated commissioning report that uncovered such an issue on \mbox{\cframe{} 1}.
It turned out that the significantly higher temperature readings of the ROB T3L2Q2M4 compared to the neighbours were caused by a pinched cable between the ROB and the cooling block, which prevented a good mechanical and thermal contact between the two.
Apart from this case, three further ROBs with insufficient thermal coupling to the cooling blocks due to insufficiently tightened screws were identified and corrected during the commissioning of subsequent \cframes{}.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{figures/commissioning/Boardtemps_T3L2Q2}
\caption{
Temperature readings of the front-end electronics located on T3L2Q2.
With the help of this plot as presented in the automated commissioning report, a mounting deficiency of the ROB T3L2Q2M4 was identified.
}
\label{fig:mounting-deficiency}
\end{figure}
Another type of problem that can be revealed in this way is a faulty connection of an SiPM flex cable.
Plugging in these high density connectors is a delicate operation that occasionally does not succeed perfectly when connecting up to \num{384} SiPMs per \cframe{}.
Depending on the severity, this either leads to a poor conductivity or no electrical connection at all.
As a result, unrealistic temperatures are measured by the Pt1000 on the backside of the SiPM array.
This type of error occurred mainly with the first 4 \cframes{}.
During that time, 10 partially loose SiPM flex cables were detected and corrected in this way.
Thereby, valuable time could be saved in the later, more time consuming commissioning steps in which these issues would have appeared as dead channels.
%C-Frame 3: 8 SiPMs
%C-Frame 4: 1 SiPM
%C-Frame 2: 1 SiPM
\section{Voltage Sensors}
\label{sec:voltages}
Following the functional tests of the temperature sensors, the voltage monitoring circuits are examined in this commissioning step.
As introduced in \cref{ch:fee}, each \HalfROB{} is equipped with one circuit to monitor the low voltage (LV), and 16 circuits to monitor the high voltage (HV), one for each SiPM die.
\subsection{Low Voltage}
\begin{figure}
\centering
\begin{minipage}{0.42\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/LV_Monitoring}
\caption{
Low voltage monitoring circuit.
Resistance values taken from Ref.~\autocite{mb-edms}.
}
\label{fig:lv-monitoring}
\end{minipage}
\hfill
\begin{minipage}{0.56\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/LV}
\caption{
Monitored input voltages of the \HalfROBs{} mounted on \cframes{} 1 to 9.
}
\label{fig:lv}
\end{minipage}
\end{figure}
The monitoring of the \HalfROB{} input voltage is based on the voltage divider as shown in \vref{fig:lv-monitoring}, which is located on the \MB{}.
Using this circuit, the input voltage is given by
\begin{equation}
V_\text{in} = \frac{R_1+R_2}{R_2} V_\text{mon} = 21 V_\text{mon} \quad \text{with} \quad R_2 = \left(\frac{1}{R_{21}}+\frac{1}{R_{22}}\right)^{-1} = \qty{500}{\ohm}\,,
\end{equation}
where the monitoring voltage $V_\text{mon} \in [0,1]\,\unit{\volt}$ is measured with the ADC of the Master SCA.
With the two (effective) resistors $R_1$ and $R_2$ in series, each with a tolerance of \qty{\pm 1}{\percent}, only a rough estimation with an accuracy of \qty{\pm 2}{\percent} is possible\footnote{Due to the unknown distribution of the production tolerances, they are again simply added up linearly to estimate the resulting tolerance.}, which corresponds to about \qty{140}{\milli\volt} for typical input voltages.
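As a rough cross-check of this estimate, the linear addition of the two resistor tolerances can be written out explicitly. The sketch below neglects the uncertainty of the ADC itself and assumes an illustrative input voltage of about \qty{7}{\volt}; the ratio $R_1/R_2 = 20$ follows from the conversion factor of 21 given above.
\[
\begin{aligned}
\frac{\Delta V_\text{in}}{V_\text{in}} &\approx \frac{R_1}{R_1+R_2}\left(\frac{\Delta R_1}{R_1}+\frac{\Delta R_2}{R_2}\right) = \frac{20}{21}\left(\qty{1}{\percent}+\qty{1}{\percent}\right) \approx \qty{2}{\percent}\,, \\
\Delta V_\text{in} &\approx 0.02 \times \qty{7}{\volt} = \qty{140}{\milli\volt}\,.
\end{aligned}
\]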
The value of the external supply voltage is critical for the correct functioning of the ROBs since the FEASTMP DC-DC converters require an input voltage $>\qty{6.5}{\volt}$ for stable operation.
During the commissioning, it is verified that the value is above that limit.
\Vref{fig:lv} shows the monitored input voltages of the \HalfROBs{} mounted on \cframes{} 1 to 9 with a median value of \qty{7.1}{\volt}.
The difference to the set voltage of \qty{8}{\volt} at the MARATON power supply is due to voltage drops in the power cables.
Significantly lower values would indicate unknown resistances in the power line, e.g. due to improper connections of plugs and terminals.
As outlined in \cref{sec:commissioning-infrastructure}, one MARATON channel generally powers two neighbouring ROBs that draw a typical current of \qty{20}{\ampere}, i.e. \qty{5}{\ampere} per \HalfROB{}.
However, since there are only five ROBs per quadrant on the \cframes{} belonging to stations T1 and T2, the outer ROBs (M4) on these \cframes{} do not share a MARATON channel with a neighbour.
Consequently, the voltage drops in the cables are halved for these ROBs, resulting in larger values of the monitored input voltages around \qty{7.6}{\volt}.
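The quoted values are consistent with a simple ohmic estimate. The sketch below assumes that the voltage drop in the cables scales linearly with the current drawn through the MARATON channel; the subscripts ``shared'' and ``single'' are labels introduced here for illustration only.
\[
\begin{aligned}
\Delta V_\text{shared} &\approx \qty{8}{\volt} - \qty{7.1}{\volt} = \qty{0.9}{\volt}\,, \\
\Delta V_\text{single} &\approx \tfrac{1}{2}\,\Delta V_\text{shared} = \qty{0.45}{\volt}
\quad\Rightarrow\quad
V_\text{in,single} \approx \qty{8}{\volt} - \qty{0.45}{\volt} \approx \qty{7.6}{\volt}\,.
\end{aligned}
\]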
During the commissioning, it was found that one MARATON channel was outputting a lower voltage than the nominal \qty{8}{\volt}.
As a result, the \HalfROBs{} that were powered by this channel were monitoring input voltages below \qty{6.8}{\volt}, as can be identified as a small peak on the left in \vref{fig:lv}.
However, they were still above the limit of \qty{6.5}{\volt} and operating stably throughout the commissioning, along with all the other ROBs mounted on \cframes{} 1 to 9.
\subsection{High Voltage}
\begin{figure}
\centering
\begin{minipage}{0.35\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/HV_Monitoring}
\caption{
High voltage monitoring circuit.
Resistance values taken from Ref.~\autocite{pb-edms}.
}
\label{fig:hv-monitoring}
\end{minipage}
\hfill
\begin{minipage}{0.63\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/HV}
\caption{
Monitored bias voltages applied to the SiPM dies of \cframes{} 1 to 9.
}
\label{fig:hv}
\end{minipage}
\end{figure}
The HV monitoring circuits are located on the \PBs{}.
For each HV line powering one SiPM die consisting of \num{64} channels, a voltage divider as shown in \vref{fig:hv-monitoring} is implemented.
To reduce the current consumption due to the monitoring circuits, large resistors were selected.
The conversion between the applied bias voltage and the monitored voltage as measured with the ADC of the Cluster SCAs is given by
\begin{equation}
V_\text{bias} = \frac{R_{11}+R_{12}+R_2}{R_2} V_\text{mon}\,.
\label{eq:hv-monitoring}
\end{equation}
Despite the low-tolerance resistors ($\pm \qty{0.1}{\percent}$) in use, it was decided to perform a calibration of the monitoring circuits when testing the ROBs on the FE-Tester setups.
As discussed in \cref{sec:sipm}, the SiPM gain depends critically on the applied voltage, which is why precise monitoring is required in this case.
To perform the calibration, bias voltages in the range from \qtyrange{5}{70}{\volt} are applied in steps of \qty{1}{\volt} using a high-precision power supply.
By recording the monitored voltage for each bias voltage and performing a linear fit, the conversion factor is determined for each individual HV line.
In addition to the slope, which reflects the conversion factor in \cref{eq:hv-monitoring}, the intercept is also determined by the fit to account for small ($\sim$\unit{mV}) variations due to grounding fluctuations~\autocite{fe-tester}.
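Written out, the fitted model takes the form given below. The symbols $p_1$ and $p_0$ for the slope and intercept are used here for illustration, and the nominal values merely restate \cref{eq:hv-monitoring} under the assumption of ideal components without any offset.
\[
V_\text{bias} = p_1\, V_\text{mon} + p_0
\quad \text{with} \quad
p_1^\text{nom} = \frac{R_{11}+R_{12}+R_2}{R_2}\,, \quad p_0^\text{nom} = 0\,.
\]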
During the front-end electronics commissioning on the \cframes{}, an initial check of the HV monitoring is carried out by applying \qty{50}{\volt} to the SiPMs using the CAEN power supply modules.
\Vref{fig:hv} shows the monitored bias voltages as calculated using the conversion factors determined during the HV calibration on the FE-Testers.
As can be seen, the distribution is not centred around the nominal voltage and shows a relatively large spread.
Using a multimeter, sample checks were performed and confirmed that the applied voltage is indeed \qty{50}{\volt} and only deviates from it by up to \qty{50}{\milli\volt} in accordance with the specification of the CAEN modules~\autocite{caen-module}.
Therefore, it was concluded that the HV calibration parameters cannot be applied directly from the FE-Tester to the \cframe{} commissioning setup and a re-calibration had to be performed.
Although the exact reasons for the deviations could not be clarified, differences in the grounding are assumed to play an important role.
At this stage, the monitored bias voltages are considered acceptable if they differ by less than \qty{1}{\volt} from the applied voltage.
During the commissioning of the first 9 \cframes{}, it occurred only once that the measured value was outside of this range:
The monitoring circuit of one SiPM die on \cframe{} 7 reported a voltage of \qty{0}{\volt}.
It was found that the SiPM die was indeed not biased, which was due to a broken HV line within the associated ROB.
The issue was resolved by replacing the ROB.
\section{High Voltage Monitoring Calibration}
\label{sec:hv-calibration}
As mentioned in the previous section, it was found that the calibration of the HV monitoring circuits had to be repeated during the commissioning in order to achieve acceptable results.
For this purpose, using the CAEN power supply, bias voltages between \qtyrange[range-phrase={ and }]{20}{56}{\volt} are scanned in a non-equidistant manner with an emphasis on values around the expected operational voltage ($> \qty{50}{\volt}$).
Afterwards, a linear fit is performed in order to calibrate the conversion factor between the monitored and bias voltage as given in \cref{eq:hv-monitoring}.
An example is shown in \vref{fig:hv-fit}.
\begin{figure}
\centering
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/HV_Fit}
\caption{
Linear fit to calibrate the bias voltage monitoring with the determined slope ($p_1$) and intercept ($p_0$) overlaid.
}
\label{fig:hv-fit}
\end{minipage}
\hfill
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/HV_Calibrated}
\caption{
Monitored bias voltages applied to the SiPMs after the HV calibration of \cframes{} 1 to 9.
}
\label{fig:hv-calibrated}
\end{minipage}
\end{figure}
This procedure is applied to all monitoring circuits during the front-end electronics commissioning.
Subsequently, the check from the previous commissioning step is repeated:
A bias voltage of \qty{50}{\volt} is applied to all SiPMs and compared against the monitored voltages.
\Vref{fig:hv-calibrated} shows the monitored values after the HV calibration of the first 9 commissioned \cframes{}.
In contrast to \cref{fig:hv}, the obtained distribution is now centred around the applied voltage, and its standard deviation of $\sigma = \qty{19}{\milli\volt}$ is about 5 times smaller than before the calibration.
The maximum deviation of the total of \num{6272} HV monitoring circuits after the calibration of \cframes{} 1 to 9 was found to be \qty{\pm 100}{\milli\volt}.
\section{Bit Error Rate Tests}
\label{sec:ber}
Even though the Bit Error Rate (BER) tests of the optical data transmission are not scheduled as the subsequent commissioning step, they form the basis for the clock timing scans as presented in \cref{sec:timing_scans}.
Therefore, the BER tests are discussed here first.
The BER is defined as the number of bit errors divided by the total number of transmitted bits for a data link.
Following the operating specifications of the Versatile Link, the \scifitracker{} aims at achieving error rates below $10^{-12}$~\autocite{versatile-link-application-note}.
This includes the electrical data transfer within the front-end electronics, as well as via the optical links.
The basis for the latter is a sufficient optical power as verified in a previous commissioning step (see \cref{sec:optical-power}).
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Clocking_Scheme}
\caption{
Clocking scheme of the SciFi front-end electronics.
The error-free data transmission relies on the clock domain crossings labelled as (A)--(E).
BER test patterns can be generated at three different points along the data path as indicated by the thick arrows.
}
\label{fig:clocking-scheme}
\end{figure}
An error-free transmission of the data within the ROBs requires the precise coordination of various functional blocks.
The careful tuning of the underlying clocks to allow for this is described in \cref{sec:timing_scans}.
\Vref{fig:clocking-scheme} illustrates the interplay between the different components:
The SiPM signals digitised by the PACIFIC ASICs are transferred at \qty{320}{\mega\bit/\second} to the Cluster FPGA.
In a first stage inside the FPGA, the serial data is deserialised in order to perform the clustering in the subsequent step.
Afterwards, the clustered data is again serialised in order to be transferred to the Data GBTX, which prepares the GBT frames to be sent via the optical link to the back-end.
In addition to the normal operating mode as just described, \vref{fig:clocking-scheme} shows two types of test patterns that can be generated at three different points along the path of the data.
These patterns are used to determine the BER of each data link and are discussed in detail in the following.
\subsubsection{GBTX PRBS}
The GBTX PRBS is one of two BER test patterns used in the SciFi front-end electronics.
As the name suggests, it consists of a pseudorandom binary sequence (PRBS) that is generated by the GBTX ASIC.
It is implemented by a 7-bit linear feedback shift register that produces a PRBS7 sequence~\autocite{gbtx-manual}.
The register value is copied 16 times to fill the 112 bits of payload of the GBT frame.
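For orientation, the two numbers involved can be made explicit. The sketch below assumes a maximal-length shift register, as implied by the PRBS7 designation; the feedback polynomial itself is not specified here and is left open. The symbol $N_\text{states}$ is introduced for illustration.
\[
N_\text{states} = 2^7 - 1 = 127
\quad \text{and} \quad
16 \times \qty{7}{\bit} = \qty{112}{\bit}\,.
\]
Under this assumption, the register cycles through all 127 non-zero states before the sequence repeats.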
Since the pseudorandom sequence is deterministic, an appropriate decoder at the back-end can predict it and compare it against the received data.
It counts the number of mismatches, i.e. the bit errors in the transmission, and stores them in a register to be read via the control software.
A separate 8-bit counter exists for each 7-bit pattern.
In this way, the BER can be determined.
In addition to being generated by the GBTX ASIC, the identical PRBS generator has been implemented in the Cluster FPGA firmware.
The advantage is that a larger part of the data chain can be tested, since the transmission from the Cluster FPGA to the Data GBTX is also included in this case.
This part of the data chain is particularly relevant as it involves a plug connection between the Cluster and Master Board, which may be susceptible to a bad electrical contact.
Since the same GBTX PRBS sequence is used, no additional resources are required at the back-end.
Starting from the commissioning of \cframe{} 4, a BER test utilising the GBTX PRBS generated by the Cluster FPGAs is an integral part of the commissioning procedure.
The importance of these checks was duly noted in the course of the operation and commissioning.
Two overnight runs with a duration of at least \qty{9}{\hour} per run are performed.
Due to DAQ firmware limitations (see \cref{sec:commissioning-daq}), two runs need to be performed to cover all data links of each \cframe{}.
Prior to \cframe{} 4, these measurements were only done occasionally and with shorter durations.
In total, \num{6.1e17} bits were probed during the tests up to the commissioning of \cframe{} 9.
In the process, \num{9} errors were found resulting in a BER of \num{1.5e-17}, which is well below the target rate of $10^{-12}$.
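Written out with the rounded numbers quoted above, the BER follows directly from its definition. For comparison, the second line gives a rough estimate of the number of bits accumulated by a single data link in one overnight run, assuming that all 112 payload bits of every GBT frame at \qty{40}{\mega\hertz} are checked; the symbols $N_\text{err}$, $N_\text{bits}$ and $N_\text{link}$ are introduced here for illustration.
\[
\begin{aligned}
\text{BER} &= \frac{N_\text{err}}{N_\text{bits}} = \frac{9}{\num{6.1e17}} \approx \num{1.5e-17}\,, \\
N_\text{link} &\approx \qty{112}{\bit} \times \qty{40}{\mega\hertz} \times \qty{9}{\hour} \approx \num{1.5e14}\,\text{bits}\,.
\end{aligned}
\]
Under this assumption, a single error-free overnight run already probes an individual link to a level well below $10^{-12}$.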
\subsubsection{PACIFIC SyncPattern}
\label{sec:syncpattern}
The PACIFIC SyncPattern is the second type of test pattern implemented in the SciFi front-end electronics.
As shown in \cref{fig:clocking-scheme}, it is generated by the PACIFIC ASICs and from there follows the complete data chain.
The exact pattern is determined by a group of four pads that are driven by the Cluster FPGA.
Thus, one of a total of 16 different patterns can be selected.
In order to produce the 8-bit word per PACIFIC output line and bunch crossing, the 4-bit input is concatenated with itself in reverse order as illustrated in \vref{fig:syncpattern}.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/PACIFIC_SyncPattern}
\caption{
PACIFIC SyncPattern generation.
The 4 input bits are concatenated with themselves in an inverted manner to prevent phase shifts of \qty{12.5}{\nano\second} from remaining undetected.
}
\label{fig:syncpattern}
\end{figure}
A custom TELL40 firmware block has been developed that analyses the received patterns at \qty{40}{\mega\hertz} and counts the number of detected bit errors.
For each PACIFIC channel, i.e. two output bits, there is a separate 8-bit counter that can be read out by the control system.
However, this process requires that the PACIFIC output is passed through by the Cluster FPGA in its raw format.
Due to bandwidth limitations, the raw data format used for this purpose has four modes, each containing the 2-bit outputs of \num{32} PACIFIC channels.
Therefore, to probe the data transmission of all \num{128} channels associated with one Cluster FPGA, four separate BER tests utilising the PACIFIC SyncPattern need to be performed.
Starting from the commissioning of \cframe{} 5, the procedure has been standardised to record data for \qty{3}{\hour} for each group of \num{32} channels.
In addition, the tests are performed with two complementary PACIFIC SyncPatterns
\[
\begin{aligned}
0\text{x}5 &\rightarrow 0\text{b}1010\,0101 \quad\text{and} \\
0\text{xA} &\rightarrow 0\text{b}0101\,1010.
\end{aligned}
\]
Thereby, a total of \qty{24}{\hour} are required to cover all four raw data modes and both patterns.
Due to limitations of the DAQ firmware used for the commissioning in the assembly hall (see \cref{sec:commissioning-daq}), the process needs to be performed separately for the upper and lower half of each \cframe{}.
Up to the commissioning of \cframe{} 9, \num{7.7e17} bits were probed in this way.
With about \num{2500}, the number of observed errors is significantly larger than in the BER tests utilising the GBTX PRBS generated by the Cluster FPGAs.
However, the nature of the errors, which mostly occurred in all \num{32} observed channels per link, suggests that they are related to the generation of the PACIFIC SyncPattern itself, rather than the actual data chain.
The exact cause, however, could not yet be clarified.
Despite this effect, the resulting BER of \num{3.3e-15} is still well below the target rate of $10^{-12}$.
%==== PACIFIC SyncPattern ====
%Bits total: 7.726749e+17
%Errors total: 2539
%=> Min BER: 3.285987232389575e-15
%=> Max BER: 6.57197446477915e-15
\subsubsection{Loss of Frame-Lock in Control Links}
During the BER tests, which typically run for several hours at once, the optical links are also monitored for losses of the acquired frame-locks.
As described in \cref{sec:master-gbtx-communication}, this happens when the GBT frame header is not recognised correctly a few times.
Since all communication to the ROBs takes place via the optical control links, it is one of the most critical failures that can occur.
During the SciFi front-end electronics commissioning, temporary losses of frame-locks have been observed occasionally with some ROBs after a few hours of operation.
It later turned out that there is a general issue with the VTRx modules in use, which may affect some units due to manufacturing problems related to insufficient epoxy curing~\autocite{vtrx-issue}.
Of the total of \num{392} commissioned \HalfROBs{} mounted on the \cframes{} 1 to 9, this type of failure was observed 6 times, corresponding to a rate of affected control links of \qty{1.5}{\percent}.
The affected ROBs were replaced with spares for further evaluation and repair.
\section{Clock Timing Scans}
\label{sec:timing_scans}
In order to achieve low transmission error rates as presented in \cref{sec:ber}, a careful tuning of the relative phases of various clocks is necessary.
Within each data link, a total of six clocks need to be adjusted that affect the internal data transmission within the front-end electronics.
The contributing clocks are shown in \cref{fig:clocking-scheme} and range from the \qty{320}{\mega\hertz} (\qty{3.125}{\ns} clock period) PACIFIC clocks to the clustering blocks of the FPGAs that are operating at \qty{40}{\mega\hertz} (\qty{25}{\ns}).
Note that the \qty{40}{\mega\hertz} clock that the Data GBTX receives from the Master GBTX does not have an impact on the internal data transmission, since it serves as the common reference for the generation of the remaining clocks.
However, it will become relevant during beam data taking, as it allows the data taking and processing to be time-aligned to the LHC collisions.
\begin{figure}
\centering
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Eye_Diagram}
\caption{
Characteristics of an eye diagram.
Image adapted from Ref.~\autocite{eye-diagram}.
}
\label{fig:eye-diagram}
\end{minipage}
\hfill
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/Serialiser_SingleCh}
\caption{
Determined bit errors of one PRBS7 word depending on the phase between the Cluster FPGA Serialiser and the Data GBTX.
}
\label{fig:serialiser-singlech}
\end{minipage}
\end{figure}
The aim of the clock phase tuning is to achieve the highest possible stability in the data transfer between the different processing blocks.
This is accomplished when the incoming bit stream from one block to the next is sampled in the centre of the bit period.
\Vref{fig:eye-diagram} illustrates this by means of an eye diagram, which is the superimposed representation of all possible digital waveforms.
As the sampling time approaches the edges of the eye diagram, bit errors start to occur due to jitter and incorrect sampling of the data.
In practice, the tuning of the involved clocks consists of varying their phases one by one in both directions until bit errors occur.
In this way, it is possible to determine the bit period and thus the optimal sampling time, which is located in the centre.
The detection of bit errors is enabled by using the BER test patterns introduced in \cref{sec:ber}.
One difficulty is that an isolated tuning of the individual phases is generally not possible.
When using the GBTX PRBS as generated by the Cluster FPGA, both the \qty{40}{\mega\hertz} clustering and \qty{80}{\mega\hertz} serialiser clock need to be correctly adjusted in order to allow for a stable data transmission.
On the other hand, when injecting the PACIFIC SyncPattern, all six clock phases must be set properly.
Since scanning all possible combinations in the six-dimensional space is not feasible, the practical approach is to start from a (semi-)stable configuration and vary the phases based on this.
Thereby, starting from the processing blocks most downstream in the data chain, i.e. going from right (A) to left (E) in \cref{fig:clocking-scheme}, the bit periods and thus the optimal settings can be determined one after another.
At the start of the commissioning, the procedure had not yet been fully implemented in the control software.
This is because it requires a complex interaction between the front- and back-end electronics in a way that was not foreseen by the centrally provided DAQ system.
Therefore, the results presented in the following do not include the first 2 commissioned \cframes{} 1 and 3.
However, the remaining 7 \cframes{} covered here contain \num{152} ROBs, corresponding to \qty{60}{\percent} of the complete detector.
\subsection{FPGA Serialiser $\rightarrow$ Data GBTX (A)}
The first clock domain crossing to be optimised is between the Cluster FPGA Serialiser and the Data GBTX, labelled as (A) in \cref{fig:clocking-scheme}.
This is achieved by shifting the phase of the Serialiser clock in steps of about \qty{100}{\pico\second}, while the (internal) clock of the GBTX remains unchanged.
As shown in \cref{fig:clocking-scheme}, the GBTX PRBS pattern generated by the Cluster FPGA can be used for this purpose.
Utilising this pattern, the number of bit errors can be determined in units of 7 bits, which are referred to as PRBS7 words in the following.
Thereby, with 112 bits of payload in the GBT wide frame mode, 16 separate counter values are available for every data link.
\begin{figure}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/Serialiser_ModuleIntervals}
\caption{
For an exemplary ROB, the error-free intervals in the phase between the Cluster FPGA Serialiser and the Data GBTX are shown.
One ROB comprises 16 individual data links (i.e. 16 different clock domain crossings) with 16 PRBS7 words each, resulting in the total of \num{256} words and corresponding error counters.
}
\label{fig:serialiser-module}
\end{figure}
An exemplary result of an FPGA Serialiser clock timing scan is given in \vref{fig:serialiser-singlech}.
It shows the number of detected bit errors depending on the clock phase for one PRBS7 word within one data link.
For every phase configuration, \num{500000} GBT frames at \qty{40}{\mega\hertz} are recorded.
Within a central interval, no errors are found.
At the edges, the number of errors increases and quickly reaches the limit of \num{255}, which is due to the underlying 8-bit error counters.
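The sensitivity of a single scan point can be estimated roughly as follows, assuming that every recorded frame contributes exactly one 7-bit word to each error counter. The symbols $N_\text{bits}$, $t_\text{point}$ and $\text{BER}_\text{min}$ are introduced here for illustration.
\[
\begin{aligned}
N_\text{bits} &= \num{5e5} \times 7 = \num{3.5e6}\,, \qquad
t_\text{point} = \frac{\num{5e5}}{\qty{40}{\mega\hertz}} = \qty{12.5}{\milli\second}\,, \\
\text{BER}_\text{min} &\approx \frac{1}{N_\text{bits}} \approx \num{2.9e-7}\,.
\end{aligned}
\]
Under this assumption, an error-free scan point only constrains the error rate per word to roughly this level, so the much lower rates have to be established with the long BER runs presented in \cref{sec:ber}.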
An important thing to note is that during the scan of the phase between the Serialiser and Data GBTX, the phase between the Clustering and the Serialiser clocks is kept constant.
This is achieved by shifting the Clustering clock in parallel to the Serialiser and is necessary because the GBTX PRBS pattern is generated at the level of the Clustering block.
Therefore, varying the timing of the Serialiser alone does not provide useful information since it affects both the upstream and downstream clock domain crossings in an interfering manner.
The error-free intervals for an exemplary ROB including 16 Cluster FPGA-Data GBTX pairs and thus 16 corresponding clock domain crossings are shown in \vref{fig:serialiser-module}.
Local variations due to differences in the routing lengths and signal qualities can be identified.
%In addition, a macroscopic shift in units of \num{32} PRBS7 words between \num{128} and \num{160} can be observed that corresponds to a physical \CB{}.
Furthermore, the overlap of the intervals is indicated, which represents the common error-free interval of all \num{152} ROBs for which this measurement has been performed.
In practice, this means that all data links can be operated stably with the identical clock phase configuration between the Cluster FPGA Serialiser and the Data GBTX.
By setting the phase in the centre of the \qty{3.2}{\ns} wide common interval, the margin of error towards both edges amounts to at least \qty{1.6}{\ns} for all links.
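Formally, the common setting can be expressed as the centre of the intersection of all individual intervals. In the sketch below, the error-free interval of word $i$ is denoted by $[\varphi_i^\text{min}, \varphi_i^\text{max}]$; this notation, as well as $\varphi_\text{lo}$, $\varphi_\text{hi}$ and $\varphi_\text{set}$, is introduced here for illustration only.
\[
\begin{aligned}
\varphi_\text{lo} &= \max_i \varphi_i^\text{min}\,, \qquad
\varphi_\text{hi} = \min_i \varphi_i^\text{max}\,, \qquad
\varphi_\text{set} = \frac{\varphi_\text{lo}+\varphi_\text{hi}}{2}\,, \\
\text{margin} &= \frac{\varphi_\text{hi}-\varphi_\text{lo}}{2} = \frac{\qty{3.2}{\ns}}{2} = \qty{1.6}{\ns}\,.
\end{aligned}
\]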
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Serialiser_CentersWidths}
\caption{
Error-free intervals in the phase between the Cluster FPGA Serialiser and Data GBTX clock.
The distributions of the interval centres (left) and widths (right) include data from \num{152} ROBs as obtained during the front-end electronics commissioning.
The theoretical limit of \qty{6.25}{\ns} on the width results from the transmission speed of \qty{160}{\mega\bit/\second} per lane.
}
\label{fig:serialiser-histograms}
\end{figure}
\Vref{fig:serialiser-histograms} shows summary distributions of the interval centres and widths.
Note that the absolute values of the interval centres have no further significance; only the relative differences are relevant.
As can be seen, the centres lie well within the interval overlap of all \num{152} included ROBs.
On the other hand, the interval widths are a measure of the signal quality.
The distribution shows a narrow peak around the mean value of \qty{5.24}{\ns}, which is about \qty{16}{\percent} smaller than the theoretical limit of \qty{6.25}{\ns}.
In addition, a shoulder-like bump towards lower values can be identified that indicates electrical lines with slightly lower signal quality.
The theoretical limit is calculated based on the transmission speed of \qty{160}{\mega\bit/\second} per lane between the Cluster FPGA and Data GBTX.
Thereby, under ideal conditions, the signal levels can be sampled accurately over the complete bit period of \qty{6.25}{\ns}.
However, as illustrated in \cref{fig:eye-diagram}, due to finite fall and rise times as well as jitter, the measured error-free interval widths will always be below this value.
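The numbers quoted above can be reproduced with a short calculation, in which $T_\text{bit}$ denotes the bit period and $\bar{w}$ the mean interval width (both symbols introduced here for illustration):
\[
T_\text{bit} = \frac{1}{\qty{160}{\mega\bit/\second}} = \qty{6.25}{\ns}\,, \qquad
\frac{\bar{w}}{T_\text{bit}} = \frac{\qty{5.24}{\ns}}{\qty{6.25}{\ns}} \approx 0.84\,.
\]
On average, about \qty{1.0}{\ns} of each bit period is thus not usable for sampling.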
\subsection{FPGA Clustering $\rightarrow$ Serialiser (B)}
After tuning the phase between the Cluster FPGA Serialiser and the Data GBTX in the previous step by setting it to be in the centre of the interval overlap, a scan of the Clustering clock is performed.
Thereby, the stable phase intervals between the FPGA Serialiser and Clustering blocks can be determined, corresponding to the clock domain crossing (B) in \cref{fig:clocking-scheme}.
Again, the GBTX PRBS test pattern as generated by the Clustering block is used.
The bit-error-free intervals for an exemplary ROB are shown in \vref{fig:clustering-module}.
A very regular pattern can be recognised that repeats itself every \num{16} PRBS7 words, i.e. in units of individual data links and Cluster FPGAs.
This is due to the fact that the clock domain crossing examined in this step happens entirely within the FPGA itself.
Therefore, the observable pattern completely depends on the firmware and the internal signal generations and routings in the FPGA.
With a width of \qty{9.0}{\ns}, the interval overlap of all \num{152} examined ROBs provides plenty of margin when the common phase is set to its centre.
\begin{figure}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/Clustering_ModuleIntervals}
\caption{
Error-free intervals in the phase between the Cluster FPGA Clustering and Serialiser blocks for an exemplary ROB.
The data within each individual Cluster FPGA and thus clock domain crossing is composed of 16 PRBS7 words.
}
\label{fig:clustering-module}
\end{figure}
Summary histograms of the interval centres and widths are shown in \vref{fig:clustering-histograms}.
Due to the regularity of the intervals, the distributions differ greatly for the different PRBS7 words, i.e. different parts in the resulting GBT data frame.
This applies in particular to the interval centres.
Nevertheless, they lie well within the interval overlap as indicated in \vref{fig:clustering-module}.
Based on the internal data transmission inside the FPGA, the theoretical limit on the interval width amounts to \qty{12.5}{\ns}.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Clustering_CentersWidths}
\caption{
Error-free intervals in the phase between the Cluster FPGA Clustering and Serialiser blocks.
The distributions of the interval centres (left) and widths (right) are shown.
}
\label{fig:clustering-histograms}
\end{figure}
\subsection{FPGA Deserialiser $\rightarrow$ Clustering (C)}
A similar situation exists for the clock domain crossing (C) between the Deserialiser and Clustering blocks that happens entirely within the Cluster FPGA.
But unlike the previous step, the PACIFIC SyncPattern has to be used, since the GBTX PRBS pattern is generated downstream of this part of the data chain.
As discussed in \cref{sec:syncpattern}, the corresponding error detection logic in the TELL40s provides separate 8-bit error counters for the 2-bit output data of each PACIFIC channel.
Thereby, the total number of error counters is a factor of 8 larger compared to the GBTX PRBS test pattern.
In order to record the data from all \num{128} channels per link, 4 separate runs need to be performed.
To scan the desired phase, the FPGA Deserialiser clock is varied, while the Clustering clock is kept at the common value as determined in the previous step.
The clocks of the SYNC Pulse and the PACIFICs need to be shifted synchronously to avoid interference with the phases further upstream in the data chain.
\Vref{fig:deser-module} shows the error-free phase intervals for two neighbouring exemplary \CBs{}.
Similar to the previous scan (B), a regular pattern can be recognised that repeats every \num{128} channels, i.e. in units of individual Cluster FPGAs.
Projecting the interval centres onto the y-axis for all \num{152} examined ROBs results in the distribution as shown in \vref{fig:deser-histograms}.
In addition, the interval widths including the theoretical limit are displayed.
Due to the significantly increased number of error counters (i.e. channels) per data link, a breakdown into different groups of channels as shown in \vref{fig:clustering-histograms} has been omitted in this case.
\begin{figure}
\captionsetup[subfigure]{aboveskip=-3pt, belowskip=5pt} %Tune space between captions and subfigures
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/Deser_ModuleIntervals}
\caption{Cluster FPGA Deserialiser $\rightarrow$ Clustering}
\label{fig:deser-module}
\end{subfigure}\\[1ex]
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/Pacific_ModuleIntervals}
\caption{PACIFIC $\rightarrow$ Cluster FPGA Deserialiser}
\label{fig:pacific-module}
\end{subfigure}\\[1ex]
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=0.85\textwidth]{figures/commissioning/Sync_ModuleIntervals}
\caption{SYNC Pulse $\rightarrow$ PACIFIC}
\label{fig:sync-module}
\end{subfigure}
\caption{
Error-free intervals in the different phases along the data chain for two exemplary neighbouring Cluster-PACIFIC Board pairs utilising the PACIFIC SyncPattern.
The individual clock domain crossings are indicated by the vertical dashed lines.
Within each data link, one Cluster FPGA processes the data from two PACIFIC ASICs with \num{64} channels each.
}
\label{fig:syncpattern-intervals}
\end{figure}
\begin{figure}
\captionsetup[subfigure]{aboveskip=1pt, belowskip=12pt} %Tune space between captions and subfigures
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Deser_CentersWidths}
\caption{Cluster FPGA Deserialiser $\rightarrow$ Clustering}
\label{fig:deser-histograms}
\end{subfigure}\\
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Pacific_CentersWidths}
\caption{PACIFIC $\rightarrow$ Cluster FPGA Deserialiser}
\label{fig:pacific-histograms}
\end{subfigure}\\
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Sync_CentersWidths}
\caption{SYNC Pulse $\rightarrow$ PACIFIC}
\label{fig:sync-histograms}
\end{subfigure}
\caption{
Error-free intervals in different phases along the data chain utilising the PACIFIC SyncPattern.
The distributions of the interval centres (left) and widths (right) are shown.
}
\label{fig:syncpattern-histograms}
\end{figure}
\subsection{PACIFIC $\rightarrow$ FPGA Deserialiser (D)}
The last part of the data chain is between the PACIFIC ASICs and the Cluster FPGA Deserialiser.
For this, the data is serialised in the PACIFICs and transmitted to the FPGAs on the \CBs{} at rates of \qty{320}{\mega\bit/\second} per lane.
To scan the phase between the sender and receiver, the clocks of both PACIFICs within each data link are varied, while keeping the Deserialiser clock phase at the previously determined common setting.
The SYNC Pulse clock is shifted synchronously with the PACIFIC clocks in order to maintain the same phase between them.
Because the signals need to be transferred across multiple PCBs with length variations in the signal lanes, the resulting error-free intervals in the phase between the sender and receiver show relatively large variations as displayed in \vref{fig:pacific-module}.
This is true on the microscopic level, as well as in blocks of \num{64} channels corresponding to different PACIFIC ASICs.
Due to the high transmission rates, the overlap of all error-free intervals of the examined \num{152} ROBs only amounts to \qty{1.2}{\nano\second}.
However, as presented in \cref{sec:ber}, sufficiently low bit error rates can still be achieved by setting the phase of all PACIFIC clocks to a common setting in the centre of the overlap.
A summary of the interval centres and widths is given in the form of histograms in \vref{fig:pacific-histograms}.
As can be seen, the measured widths are only slightly below the theoretical limit of \qty{3.125}{\ns} indicating a good signal quality.
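For orientation, and under the interpretation assumed here that the theoretical limit is the duration of a single bit on a \qty{320}{\mega\bit/\second} lane, the quoted value follows from
\begin{equation*}
T_\text{bit} = \frac{1}{\qty{320}{\mega\hertz}} = \qty{3.125}{\ns},
\end{equation*}
where the symbol $T_\text{bit}$ is introduced for illustration only.
The mean measured width of \qty{2.96}{\ns} (see \cref{tab:timing-scans}) thus corresponds to roughly \qty{95}{\percent} of the bit period.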
\subsection{SYNC Pulse $\rightarrow$ PACIFIC (E)}
Even though the SYNC Pulse is not part of the data chain itself, it is still crucial for the transmission of the data since it is responsible for the synchronisation of the PACIFIC ASICs.
As described in \cref{sec:pacific_ser}, the signal is evaluated at the rising edge of the \qty{320}{\mega\hertz} PACIFIC clock.
Therefore, it is expected that the SYNC Pulse clock can be varied by about \qty{3.125}{\ns} before the signal is evaluated at the previous or next rising edge causing the output data to be shifted by one PACIFIC clock cycle.
This can be confirmed by looking at the distribution of the error-free interval widths as shown in \vref{fig:sync-histograms}.
Since the SYNC Pulse is generated by the Cluster FPGA, the interval centres show a similar spread as the timing scan between the PACIFIC and the FPGA Deserialiser.
However, as shown in the exemplary view of the intervals for two neighbouring Cluster-PACIFIC Board pairs in \vref{fig:sync-module}, the variations occur predominantly in blocks of \num{64} channels, i.e. in units of different PACIFIC ASICs.
This is due to the fact that each PACIFIC receives and processes a single SYNC Pulse.
At \qty{0.9}{\ns}, the interval overlap of all \num{152} examined ROBs is in the same range as for the phase between the PACIFIC and FPGA Deserialiser clocks.
\subsection{Conclusion}
The parameters obtained from the clock timing scans presented above are summarised in \cref{tab:timing-scans}.
With every clock phase set to the centre of the respective overlap region of the error-free intervals, all ROBs can be operated with an identical clock phase configuration while still achieving low bit error rates (see \cref{sec:ber}).
The smallest margin of error is present in the phase between the SYNC Pulse and PACIFIC clocks and amounts to \qty{0.44}{\ns} in each direction (\qty{0.88}{\ns} in total).
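Here, the margin is taken as half the width of the interval overlap, assuming that the clock phase is set exactly to the centre of the overlap.
With $W_\text{overlap}$ denoting the overlap width (a symbol introduced for illustration), this reads
\begin{equation*}
\delta = \frac{W_\text{overlap}}{2} = \frac{\qty{0.88}{\ns}}{2} = \qty{0.44}{\ns}.
\end{equation*}
By the same estimate, the phase between the PACIFIC and Deserialiser clocks has the second smallest margin of about \qty{0.6}{\ns} in each direction.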
\begin{table}
\centering
\begin{threeparttable}
\caption{
Parameters of the error-free phase intervals as determined by the clock timing scans.
At the centre of the interval overlap (second column from the right), the SciFi front-end electronics can be operated stably with the same clock phase configuration for every ROB.
The numbers are given in \unit{\ns}.
}
\label{tab:timing-scans}
\begin{tabular}{@{}lrrrrrr@{}}
\toprule
\multirow{2}{*}{\bfseries Clock Phase} & \multicolumn{2}{c}{\bfseries Int. Centres} & \multicolumn{2}{c}{\bfseries Int. Widths} & \multicolumn{2}{c}{\bfseries Int. Overlap} \\
& Mean $\mu$ & Std. $\sigma$ & Mean $\mu$ & Std. $\sigma$ & Centre & Width \\ \midrule
Serialiser $\rightarrow$ Data GBTX & 25.32 & 0.26 & 5.24 & 0.25 & 25.44 & 3.22 \\
Clustering $\rightarrow$ Serialiser & 30.17 & 0.51 & 10.71 & 0.81 & 29.61 & 9.03 \\
Deserialiser $\rightarrow$ Clustering & 2.13 & 0.30 & 5.86 & 0.33 & 1.68 & 4.35 \\
PACIFIC $\rightarrow$ Deserialiser & 13.29 & 0.25 & 2.96 & 0.08 & 13.13 & 1.17 \\
SYNC Pulse $\rightarrow$ PACIFIC & 0.77 & 0.31 & 3.00 & 0.05 & 0.73 & 0.88 \\ \bottomrule
\end{tabular}
\begin{tablenotes}[flushleft]
\small
\item Std. $\mathrel{\hat{=}}$ Standard deviation \qquad Int. $\mathrel{\hat{=}}$ Interval
\end{tablenotes}
\end{threeparttable}
\end{table}
\section{Light Injection System Tests}
\label{sec:lis-tests}
In the last step of the SciFi front-end electronics commissioning procedure, the complete data chain is examined.
While the digital data transfer has already been tested and tuned during the BER tests (see \cref{sec:ber}) and clock timing scans (see \cref{sec:timing_scans}), this part also includes the analogue signal generation and processing in the SiPM and PACIFIC channels.
In the absence of ionising radiation in the assembly hall, these tests are performed with the help of the light injection system (LIS).
As described in detail in \cref{sec:lis}, the LIS allows light pulses to be injected into the scintillating fibres towards the end of the fibre mats near the SiPM arrays.
This makes it possible to test the full data chain and identify channels that do not register any signals from the incident photons.
Such failures can be caused either by broken SiPM or PACIFIC channels or by an insufficient electrical connection between the two.
The tests involving the LIS are subdivided into three steps that are discussed in the following.
\subsection{Functional Test and Mapping}
As a first step, the functioning of the LIS itself is verified.
This is done only with the help of the SiPM arrays that are biased with the nominal overvoltage $\OV = \qty{3.5}{\volt}$.
By monitoring the current drawn from the CAEN modules with and without injected light pulses, the correct functioning of the LIS can be confirmed.
Typical dark currents are of the order of \qty{100}{\micro\ampere} per HV channel, corresponding to four adjacent SiPM arrays.
The current increases to about \qty{200}{\micro\ampere} when illuminating the SiPMs with \qty{15}{\ns} wide LIS pulses at a rate of \qty{20}{\kilo\hertz}.
However, large variations in the light intensity can be observed, which depend on the operating settings of the GigaBit Laser Drivers (GBLDs) determined during the QA of the fully assembled fibre modules.
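As a rough cross-check of the typical values quoted above, and assuming that the additional current $\Delta I \approx \qty{100}{\micro\ampere}$ is caused entirely by the injected light, the average charge released per LIS pulse and HV channel can be estimated from the pulse rate $f = \qty{20}{\kilo\hertz}$:
\begin{equation*}
Q = \frac{\Delta I}{f} \approx \frac{\qty{100}{\micro\ampere}}{\qty{20}{\kilo\hertz}} = \qty{5}{\nano\coulomb}.
\end{equation*}
The symbols $\Delta I$, $f$ and $Q$ are introduced here for illustration only, and the result should be read as an order-of-magnitude estimate given the variations in light intensity.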
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/LIS_Routing}
\caption{
SciFi ROB cover with the routing of the LIS GBLDs to the fibre mats indicated.
Image modified from Ref.~\autocite{fe-box-mechanics}.
}
\label{fig:lis-routing}
\end{figure}
For further tuning of the light intensity, the mapping between the GBLDs and light bars must be known for each module.
As illustrated in \cref{fig:lis-routing}, the light bars mounted on fibre mats 0 and 2 are controlled by \HalfROB{} H0, while mats 1 and 3 are illuminated by the GBLDs connected to H1.
However, due to the routing of the LIS fibres within the end plugs of the modules, it is not clear at first which GBLD of each half connects to which of the two possible mats\footnote{Inadvertently, this was not specified during the production of the modules.}.
In order to create the corresponding mapping, the current draw is monitored as described previously while enabling the individual GBLDs one by one.
\subsection{Delay Scan (Lite)}
After verifying the functioning of the LIS itself and determining the mapping, the actual test consists of checking that every channel correctly detects the incident photons.
For this, it is necessary that the light pulses are triggered at the right time with respect to the PACIFIC clock such that the resulting SiPM signals are fully integrated by the PACIFIC integrators.
For an exemplary channel, \vref{fig:full-delayscan-example} shows the S-curves for different LIS pulse delays as obtained by scanning the PACIFIC comparator thresholds.
As can be seen, the characteristic step structure as described in \cref{sec:th_scan} is only visible within a few nanoseconds around the optimal delay, which amounts to \qty{29}{\ns} in this case.
On the other hand, when moving away from the optimum, the S-curves degrade until eventually only a single transition remains corresponding to the pedestal of the channel.
\begin{figure}
\captionsetup[subfigure]{aboveskip=1pt, belowskip=12pt} %Tune space between captions and subfigures
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=0.8\textwidth,trim={0 0 0 25pt},clip]{figures/commissioning/Delayscan_example}
\caption{Full delay scan}
\label{fig:full-delayscan-example}
\end{subfigure}\\
\begin{subfigure}{1.0\linewidth}
\centering
\includegraphics[width=0.8\textwidth,trim={0 0 0 25pt},clip]{figures/commissioning/Delayscan_Lite_example}
\caption{Delay scan lite}
\label{fig:delayscan-lite-example}
\end{subfigure}
\caption{
Threshold scans of one channel for different delays of the light injection pulse (a).
The vertical dashed lines indicate the thresholds of the three PACIFIC comparators that are used for the delay scan lite (b).
}
\label{fig:delayscan-examples}
\end{figure}
In principle, recording the \Scurves{} by scanning the PACIFIC comparator thresholds for different delays in order to find the best delay is only needed once.
However, the limitations of the available DAQ firmware in combination with the design of the LIS require a delay scan for essentially every \cframe{} commissioning.
On the one hand, the SOL40 control links have different latencies of the order of \qty{\pm 25}{\ns} relative to each other.
In addition, these are not fixed but may vary after every reset.
By itself, this is unproblematic as long as each control link corresponding to a \HalfROB{} remains an isolated system, which is the case for many applications in the context of the front-end electronics commissioning.
However, due to the design of the LIS, the two halves of one ROB are closely entangled.
This is because the two inner fibre mats of one module are illuminated by light bars that are controlled by the neighbouring \HalfROB{} as illustrated in \cref{fig:lis-routing}.
Therefore, while the outer mats are read out by the same \HalfROB{} that controls the light bars, the optimal LIS delays of the inner mats are shifted in opposite directions depending on the relative latency between the two halves.
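This behaviour can be illustrated with a simplified model.
Let $\ell_0$ and $\ell_1$ denote the control link latencies of \HalfROB{} H0 and H1, and $d_0$ the optimal delay of a mat that is illuminated and read out by the same half (all symbols are introduced here for illustration only).
Assuming that the latency of the illuminating half shifts the arrival time of the light pulse, while the latency of the reading half shifts the integration window by the same amount, the optimal delay setting for a mat read out by half $a$ and illuminated by half $b$ is
\begin{equation*}
d_\text{opt} = d_0 + \left(\ell_a - \ell_b\right).
\end{equation*}
For the outer mats, $a = b$ and hence $d_\text{opt} = d_0$, whereas for the two inner mats the roles of H0 and H1 are interchanged, resulting in shifts of $+(\ell_0 - \ell_1)$ and $-(\ell_0 - \ell_1)$, respectively.
The overall sign depends on the chosen convention for the delay setting.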
Performing a full delay scan as shown in \cref{fig:full-delayscan-example} is a time-consuming procedure that is not feasible for every \cframe{}.
With the limited firmware and the associated software, scanning the 256 threshold DAC values for a fixed delay already takes about \qty{20}{\minute}.
For this reason, a simplified version is performed that is referred to as delay scan lite.
It is based on varying the LIS delay while keeping the three comparator thresholds fixed.
Although, in principle, any threshold values above the pedestal and within a few photoelectron (pe) amplitudes are suitable, they are set to the default settings of $[1.5, 2.5, 3.5]\,\text{pe}$.
However, since the precise threshold calibration as described in \cref{sec:th_scan} is not yet possible without knowing the optimal delay, the values are estimated assuming a typical distance of \qty{15}{DACs} between photoelectron peaks.
The required position of the pedestal is determined by doing a single threshold scan without injecting light pulses.
A zoomed-in view of a pedestal transition is shown in \vref{fig:pedestals} (left).
As illustrated, the position of the pedestal is calculated by taking the point on the linearly interpolated curve at which the ratio of events above threshold equals 0.5.
\Vref{fig:pedestals} (right) shows the pedestal positions of the 400k channels that are obtained in that way during the commissioning of \cframes{}~1 to 9.
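Written out as a sketch, let $r_i$ denote the measured ratio of events above threshold at the threshold DAC value $t_i$ (symbols introduced here for illustration).
Assuming that the ratio passes through 0.5 between two neighbouring scan points $t_i$ and $t_{i+1}$ with $r_i \neq r_{i+1}$, the linear interpolation yields the pedestal position
\begin{equation*}
t_\text{ped} = t_i + \frac{0.5 - r_i}{r_{i+1} - r_i}\,\left(t_{i+1} - t_i\right).
\end{equation*}
Taking the pedestal as the zero-photoelectron reference and assuming a linear scale of \qty{15}{DACs} per photoelectron, the three comparator thresholds are then estimated as
\begin{equation*}
t_k = t_\text{ped} + n_k \cdot \qty{15}{DACs}, \qquad n_k \in \{1.5,\, 2.5,\, 3.5\},
\end{equation*}
i.e. at about \num{22.5}, \num{37.5} and \num{52.5} DACs above the pedestal, before any rounding to integer DAC values.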
\label{sec:delayscan-lite}
\begin{figure}
\captionsetup[subfigure]{aboveskip=1pt, belowskip=12pt} %Tune space between captions and subfigures
\begin{subfigure}{0.48\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Pedestal_example}
\end{subfigure}
\hfill
\begin{subfigure}{0.51\linewidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Pedestal_distribution}
\end{subfigure}
\caption{
Exemplary \Scurve{} of one channel without the injection of any signal showing the pedestal transition (left).
The position of the pedestal is given by the point on the linearly interpolated curve that matches the ratio of 0.5.
The distribution on the right shows the determined pedestal positions of the 400k channels within \cframes{} 1 to 9.
}
\label{fig:pedestals}
\end{figure}
The threshold values calculated based on an assumed photoelectron peak separation of \qty{15}{DACs} and the determined pedestal position as just described are indicated by the vertical dashed lines in the example in \cref{fig:full-delayscan-example}.
As can be seen, they are slightly off compared to the intended positions in the middle of the S-curve plateaus.
However, as noted earlier, the exact values are not relevant for this particular case.
\Cref{fig:delayscan-lite-example} shows the result of the subsequent delay scan lite for the exemplary channel.
The optimal delay is given by the maxima of the curves.
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth,trim={0 0 0 25pt},clip]{figures/commissioning/Delayscan_Lite_T3L2Q3M5}
\caption{
Distribution of the optimal LIS delay for an exemplary ROB based on the ratio of events above the lowest threshold corresponding to about \qty{1.5}{pe}.
}
\label{fig:delayscan-lite-module}
\end{figure}
In the same way, the best delays are determined for all channels of each \cframe{}.
\Vref{fig:delayscan-lite-module} shows an exemplary result for the \num{2048} channels of one ROB, split by the four read-out fibre mats.
As expected, the two outer mats 0 and 3 show very similar optimal delays, while the values of the inner mats 1 and 2 are shifted in opposite directions.
\subsection{Threshold Scans}
In principle, achieving ratios of events above threshold that are significantly larger than 0 for all channels at any delay already indicates that the channels are correctly detecting the incident photons.
However, as an additional test, full threshold scans are performed at selected LIS delays based on the results from the delay scan lite.
For the exemplary ROB shown in \cref{fig:delayscan-lite-module}, threshold scans at three different delays are required.
Typically, a handful of different settings need to be chosen to cover all channels during the commissioning of one \cframe{}.
For each channel, the \Scurve{} at the best timing is evaluated in order to identify dead channels.
The property that is used for this purpose is the integral under the \Scurve{}, starting from the position of the pedestal.
This is illustrated in \vref{fig:integral-illustration}.
As shown in \cref{fig:full-delayscan-example}, the area under the \Scurve{} increases as the LIS delay approaches the optimum.
This is because the charge accumulated by the PACIFIC integrator increases while the light intensity, as determined by the settings of the GBLD, is constant.
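One possible way to write this quantity, assuming a discrete sum over the scanned threshold DAC values $t$ with unit step size (the exact definition used in the analysis may differ), is
\begin{equation*}
I = \sum_{t \,\geq\, t_\text{ped}} r(t),
\end{equation*}
where $r(t)$ denotes the ratio of events above threshold and $t_\text{ped}$ the pedestal position, both symbols being introduced here for illustration.
Without any signal, $r(t)$ drops to zero shortly above $t_\text{ped}$ and $I$ remains small, whereas under illumination every additional photoelectron plateau of the \Scurve{} adds to the sum.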
\begin{figure}
\centering
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\textwidth]{figures/commissioning/Scurve_integral_illustration}
\caption{
Illustration of the \Scurve{} integral for one channel.
The \Scurves{} with illumination of the SiPM (LIS), as well as without any signal (Pedestal) are shown.
}
\label{fig:integral-illustration}
\end{minipage}
\hfill
\begin{minipage}[t]{0.49\textwidth}
\centering
\includegraphics[width=1.0\linewidth]{figures/commissioning/Scurve_jumps}
\caption{
Example of a broken PACIFIC channel showing a non-monotonic S-curve with several jumps.
}
\label{fig:scurve-jumps}
\end{minipage}
\end{figure}
An example of a type of failure that can be identified in this commissioning step is shown in \vref{fig:scurve-jumps}.
As can be seen, the depicted \Scurves{} do not follow the monotonic course that is expected from the nature of the \Scurves{} as introduced in \cref{sec:th_scan}.
Instead, they reveal several jumps along the threshold scan.
Since these occur in a similar way for both PACIFIC integrators, the error likely lies in the relation between the threshold DACs and the resulting comparator voltage level.
Due to the generation of the thresholds by the three 8-bit current DACs per channel (see \cref{sec:pacific_digitisation}), some jumps in the \Scurves{} are to be expected.
However, these mostly occur at multiples of \qty{64}{DACs} and predominantly at \qty{128}{DACs}, but not to the extent shown in \cref{fig:scurve-jumps}.
Therefore, these types of channels are classified as broken during the commissioning.
%Channels with jumps:
%C-Frame 4: 3 channels
%C-Frame 2: 2 channels
%C-Frame 6: 1 channel
%C-Frame 8: 1 channel
In terms of dead channels, i.e. channels that are unable to detect the incident photons, the area under the \Scurve{} as introduced previously is considered.
\Vref{fig:integral-module} shows the \Scurve{} integral over the course of the \num{2048} channels of one fibre module.
A clear separation between the threshold scans with injected light pulses and the pedestal-only runs can be identified, indicating that the channels are working properly.
The course of the LIS curve is mainly determined by the luminosity along the \qty{13}{\cm} light bars.
Since the fibres inside the light bars are scratched manually, each one has a unique light profile.
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth,trim={0 0 0 25pt},clip]{figures/commissioning/Scurve_integral_module}
\caption{
\Scurve{} integral along the \num{2048} channels of one exemplary fibre module.
The profile is shown without injecting any signal (pedestal-only), as well as under illumination with pulsed light (LIS).
}
\label{fig:integral-module}
\end{figure}
In summary, the distributions as shown in \vref{fig:integral-distribution} are obtained.
They show the \Scurve{} integrals for all \num{401408} channels that were examined during the commissioning of \cframes{} 1 to 9.
A total of \num{7} channels exhibit an LIS integral below 5, which has been defined as the lower limit in the course of the commissioning.
However, after closer examination, only \num{3} of these turned out to be dead channels, while the remaining \num{4} suffered from a low light intensity.
The cause of the malfunctions was found in the associated \num{3} SiPM channels, which were already marked as dead during the QA of the assembled SiPM arrays inside the cold boxes.
\begin{figure}
\centering
\includegraphics[width=0.9\textwidth]{figures/commissioning/Scurve_integral}
\caption{
Distributions of the \Scurve{} integrals for the 400k channels as examined during the commissioning of \cframes{} 1 to 9.
}
\label{fig:integral-distribution}
\end{figure}
In addition to the \num{3} dead channels, \num{7} channels were identified that show similar jumps in the \Scurves{} as depicted in \cref{fig:scurve-jumps}.
In relation to the total of \num{401408} examined channels, this corresponds to a fraction of \num{2.5e-5} malfunctioning channels.
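Explicitly, counting the dead channels and the channels with jumps together, the fraction is obtained as
\begin{equation*}
\frac{3 + 7}{\num{401408}} \approx \num{2.5e-5}.
\end{equation*}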
\subsubsection{Outlook}
Regarding the threshold scans that will need to be performed for the precise calibration of the threshold DACs during normal operation, it is expected that a preceding delay scan (lite) will not be required every time.
However, this requires that the control links are provided with fixed latencies by the SOL40s.
Nevertheless, it will be indispensable to determine the optimal LIS pulse delay once by means of the presented methods.
\section{Commissioning Summary}
In this chapter, the commissioning of the front-end electronics of a large fraction of the \scifitracker{} has been presented.
Due to the complexity of the system (see \cref{sec:complexity}), it has been a challenging task.
This concerns in particular the commissioning of the early \cframes{} that took significantly longer because many tools had to be developed alongside the commissioning procedure itself.
It was the first time that the electronics were operated on such a large scale and at the system level along with the surrounding infrastructure.
After overcoming the initial challenges, including the establishment of the \qty{40}{\mega\hertz} readout in conjunction with a near-final version of the upgraded LHCb DAQ system (see \cref{sec:commissioning-daq}), the commissioning procedure steadily became a smooth and routine operation.
Nevertheless, there were also a few critical failures that could only be resolved by replacing the affected ROBs.
However, since the commissioning takes place in a dedicated hall on the surface of the LHCb site, the replacement operations were readily feasible.
Overall, it could be verified that the SciFi front-end electronics generally works flawlessly and meets very high quality requirements.
This was made possible by means of a strict quality assurance (QA) before the installation of the various electronics components (see \cref{sec:qa}), as well as before mounting the individual ROBs on the \cframes{} (\cref{sec:fe-tester}).
Among other things, it could be demonstrated that, after careful tuning of the underlying clock phases, a stable data transmission well below the required bit error rate of $10^{-12}$ is achievable on a large scale consisting of a few thousand optical data links.
In addition, only a handful of channels were found to be malfunctioning out of a total of 400k detector channels tested.
%\section{Internal Charge Injection}
%\subsection{Uniformity across PACIFIC channels}
%\subsubsection{Maximum charge Qmax}
%\subsubsection{Time at maximum charge Tmax}
%\subsection{Stability of baseline, Qmax and Tmax}
%And reproducibility - across repeated measurements, power cycles, board temperatures etc.
%\subsection{Differences of integrators}
%Fundamental! i.e. trimming
%\subsection{Differences of comparators}
%Fundamental!
%\subsection{Evaluation of fast threshold scan}
%\section{Threshold Scans}
%\label{sec:th_scans}
%\subsection{Uniformity across channels}
%\subsubsection{Time at maximum integration Tmax}
%\subsection{Stability of baseline and Tmax}
%And reproducibility - across repeated measurements, power cycles, board temperatures etc.
%\subsection{Feasibility of SiPM breakdown voltage (Vbd) determination}
%Over-voltage scan
%\section{Sensor Readings}
%\subsection{High-Voltage}
%\subsubsection{Achievable accuracy after calibration}
%\subsubsection{Sensitivity on board temperature}
%\subsection{Temperature spread within different board types}
%\section{Additional notes}
%FE-Tester delay scan is stored in $/calib/sf/FETester/4TSLPCEFB00055/DelayScan/20200811_19h00m22s$