Selection code
Renata Kopecná edited this page 2022-03-30 10:05:45 +02:00

The selection code is a set of C++ scripts. This page first describes how to run the code and then explains each part of it. The main file of the selection code is B2Kstmumu.cpp. However, because the code required many reruns and checks at different places, running everything through the main file became obsolete; the scripts are now compiled and run in ROOT step by step. The file B2Kstmumu.pro can still be used by the QtCreator IDE.

The selection takes about two days to run in full. I recommend running the preselection step overnight, as it loops over about 800 GB of data and selects only the interesting events. The second bottleneck is the removal of multiple candidates, as it checks all subarrays of an array. Lastly, adding the variables is not very optimized and takes about an hour.


Setting up ROOT

There are several ways to set up ROOT. What worked for me is running the following commands:

delta and lhcba1:

source /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_97python3 x86_64-centos7-gcc9-opt

d0new and sigma0:

LD_LIBRARY_PATH=/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/4.9.3/x86_64-slc6/lib64:/cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_84/ROOT/6.06.02/x86_64-slc6-gcc49-opt/lib:/cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_84/GSL/1.10/x86_64-slc6-gcc49-opt/lib:/home/lhcb/kopecna/B2KstarMuMu/


PATH=/cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_84/gcc/4.9.3/x86_64-slc6/bin:/cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_84/ROOT/6.06.02/x86_64-slc6-gcc49-opt/bin:/cvmfs/lhcb.cern.ch/lib/lcg/releases/gcc/4.9.3/x86_64-slc6/bin:/usr/sue/bin:/usr/lib64/qt-3.3/bin:/cvmfs/lhcb.cern.ch/lib/bin/x86_64-slc6:/cvmfs/lhcb.cern.ch/lib/bin/Linux-x86_64:/cvmfs/lhcb.cern.ch/lib/bin:/cvmfs/lhcb.cern.ch/lib/var/lib/LbEnv/1020/stable/linux-64/bin:/bin:/usr/bin:/usr/sbin:/sbin::/home/lhcb/kopecna/B2KstarMuMu/

source /cvmfs/lhcb.cern.ch/lib/lcg/releases/LCG_84/ROOT/6.06.02/x86_64-slc6-gcc49-opt/bin/thisroot.sh 

In order to make pretty plots, dedicated code in Design.cpp is always checked for compilation when starting ROOT. This is done by creating a .rootalias.C file in your home directory and telling ROOT to always compile the Design.cpp:

{
gROOT->ProcessLine(".L path/to/your/folder/B2KstarMuMu/Code/Selection/Design.cpp+");
}

And we need to tell ROOT where to look: create a '.rootrc' file in your home directory with:

Rint.Load:  $(HOME)/.rootalias.C

Every time you start ROOT with root, the intro printout shows up. I find it useful to set an alias that automatically includes the -l flag, which suppresses this output. Open your ~/.bashrc file and add

alias root="root -l"

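If you need to verify the ROOT version in use, open ROOT and run, for example, gROOT->GetVersion() (from the shell, root-config --version gives the same information).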

Setting up python3 and pyROOT

The simplest way to set up python3 and pyROOT on delta and lhcba1 is to run the following command:

source  /cvmfs/sft.cern.ch/lcg/views/setupViews.sh LCG_97python3 x86_64-centos7-gcc9-opt

Please keep in mind that when much of this code was written, the default on the servers was still python2 (I know, don't). If a script fails for this reason, either set up python2 somehow or, better, fix the script to be python3-friendly.
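When a script fails under python3, the cause is usually one of a handful of python2 idioms. The snippet below is a generic illustration of the most common fixes. It is not taken from the repository, and the function name and the sample numbers are made up for the example.

```python
# Generic python2 -> python3 fixes; all names here are illustrative only.

def summarize(yields):
    """Return one formatted line per sample, sorted by sample name."""
    lines = []
    # python2: yields.iteritems()  ->  python3: yields.items()
    for name, (passed, total) in sorted(yields.items()):
        # python2: passed / total truncates for ints; python3 gives a float.
        # Use // explicitly wherever integer division is really intended.
        fraction = passed / total
        lines.append("%s: %.3f" % (name, fraction))
    return lines


if __name__ == "__main__":
    # python2: print "text"  ->  python3: print("text")
    for line in summarize({"2011": (1, 4), "2012": (3, 4)}):
        print(line)
```

Running the script under python3 is the quickest check: the remaining python2 leftovers normally show up as a SyntaxError or an AttributeError on the first run.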

General remarks

The code is a long series of tweaks; be patient when something is not ideal.

Input parameters

Generally, the input parameters are

  • string magnet
    Polarity of the sample, either "down" or "up"
  • string year
    Year of the sample, 2011, 2012, 2015, 2016, 2017 or 2018
  • int Run
    • 0: run the function for the given year only, not for a whole run
    • 1: 2011+2012
    • 2: 2015+2016+2017+2018
    • 12: Both runs
  • bool MC
    false = use data
    true = use MC
  • bool ReferenceChannel
    false = use the signal Q2 region
    true = use the Jpsi Q2 region
  • bool PHSP
    false = do not use PHSP MC
    true = use PHSP MC
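The Run convention above can be summarized in a few lines of python. This is only a sketch of the mapping as listed here; the function name years_for and the error handling are my assumptions and do not exist in the selection code.

```python
# Sketch of the Run -> years convention described above.
# The function name and the error handling are assumptions.

RUN_YEARS = {
    1: ["2011", "2012"],
    2: ["2015", "2016", "2017", "2018"],
}
RUN_YEARS[12] = RUN_YEARS[1] + RUN_YEARS[2]


def years_for(run, year=None):
    """Return the list of years a call with the given Run value covers."""
    if run == 0:
        # Run = 0: process only the explicitly given year.
        if year is None:
            raise ValueError("Run = 0 requires an explicit year")
        return [str(year)]
    if run not in RUN_YEARS:
        raise ValueError("Run must be 0, 1, 2 or 12, got %r" % (run,))
    return list(RUN_YEARS[run])
```

For example, years_for(12) returns all six years, while years_for(0, "2016") returns only 2016.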

Verbosity

There are five levels of verbosity. The level is defined in GlobalFunctions.hh. The default level is [INFO], corresponding to level 2.

Running the code

The code consists of several C++ scripts that are compiled and executed in ROOT. Before every step, it is worth closing and reopening ROOT, as ROOT may otherwise complain about the redefinition of functions. If it complains about redefinitions, just go ahead and remove the corresponding .d, .so and .pcm files.
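Removing the build products by hand gets tedious, so here is a small helper sketch. It assumes ROOT's usual naming, where compiling MVA.cpp with the trailing + leaves files starting with MVA_cpp next to the source. The function names are mine, and the default is a dry run that only lists what would be deleted.

```python
import os


def stale_build_products(folder, macro):
    """List the compiled leftovers of `macro` (e.g. "MVA.cpp") in `folder`.

    Assumes ROOT's usual naming, where MVA.cpp yields files starting with
    "MVA_cpp" and ending in .d, .so or .pcm.
    """
    stem, ext = os.path.splitext(macro)
    prefix = stem + "_" + ext.lstrip(".")
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.startswith(prefix) and name.endswith((".d", ".so", ".pcm"))
    )


def clean(folder, macro, dry_run=True):
    """Delete the build products; with dry_run=True only report them."""
    files = stale_build_products(folder, macro)
    if not dry_run:
        for path in files:
            os.remove(path)
    return files
```

Call clean(".", "MVA.cpp") first to see the list, and only then clean(".", "MVA.cpp", dry_run=False) to actually delete the files.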

I also highly recommend running on sigma0, d0new or delta and sourcing ROOT as described in this manual. Newer ROOT versions changed some RooFit functions, which results in a segfault. So yeah. Go ROOT.

We used ROOT 6.06.02. In order to run the selection code, navigate to your folder with the code. Then, run a bash script that will automatically create the necessary folder structure.

cd /path/to/your/folder/B2KstarMuMu/Code/Selection/
bash createFolders.sh

Before running, please change the corresponding output paths in GlobalFunctions.hh.

First, compile and run the preselection. It is defined in BDTSelection.cpp. This reads the files with stripped data and creates new tuples with preselected data.

root
.L BDTSelection.cpp+
runAllSignalData(1); runAllSignalData(2);
runAllSignalMC(1); runAllSignalMC(2);
runAllRefMC(1);  runAllRefMC(2);
runAllPHSPMC(1); runAllPHSPMC(2);
.q
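Every step on this page follows the same pattern: start ROOT, load a file with .L and a trailing +, call a few functions, quit. If you prefer to script a step instead of typing it, the sketch below composes the same prompt input as text. Feeding it to root -l -b on standard input is my assumption of how to run it unattended, not how the selection was actually run; both function names are hypothetical.

```python
import subprocess


def root_session(macro, calls):
    """Compose the prompt input for one step: load, call, quit."""
    lines = [".L %s+" % macro]
    lines += [call if call.endswith(";") else call + ";" for call in calls]
    lines.append(".q")
    return "\n".join(lines) + "\n"


def run_step(macro, calls, dry_run=True):
    """Feed the composed session to `root -l -b` on standard input."""
    text = root_session(macro, calls)
    if dry_run:
        return text
    # check=True raises if ROOT exits with a non-zero status.
    subprocess.run(["root", "-l", "-b"], input=text, text=True, check=True)
    return text
```

With dry_run=True the function only returns the text, so you can inspect it before letting an overnight job loose on 800 GB of data.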

Then, run the python script Rescale_pi0momentum.py, which performs the Kstar MacGyver DTF. There is no need to source python as described above: lb-conda takes care of loading pandas and everything else.

lb-conda default python Scripts/Rescale_pi0momentum.py

The next step is to compile and perform the MC truth-matching, saved in MCtruthmatching.cpp. The truth-matching procedure is described in detail in my thesis.

root
.L MCtruthmatching.cpp+
TruthMatchAllAll(1); TruthMatchAllAll(2);
.q

Then, we need to add the XMuMu mass variable and apply the KplusMuMu veto (AddVariable.cpp)

root
.L Scripts/AddVariable.cpp+
addAllXMuMuMass(true,true,1); addAllXMuMuMass(false,true,1); applyAllVetoKplusMuMuMass(1);
addAllXMuMuMass(true,true,2); addAllXMuMuMass(false,true,2); applyAllVetoKplusMuMuMass(2);
.q
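For orientation, the quantity behind such a mass variable is just the invariant mass of the summed four-momenta, and a veto is a cut on it. The sketch below shows the arithmetic only. It is not the code in AddVariable.cpp, the function names are hypothetical, and the veto window is passed in as a parameter because the actual analysis values are not given on this page.

```python
import math


def four_momentum(px, py, pz, mass):
    """Build (E, px, py, pz) under a given mass hypothesis."""
    return (math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz)


def invariant_mass(particles):
    """Invariant mass of a list of (E, px, py, pz) four-momenta."""
    e = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m2 = e * e - px * px - py * py - pz * pz
    # Rounding can push m2 slightly below zero for light-like systems.
    return math.sqrt(max(m2, 0.0))


def passes_veto(m_kmumu, window):
    """True if the K+ mu mu mass lies outside the vetoed window (lo, hi)."""
    lo, hi = window
    return not (lo <= m_kmumu <= hi)
```

Two back-to-back massless particles with unit energy give invariant_mass([(1, 0, 0, 1), (1, 0, 0, -1)]) equal to 2, which is a handy sanity check for the sign convention.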

The preselection is now finished. Next, we need to fit the reconstructed B mass peak. If you are running this for the first time, see the B mass model section for instructions on how to compile the code and make RooFit use the double-sided Crystal Ball or ExpGauss.

Now that the peaking background is removed, we can proceed to the reweighting via nTrackWeights.cpp. It takes the preselected tuples and creates new weighted ones, with the tag BDT input.

root
.L nTrackWeights.cpp+
WeightAll(true,1,true); ReweightReferenceMC(true,1,true); ReweightPHSPMC(true,1,true);
WeightAll(true,2,true); ReweightReferenceMC(true,2,true); ReweightPHSPMC(true,2,true);
.q
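Judging by its name, nTrackWeights.cpp reweights the simulation in the track multiplicity. The generic technique is a ratio of normalized histograms, sketched below in plain python. This is an illustration of the idea under that assumption, not a transcription of the C++ code; the function names and the binning are hypothetical, and data_weights stands for whatever per-event weights (e.g. sWeights) the data carry.

```python
def histogram(values, edges, weights=None):
    """Fill a histogram with bins [edges[i], edges[i+1]); return bin sums."""
    counts = [0.0] * (len(edges) - 1)
    if weights is None:
        weights = [1.0] * len(values)
    for value, weight in zip(values, weights):
        for i in range(len(counts)):
            if edges[i] <= value < edges[i + 1]:
                counts[i] += weight
                break
    return counts


def ratio_weights(data, data_weights, mc, edges):
    """Per-bin weights: normalized (weighted) data over normalized MC."""
    h_data = histogram(data, edges, data_weights)
    h_mc = histogram(mc, edges)
    n_data, n_mc = sum(h_data), sum(h_mc)
    # Empty MC bins get weight 0 rather than dividing by zero.
    return [
        (d / n_data) / (m / n_mc) if m > 0 else 0.0
        for d, m in zip(h_data, h_mc)
    ]
```

Each MC event then receives the weight of the bin its track multiplicity falls into, so the weighted MC distribution matches the data one bin by bin.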

Check that the MVA variables agree after weighting them with sWeights, using CompareVariables.cpp. (Yes, there is a dedicated tool, but this was initially working, and working well, so it was kept for checking the variables used in the MVA training.)

root
.L Scripts/compareVariables.cc+
compareAll(1); compareAll(2);
.q

The reweighted data and Monte Carlo can now be used for the MVA training in MVA.cpp.

root
.L MVA.cpp+
RunMVA(1); RunMVA(2);
.q

Apply the MVA to all the MC and Data using TMVAClassApp.cpp. This also creates new tuples with the tag BDT output.

root
.L TMVAClassApp.cpp+
TMVAClassAppAll(1); TMVAClassAppAll(2);
.q

Remove all multiple candidates; the removal is defined in RemoveMultipleCandidates.py. Before running this for the first time, you have to change the path in the getTreePath function and similarly in the getTreeList function. Don't forget to compile getPathForPython.cc before the first usage of the code!

python RemoveMultipleCandidates.py -all
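To make clear what "multiple candidates" means in practice, here is a common way of handling them: group the candidates by run and event number and keep one per event at random. This is a generic sketch and not necessarily what RemoveMultipleCandidates.py does; the function name, the (run, event) key and the fixed seed are my assumptions.

```python
import random
from collections import defaultdict


def pick_one_per_event(candidates, seed=0):
    """Return indices of kept candidates, one per (run, event) pair.

    `candidates` is a list of (run_number, event_number) tuples, one per
    candidate, in tree order.
    """
    groups = defaultdict(list)
    for index, key in enumerate(candidates):
        groups[key].append(index)
    rng = random.Random(seed)  # fixed seed keeps reruns reproducible
    kept = [rng.choice(indices) for _, indices in sorted(groups.items())]
    return sorted(kept)
```

Grouping with a dictionary needs only a single pass over the candidates, which matters when the tuples are large.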

You can plot the functions by running PlotMultiple.py.

We have to rerun the weights and therefore also the MVA: the shape of the B mass peak is fixed to the one after MVA.

root
.L nTrackWeights.cpp+
WeightAll(true,1,true); ReweightReferenceMC(true,1,true); ReweightPHSPMC(true,1,true);
WeightAll(true,2,true); ReweightReferenceMC(true,2,true); ReweightPHSPMC(true,2,true);
.q

Check the variables again

root
.L Scripts/compareVariables.cc+
compareAll(1); compareAll(2);
.q

Run the MVA training, make nice plots, apply the MVA and remove multiple candidates

root
.L MVA.cpp+
RunMVA(1); RunMVA(2);
.q

root
.L PlotTMVA.cpp+
SaveAllFromOneFile(2011,1,false,false,0,false,"",false);
SaveAllFromOneFile(2016,2,false,false,0,false,"",false);
.q

root
.L TMVAClassApp.cpp+
TMVAClassAppAll(1); TMVAClassAppAll(2);
.q

Now switch from delta and ROOT to lhcba1 or so and source python.

python RemoveMultipleCandidates.py -all

Add variables to the MC samples

root
.L Scripts/AddVariable.cpp+
addAllVariablesAllMCSamples(1); addAllVariablesAllMCSamples(2);
.q

Get the efficiencies needed for the estimation of the best MVA response cut, defined in Efficiency.cpp. In order to get the selection efficiency, the generator level sample must be present!

root
.L Efficiency.cpp+
runAllEff();
.q
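As a reminder of the arithmetic, an efficiency is the number of events passing a step divided by the number of events entering it, which is presumably why the generator level sample is needed as the denominator. The sketch below uses the simple binomial uncertainty; the function name is hypothetical and Efficiency.cpp may well treat the uncertainties differently.

```python
import math


def efficiency(n_pass, n_total):
    """Return (efficiency, binomial uncertainty) for n_pass out of n_total."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    eff = n_pass / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err
```

Note that the binomial formula gives zero uncertainty at efficiencies of exactly 0 or 1, so it should not be trusted for very small samples.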

Scan the significance in the MVA cut using the code in BDTcutScanner.cpp. Don't mind the 2012 and 2016 tags, they are just dummies.

root
.L BDTcutScanner.cpp+
ScanSignalAndBckgndEstimation("2012",1,0.01,false,false,false,true)
ScanSignalAndBckgndEstimation("2016",2,0.01,false,false,false,true)
.q
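The idea of the scan is to evaluate a figure of merit at each MVA cut and take the cut where it peaks. The sketch below assumes the common choice S/sqrt(S+B); the figure of merit actually used in BDTcutScanner.cpp is not stated on this page, and the function names and the yields in the example are made up.

```python
import math


def significance(n_sig, n_bkg):
    """Figure of merit S / sqrt(S + B); 0 if nothing survives the cut."""
    total = n_sig + n_bkg
    return n_sig / math.sqrt(total) if total > 0 else 0.0


def scan(points):
    """`points` holds (cut, n_sig, n_bkg); return [(cut, significance)]."""
    return [(cut, significance(s, b)) for cut, s, b in points]


def best_cut(points):
    """Return the (cut, significance) pair with the highest significance."""
    return max(scan(points), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Made-up yields, only to show the call.
    print(best_cut([(0.90, 100, 300), (0.95, 90, 54), (0.99, 40, 9)]))
```

In the made-up example the loosest cut keeps too much background and the tightest one throws away too much signal, so the maximum sits in between.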

Make a nice TGraph from the scan. When creating the scan, it can happen that, e.g., an estimation for a cut at 0.95 happens before one for a cut at 0.92. This script just takes the points and makes a pretty, clean plot. Do not forget to set the correct path in ReorganizeTGraph.py.

python ReorganizeTGraph.py
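The reordering itself is simple: sort the scan points by the cut value before drawing them. A minimal sketch on plain (cut, value) pairs follows. Letting a later entry replace an earlier one with the same cut is my assumption about what to do with rerun points, and the function name is hypothetical.

```python
def reorganize(points):
    """Sort (cut, value) pairs by cut; a later entry replaces an earlier
    one with the same cut value."""
    latest = {}
    for cut, value in points:
        latest[cut] = value
    return sorted(latest.items())
```

For example, the points [(0.95, 7.5), (0.92, 6.1), (0.92, 6.3)] come back as [(0.92, 6.3), (0.95, 7.5)], so a line drawn through them no longer jumps back and forth.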

Now get nice plots of the maximal significance in the MVA cut using BDTcutScanner.cpp. Still don't mind the 2012 and 2016 tags, they are just dummies.

root
.L BDTcutScanner.cpp+
getMaxBDTresponse("2012",1,true,false,0,false,false)
getMaxBDTresponse("2016",2,true,false,0,false,false)
.q

Use the MVA scan to plot the signal yields, apply the MVA cut and compare the yields to the CMS results (see SignalStudy.cpp). It also creates the tuples used by the FCNC fitter tagged as BDT output selection.

root
.L SignalStudy.cpp+
plotYieldInQ2(true); plotYieldInQ2(false);
ApplyCutPerYearAll(1);  ApplyCutPerYearAll(2);
printYileds(false); printYileds(true)
yieldComparison(1,getTMVAcut(1));
yieldComparison(2,getTMVAcut(2));
.q

Checking the inclusive sample

root
.L BDTSelection.cpp+
runAllIncMC(1); runAllIncMC(2)
.q

lb-conda default python Rescale_pi0momentum.py  # CAREFUL, NEEDS TO BE SET BY HAND

root
.L MCtruthmatching.cpp+
TruthMatchAllBkg(true,1,false,false,true); TruthMatchAllBkg(true,2,false,false,true);
.q

Then, we need to add the variables to the inclusive sample. This HAS TO BE SET BY HAND in AddVariable.cpp before compilation!

root
.L AddVariable.cpp+
addAllXMuMuMass(true,true,1,true,true,false); addAllXMuMuMass(false,true,1,true,true,false); applyAllVetoKplusMuMuMass(1,true,true,false);
addAllXMuMuMass(true,true,2,true,true,false); addAllXMuMuMass(false,true,2,true,true,false); applyAllVetoKplusMuMuMass(2,true,true,false);
.q

Similarly, before applying the MacGyver DTF, the paths in Rescale_pi0momentum.py have to be set by hand!

lb-conda default python Rescale_pi0momentum.py
root
.L TMVAClassApp.cpp+
TMVAClassAppInc(1); TMVAClassAppInc(2);
.q

Also the paths in RemoveMultipleCandidates.py have to be set by hand!

python RemoveMultipleCandidates.py -all 

Lastly, make the truth-matching plots using InclusiveCheck.cpp.

root
.L Scripts/InclusiveCheck.cpp+
plotTM(1,true); plotTM(2,true)
plotTM(1,false); plotTM(2,false)
.q