Machine Learning Course as Part of the Studierendentage, Summer Semester 2023
  1. % Introduction to Data Analysis and Machine Learning in Physics: \ 3. Machine Learning Basics, Multivariate Analysis
  2. % Martino Borsato, Jörg Marks, Klaus Reygers
  3. % Studierendentage, 11-14 April 2023
  4. ## Multivariate Analyses (MVA)
  5. * General Question
  6. \vspace{0.1cm}
  7. There are two categories of distinguishable data, S and B,
  8. described by discrete variables. Which criteria separate
  9. the two samples?
  10. * A single criterion is not sufficient to distinguish S and B
  11. * Reduction of the variable space to probabilities for S or B
  12. \vspace{0.1cm}
  13. * Classification of measurements using a set of observables $(V_1,V_2,\dots,V_n)$
  14. * Find optimal separation conditions, taking correlations into account
  15. \begin{figure}
  16. \centering
  17. \includegraphics[width=0.8\textwidth]{figures/SandBcuts.jpeg}
  18. \end{figure}
  19. ## Multivariate Analyses (MVA)
  20. * Regression: in the multidimensional observable space $(V_1,V_2,\dots,V_n)$,
  21. a functional relation with optimal parameters is determined
  22. \begin{figure}
  23. \centering
  24. \includegraphics[width=0.8\textwidth]{figures/regression.jpeg}
  25. \end{figure}
  26. * supervised regression: model is known
  27. * unsupervised regression: model is unknown
  28. * Maximum-likelihood fits are used to determine the parameters
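
As a minimal sketch of such a fit, assume a straight-line model $y = a x + b$ with independent Gaussian errors of equal width on $y$. In that special case, maximizing the likelihood is equivalent to minimizing the sum of squared residuals, which has a closed-form solution. The model, the toy data, and the function name `fit_line` are assumptions chosen for illustration.

```python
import random

def fit_line(xs, ys):
    """Maximum-likelihood estimate of (slope, intercept) for y = a*x + b,
    assuming independent Gaussian errors of equal width on y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data: a known line plus Gaussian noise (seeded for reproducibility).
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x - 1.0 + rng.gauss(0.0, 0.5) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For non-Gaussian errors or non-linear models the likelihood generally has to be maximized numerically.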
  29. ## MVA Classification in N Dimensions
  30. For each event there are N measured variables
  31. \begin{figure}
  32. \centering
  33. \includegraphics[width=0.9\textwidth]{figures/classificationVar.jpeg}
  34. \end{figure}
  35. :::: columns
  36. :::: {.column width=70%}
  37. * Search for a mathematical transformation $F$ of the $N$-dimensional
  38. input space to a one-dimensional output space $F(\vec V) : \mathbb{R}^N \rightarrow \mathbb{R}$
  39. * A simple cut in $F$ implements a complex cut in the $N$-dimensional variable space
  40. * Determine $F(\vec V)$ using a model and fit the parameters
  41. ::::
  42. ::::{.column width=30%}
  43. \includegraphics[]{figures/response.jpeg}
  44. ::::
  45. :::
  46. ## MVA Classification in N Dimensions
  47. :::: columns
  48. :::: {.column width=60%}
  49. * Parameters \newline
  50. Important measures to quantify the quality \newline \newline
  51. Efficiency: $\epsilon = \frac{N_S (F>F_0)}{N_S}$ \newline
  52. Purity: $\pi = \frac{N_S (F>F_0)}{N_S (F>F_0) + N_B (F>F_0)}$
  53. ::::
  54. ::::{.column width=40%}
  55. \includegraphics[]{figures/response.jpeg}
  56. ::::
  57. :::
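
The efficiency and purity defined above can be computed by simple counting. The following is a small sketch in plain Python; the function name and the toy classifier outputs are hypothetical and only illustrate the two definitions.

```python
def efficiency_and_purity(f_signal, f_background, f0):
    """Efficiency and purity of the selection F > f0.

    f_signal / f_background: classifier outputs F for true signal
    and true background events.
    """
    n_s_pass = sum(1 for f in f_signal if f > f0)
    n_b_pass = sum(1 for f in f_background if f > f0)
    efficiency = n_s_pass / len(f_signal)
    n_pass = n_s_pass + n_b_pass
    # Purity is undefined if no event passes the cut.
    purity = n_s_pass / n_pass if n_pass > 0 else float("nan")
    return efficiency, purity

# Toy classifier outputs (assumed values, for illustration only).
f_signal = [0.9, 0.8, 0.7, 0.4]
f_background = [0.6, 0.3, 0.2, 0.1]

eff, pur = efficiency_and_purity(f_signal, f_background, f0=0.5)
print(eff, pur)  # 0.75 0.75
```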
  58. \vspace{0.3cm}
  59. :::: columns
  60. :::: {.column width=60%}
  61. * Receiver Operating Characteristic (ROC) \newline
  62. Errors in classification \newline
  63. \includegraphics[width=0.7\textwidth]{figures/error.jpeg}
  64. ::::
  65. ::::{.column width=40%}
  66. \includegraphics[]{figures/roc.jpeg}
  67. ::::
  68. :::
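
One way to obtain ROC points is to scan the cut value $F_0$ and record the signal efficiency and background rejection at each step. The sketch below assumes events with $F > F_0$ are accepted and plots nothing; `roc_points` and the toy outputs are illustrative names and values, not part of the course material.

```python
import math

def roc_points(f_signal, f_background):
    """Return (signal efficiency, background rejection) for each cut F > f0."""
    # Start below all values so that the 'accept everything' point is included.
    thresholds = [-math.inf] + sorted(set(f_signal) | set(f_background))
    points = []
    for f0 in thresholds:
        eff_s = sum(f > f0 for f in f_signal) / len(f_signal)
        eff_b = sum(f > f0 for f in f_background) / len(f_background)
        points.append((eff_s, 1.0 - eff_b))
    return points

# Toy classifier outputs (assumed values, for illustration only).
f_signal = [0.9, 0.8, 0.7, 0.4]
f_background = [0.6, 0.3, 0.2, 0.1]

points = roc_points(f_signal, f_background)
for eff_s, rej_b in points:
    print(f"signal efficiency = {eff_s:.2f}, background rejection = {rej_b:.2f}")
```

A classifier with good separation produces points close to the corner where both quantities equal 1.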
  69. ## MVA Classification in N Dimensions
  70. :::: columns
  71. :::: {.column width=60%}
  72. * Interpretation of $F(\vec V)$
  73. * The distributions of \textcolor{blue}{$F(\vec V|S)$} and \textcolor{red}{$F(\vec V|B)$} are interpreted as probability density functions (PDF), \textcolor{blue}{$PDF_S(F)$} and \textcolor{red}{$PDF_B(F)$}
  74. * For a given value of $F$, the probability for signal or background can be determined
  75. for a given $S/B$, expressed through the signal and background fractions $f_S$ and $f_B$ \newline
  76. $P ( data = S | F)= \frac {\color {blue} {f_S \cdot PDF_S(F)}} { \color {red} {f_B \cdot PDF_B(F)} + \color {blue} {f_S \cdot PDF_S(F)} }$
  77. ::::
  78. ::::{.column width=40%}
  79. \includegraphics[]{figures/response.jpeg}
  80. ::::
  81. :::
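
The probability formula above can be evaluated directly once the two PDFs and the signal fraction are known. The sketch below assumes, purely for illustration, Gaussian shapes for $PDF_S(F)$ and $PDF_B(F)$ and takes $f_B = 1 - f_S$; in practice the PDFs would come from the classifier response on training or simulated samples.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normalized Gaussian density."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_s(f):
    # Assumed signal response shape (hypothetical parameters).
    return gaussian_pdf(f, mean=1.0, sigma=0.5)

def pdf_b(f):
    # Assumed background response shape (hypothetical parameters).
    return gaussian_pdf(f, mean=-1.0, sigma=0.5)

def signal_probability(f, f_s):
    """P(data = S | F) for signal fraction f_s and background fraction 1 - f_s."""
    f_b = 1.0 - f_s
    weighted_s = f_s * pdf_s(f)
    weighted_b = f_b * pdf_b(f)
    return weighted_s / (weighted_s + weighted_b)

# At F = 0 both assumed PDFs are equal, so the result equals the signal fraction.
print(signal_probability(0.0, f_s=0.5))
print(signal_probability(0.0, f_s=0.1))
print(signal_probability(1.0, f_s=0.1))
```

The example shows that the same value of $F$ corresponds to different signal probabilities when the assumed $S/B$ changes.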
  82. \vspace{0.3cm}
  83. * A cut at $F(\vec V) = F_0$ in the one-dimensional variable, accepting all events to the right of it, determines the signal efficiency and the background efficiency (and hence the background rejection). Varying $F_0$ systematically gives the ROC curve. \newline
  84. \definecolor{darkgreen}{RGB}{0,125,0}
  85. * \textcolor{darkgreen}{A cut in $F(\vec V)$ corresponds to a complex hypersurface in the input space, which cannot necessarily be described by a function.}
  86. ## Simple Cuts in Variables
  87. * The simplest classifier for selecting signal events is a set of cuts on all variables that show a separation
  88. * The output is binary and not a probability for $S$ or $B$.
  89. * The cuts are optimized by maximizing the background suppression for a given signal efficiency.
  90. * Significance $\mathrm{sig} = \epsilon_S \cdot N_S / \sqrt{ \epsilon_S \cdot N_S + \epsilon_B( \epsilon_S) \cdot N_B}$
  91. \begin{figure}
  92. \centering
  93. \includegraphics[width=0.8\textwidth]{figures/cutInVariables.jpeg}
  94. \end{figure}
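
A cut on a single variable can be optimized by scanning candidate cut values and evaluating the significance defined above. The following is a sketch under stated assumptions: Gaussian toy samples, a one-sided selection `value > cut`, and hypothetical expected yields `n_s` and `n_b`. It uses the significance as the figure of merit, which is one possible choice.

```python
import math
import random

def significance(eff_s, eff_b, n_s, n_b):
    """sig = eps_S * N_S / sqrt(eps_S * N_S + eps_B * N_B)."""
    s = eff_s * n_s
    b = eff_b * n_b
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def best_cut(sig_values, bkg_values, n_s, n_b):
    """Scan cuts 'value > cut' on one variable; return (cut, significance)."""
    candidates = [-math.inf] + sorted(set(sig_values) | set(bkg_values))
    best_cut_value, best_sig = None, -1.0
    for cut in candidates:
        eff_s = sum(v > cut for v in sig_values) / len(sig_values)
        eff_b = sum(v > cut for v in bkg_values) / len(bkg_values)
        z = significance(eff_s, eff_b, n_s, n_b)
        if z > best_sig:
            best_cut_value, best_sig = cut, z
    return best_cut_value, best_sig

# Toy samples (assumed shapes): signal shifted with respect to background.
rng = random.Random(42)
sig_values = [rng.gauss(2.0, 1.0) for _ in range(1000)]
bkg_values = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# n_s and n_b are the expected yields before cuts (hypothetical numbers).
cut, z = best_cut(sig_values, bkg_values, n_s=100, n_b=1000)
print(f"best cut = {cut:.2f}, significance = {z:.2f}")
```

With several variables the same scan has to be repeated per variable or done on a grid, which is where correlations between variables start to limit the cut-based approach.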
  95. ## Fisher Discriminant
  96. Idea: find a hyperplane such that projecting the data onto its normal direction gives an optimal separation of signal and background
  97. :::: columns
  98. :::: {.column width=60%}
  99. * The Fisher discriminant is a linear combination of all input variables
  100. \newline
  101. $F(\vec{V}) = \sum_i w_i \cdot V_i = \vec{w}^T \vec{V}$ \newline
  102. * $\vec w$ defines the orientation of the plane. The coefficients are chosen such that the difference between the expectation values of the two classes is large and the variance within each class is small. \newline
  103. $J( \vec{w} ) = \frac {( \bar F_S - \bar F_B )^2}{ \sigma_S^2 + \sigma_B^2 } = \frac { \vec{w}^T K \vec{w} }{ \vec{w}^T L \vec{w} }$ \newline
  104. with $K = (\vec\mu_S - \vec\mu_B)(\vec\mu_S - \vec\mu_B)^T$ built from the difference of the class expectation values $\vec\mu_S$, $\vec\mu_B$ of $\vec V$, and $L$ the sum of the signal and background covariance matrices
  105. * A cut value $F_c$ is then determined for the separation.
  106. ::::
  107. ::::{.column width=40%}
  108. \includegraphics[]{figures/fisher.jpeg}
  109. ::::
  110. :::
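
Maximizing $J(\vec w)$ has the well-known closed-form solution $\vec w \propto L^{-1} (\vec\mu_S - \vec\mu_B)$, with $L$ the sum of the class covariance matrices. The sketch below applies this to two input variables in plain Python, using an explicit $2 \times 2$ matrix inverse; the helper names and the Gaussian toy samples are assumptions made for illustration.

```python
import random

def mean_vector(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(2)]

def covariance_2d(points):
    """Sample covariance matrix of 2D points."""
    mx, my = mean_vector(points)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def fisher_weights(signal, background):
    """w = L^{-1} (mu_S - mu_B), with L the sum of the class covariances."""
    mu_s, mu_b = mean_vector(signal), mean_vector(background)
    cov_s, cov_b = covariance_2d(signal), covariance_2d(background)
    a = cov_s[0][0] + cov_b[0][0]
    b = cov_s[0][1] + cov_b[0][1]
    d = cov_s[1][1] + cov_b[1][1]
    det = a * d - b * b
    dx, dy = mu_s[0] - mu_b[0], mu_s[1] - mu_b[1]
    # Explicit inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    return [(d * dx - b * dy) / det, (-b * dx + a * dy) / det]

def fisher_output(w, v):
    """F(V) = w^T V."""
    return w[0] * v[0] + w[1] * v[1]

# Toy samples (assumed): uncorrelated unit Gaussians around (1, 1) and (-1, -1).
rng = random.Random(7)
signal = [(rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)) for _ in range(500)]
background = [(rng.gauss(-1.0, 1.0), rng.gauss(-1.0, 1.0)) for _ in range(500)]

w = fisher_weights(signal, background)
mean_f_s = sum(fisher_output(w, v) for v in signal) / len(signal)
mean_f_b = sum(fisher_output(w, v) for v in background) / len(background)
print(f"w = ({w[0]:.2f}, {w[1]:.2f}), <F_S> = {mean_f_s:.2f}, <F_B> = {mean_f_b:.2f}")
```

The overall scale of $\vec w$ does not affect $J$, so only its direction matters; the cut value $F_c$ is chosen afterwards on the one-dimensional output.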
  111. ## k-Nearest Neighbor Method (1)
  112. $k$-NN classifier:
  113. * Estimates probability density around the input vector
  114. * $p(\vec x|S)$ and $p(\vec x|B)$ are approximated by the number of signal and background events in the training sample that lie in a small volume around the point $\vec x$
  115. \vspace{2ex}
  116. The algorithm finds the $k$ nearest neighbors:
  117. $$ k = k_s + k_b $$
  118. Probability for the event to be of signal type:
  119. $$ p_s(\vec x) = \frac{k_s(\vec x)}{k_s(\vec x) + k_b(\vec x)} $$
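
The estimate $p_s = k_s / (k_s + k_b)$ can be sketched in a few lines of plain Python. The example assumes the Euclidean distance and a tiny hand-made training sample; the function name `knn_signal_probability` is hypothetical. A brute-force search over all training events is used for clarity, not speed.

```python
import math

def knn_signal_probability(x, signal, background, k):
    """Fraction of signal events among the k training events nearest to x."""
    # Label signal as 1 and background as 0, then sort by distance to x.
    labelled = [(math.dist(x, p), 1) for p in signal]
    labelled += [(math.dist(x, p), 0) for p in background]
    labelled.sort(key=lambda pair: pair[0])
    k_s = sum(label for _, label in labelled[:k])
    # k = k_s + k_b, so k_s / k equals k_s / (k_s + k_b).
    return k_s / k

# Toy training sample (assumed values, for illustration only).
signal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
background = [(-1.0, -1.0), (-0.9, -1.2), (-1.1, -0.8)]

print(knn_signal_probability((0.9, 1.0), signal, background, k=3))    # 1.0
print(knn_signal_probability((-1.0, -1.0), signal, background, k=3))  # 0.0
```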
  120. ## k-Nearest Neighbor Method (2)
  121. ::: columns
  122. :::: {.column width=60%}
  123. Simplest choice for distance measure in feature space is the Euclidean distance:
  124. $$ R = |\vec x - \vec y|$$
  125. Better: take correlations between variables into account:
  126. $$ R = \sqrt{(\vec{x}-\vec{y})^T V^{-1} (\vec{x}-\vec{y})} $$
  127. $$ V = \text{covariance matrix}, R = \text{"Mahalanobis distance"}$$
  128. ::::
  129. :::: {.column width=40%}
  130. ![](figures/knn.png)
  131. ::::
  132. :::
  133. \vfill
  134. The $k$-NN classifier performs best when the boundary that separates signal and background events has irregular features that cannot be easily approximated by parametric learning methods.
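
To make the difference between the two distance measures concrete, the following sketch evaluates the Mahalanobis distance for two variables with an explicit $2 \times 2$ inverse of the covariance matrix $V$. The function name and the example covariance matrix are assumptions for illustration.

```python
import math

def mahalanobis_2d(x, y, cov):
    """R = sqrt((x - y)^T V^{-1} (x - y)) for a 2x2 covariance matrix V."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form with V^{-1} = 1/det * [[d, -b], [-c, a]].
    r2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(r2)

origin = (0.0, 0.0)

# With the identity matrix the Euclidean distance is recovered.
print(mahalanobis_2d(origin, (3.0, 4.0), [[1.0, 0.0], [0.0, 1.0]]))  # 5.0

# Strongly correlated variables (assumed correlation of 0.9).
cov = [[1.0, 0.9], [0.9, 1.0]]
d_along = mahalanobis_2d(origin, (1.0, 1.0), cov)    # along the correlation
d_across = mahalanobis_2d(origin, (1.0, -1.0), cov)  # across the correlation
print(f"along: {d_along:.2f}, across: {d_across:.2f}")
```

Both test points have the same Euclidean distance from the origin, but the point lying across the correlation direction is much farther away in the Mahalanobis sense.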
  135. ##
  136. * Determination of the underlying data structure (regression problem)
  137. * MVA Methods
  138. * More effective than classic cut-based analyses
  139. * Take correlations of input variables into account
  140. \vfill
  141. * Important: find good input variables for MVA methods
  142. * Good separation power between S and B
  143. * No strong correlation among variables
  144. * No correlation with the parameters you try to measure in your signal sample!
  145. \vfill
  146. * Pre-processing
  147. * Apply obvious variable transformations and let MVA method do the rest
  148. * Make use of obvious symmetries: if e.g. a particle production process is symmetric in polar angle $\theta$ use $|\cos \theta|$ and not $\cos \theta$ as input variable
  149. * It is generally useful to bring all input variables to a similar numerical range
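
Two of the pre-processing steps above can be sketched in plain Python: folding a symmetric variable and bringing a variable to a common numerical range. Standardizing to zero mean and unit width is one common choice and is an assumption here, as are the function names and toy values.

```python
import math
import statistics

def fold_cos_theta(theta):
    """Use |cos(theta)| when the process is symmetric in the polar angle."""
    return abs(math.cos(theta))

def standardize(values):
    """Shift and scale values to zero mean and unit standard deviation.

    Assumes the values are not all identical (non-zero width).
    """
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mean) / sigma for v in values]

# Toy input variable with a large numerical range (assumed values).
momenta = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(momenta)
print([round(v, 3) for v in scaled])

# Forward and backward directions map onto the same input value.
print(fold_cos_theta(0.0), fold_cos_theta(math.pi))
```

The mean and width used for scaling should be determined on the training sample and then applied unchanged to any other sample.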
  150. ## Fisher Discriminant
  151. ## Regression
  152. ## Logistic regression
  153. ## Decision Trees
  154. XGBoost example with the iris dataset
