Machine learning course held as part of the Studierendentage (student days) in the summer semester 2023
  1. {
  2. "cells": [
  3. {
  4. "cell_type": "markdown",
  5. "metadata": {},
  6. "source": [
  7. "# Exercise: Classification of air showers measured with the MAGIC telescope\n",
  8. "\n",
  9. "The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
  10. "\n",
  11. "The task is to separate signal events (gamma showers) and background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
  12. "\n",
  13. "The features of a shower are:\n",
  14. "\n",
  15. " 1. fLength: continuous # major axis of ellipse [mm]\n",
  16. " 2. fWidth: continuous # minor axis of ellipse [mm] \n",
  17. " 3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]\n",
  18. " 4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]\n",
  19. " 5. fConc1: continuous # ratio of highest pixel over fSize [ratio]\n",
  20. " 6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]\n",
  21. " 7. fM3Long: continuous # 3rd root of third moment along major axis [mm] \n",
  22. " 8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]\n",
  23. " 9. fAlpha: continuous # angle of major axis with vector to origin [deg]\n",
  24. " 10. fDist: continuous # distance from origin to center of ellipse [mm]\n",
  25. " 11. class: g,h # gamma (signal), hadron (background)\n",
  26. "\n",
  27. "g = gamma (signal): 12332\n",
  28. "h = hadron (background): 6688\n",
  29. "\n",
  30. "For technical reasons, the number of h events is underestimated.\n",
  31. "In the real data, the h class represents the majority of the events.\n",
  32. "\n",
  33. "You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). You need to be within the university network to get free access."
  34. ]
  35. },
  36. {
  37. "cell_type": "code",
  38. "execution_count": 2,
  39. "metadata": {},
  40. "outputs": [],
  41. "source": [
  42. "import pandas as pd\n",
  43. "import numpy as np\n",
  44. "from sklearn.model_selection import train_test_split"
  45. ]
  46. },
  47. {
  48. "cell_type": "code",
  49. "execution_count": 3,
  50. "metadata": {},
  51. "outputs": [],
  52. "source": [
  53. "filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
  54. "df = pd.read_csv(filename, engine='python')"
  55. ]
  56. },
  57. {
  58. "cell_type": "code",
  59. "execution_count": 4,
  60. "metadata": {},
  61. "outputs": [],
  62. "source": [
  63. "# use categories 1 and 0 instead of \"g\" and \"h\"\n",
  64. "df['class'] = df['class'].map({'g': 1, 'h': 0})"
  65. ]
  66. },
  67. {
  68. "cell_type": "code",
  69. "execution_count": 5,
  70. "metadata": {},
  71. "outputs": [
  72. {
  73. "data": {
  74. "text/html": [
  75. "<div>\n",
  76. "<style scoped>\n",
  77. " .dataframe tbody tr th:only-of-type {\n",
  78. " vertical-align: middle;\n",
  79. " }\n",
  80. "\n",
  81. " .dataframe tbody tr th {\n",
  82. " vertical-align: top;\n",
  83. " }\n",
  84. "\n",
  85. " .dataframe thead th {\n",
  86. " text-align: right;\n",
  87. " }\n",
  88. "</style>\n",
  89. "<table border=\"1\" class=\"dataframe\">\n",
  90. " <thead>\n",
  91. " <tr style=\"text-align: right;\">\n",
  92. " <th></th>\n",
  93. " <th>fLength</th>\n",
  94. " <th>fWidth</th>\n",
  95. " <th>fSize</th>\n",
  96. " <th>fConc</th>\n",
  97. " <th>fConc1</th>\n",
  98. " <th>fAsym</th>\n",
  99. " <th>fM3Long</th>\n",
  100. " <th>fM3Trans</th>\n",
  101. " <th>fAlpha</th>\n",
  102. " <th>fDist</th>\n",
  103. " <th>class</th>\n",
  104. " </tr>\n",
  105. " </thead>\n",
  106. " <tbody>\n",
  107. " <tr>\n",
  108. " <th>0</th>\n",
  109. " <td>28.7967</td>\n",
  110. " <td>16.0021</td>\n",
  111. " <td>2.6449</td>\n",
  112. " <td>0.3918</td>\n",
  113. " <td>0.1982</td>\n",
  114. " <td>27.7004</td>\n",
  115. " <td>22.0110</td>\n",
  116. " <td>-8.2027</td>\n",
  117. " <td>40.0920</td>\n",
  118. " <td>81.8828</td>\n",
  119. " <td>1</td>\n",
  120. " </tr>\n",
  121. " <tr>\n",
  122. " <th>1</th>\n",
  123. " <td>31.6036</td>\n",
  124. " <td>11.7235</td>\n",
  125. " <td>2.5185</td>\n",
  126. " <td>0.5303</td>\n",
  127. " <td>0.3773</td>\n",
  128. " <td>26.2722</td>\n",
  129. " <td>23.8238</td>\n",
  130. " <td>-9.9574</td>\n",
  131. " <td>6.3609</td>\n",
  132. " <td>205.2610</td>\n",
  133. " <td>1</td>\n",
  134. " </tr>\n",
  135. " <tr>\n",
  136. " <th>2</th>\n",
  137. " <td>162.0520</td>\n",
  138. " <td>136.0310</td>\n",
  139. " <td>4.0612</td>\n",
  140. " <td>0.0374</td>\n",
  141. " <td>0.0187</td>\n",
  142. " <td>116.7410</td>\n",
  143. " <td>-64.8580</td>\n",
  144. " <td>-45.2160</td>\n",
  145. " <td>76.9600</td>\n",
  146. " <td>256.7880</td>\n",
  147. " <td>1</td>\n",
  148. " </tr>\n",
  149. " <tr>\n",
  150. " <th>3</th>\n",
  151. " <td>23.8172</td>\n",
  152. " <td>9.5728</td>\n",
  153. " <td>2.3385</td>\n",
  154. " <td>0.6147</td>\n",
  155. " <td>0.3922</td>\n",
  156. " <td>27.2107</td>\n",
  157. " <td>-6.4633</td>\n",
  158. " <td>-7.1513</td>\n",
  159. " <td>10.4490</td>\n",
  160. " <td>116.7370</td>\n",
  161. " <td>1</td>\n",
  162. " </tr>\n",
  163. " <tr>\n",
  164. " <th>4</th>\n",
  165. " <td>75.1362</td>\n",
  166. " <td>30.9205</td>\n",
  167. " <td>3.1611</td>\n",
  168. " <td>0.3168</td>\n",
  169. " <td>0.1832</td>\n",
  170. " <td>-5.5277</td>\n",
  171. " <td>28.5525</td>\n",
  172. " <td>21.8393</td>\n",
  173. " <td>4.6480</td>\n",
  174. " <td>356.4620</td>\n",
  175. " <td>1</td>\n",
  176. " </tr>\n",
  177. " </tbody>\n",
  178. "</table>\n",
  179. "</div>"
  180. ],
  181. "text/plain": [
  182. " fLength fWidth fSize fConc fConc1 fAsym fM3Long fM3Trans \\\n",
  183. "0 28.7967 16.0021 2.6449 0.3918 0.1982 27.7004 22.0110 -8.2027 \n",
  184. "1 31.6036 11.7235 2.5185 0.5303 0.3773 26.2722 23.8238 -9.9574 \n",
  185. "2 162.0520 136.0310 4.0612 0.0374 0.0187 116.7410 -64.8580 -45.2160 \n",
  186. "3 23.8172 9.5728 2.3385 0.6147 0.3922 27.2107 -6.4633 -7.1513 \n",
  187. "4 75.1362 30.9205 3.1611 0.3168 0.1832 -5.5277 28.5525 21.8393 \n",
  188. "\n",
  189. " fAlpha fDist class \n",
  190. "0 40.0920 81.8828 1 \n",
  191. "1 6.3609 205.2610 1 \n",
  192. "2 76.9600 256.7880 1 \n",
  193. "3 10.4490 116.7370 1 \n",
  194. "4 4.6480 356.4620 1 "
  195. ]
  196. },
  197. "execution_count": 5,
  198. "metadata": {},
  199. "output_type": "execute_result"
  200. }
  201. ],
  202. "source": [
  203. "df.head()"
  204. ]
  205. },
  206. {
  207. "cell_type": "markdown",
  208. "metadata": {},
  209. "source": [
  210. "#### a) For each variable, create a figure with the gamma and hadron distributions overlaid."
  211. ]
  212. },
  213. {
  214. "cell_type": "code",
  215. "execution_count": 6,
  216. "metadata": {},
  217. "outputs": [],
  218. "source": [
  219. "import matplotlib.pyplot as plt"
  220. ]
  221. },
  222. {
  223. "cell_type": "code",
  224. "execution_count": 46,
  225. "metadata": {},
  226. "outputs": [],
  227. "source": [
  228. "df0 = df[df['class'] == 0] # hadron data set\n",
  229. "df1 = df[df['class'] == 1] # gamma data set"
  230. ]
  231. },
  232. {
  233. "cell_type": "code",
  234. "execution_count": 45,
  235. "metadata": {},
  236. "outputs": [
  237. {
  238. "data": {
  239. "text/plain": [
  240. "<matplotlib.legend.Legend at 0x122031130>"
  241. ]
  242. },
  243. "execution_count": 45,
  244. "metadata": {},
  245. "output_type": "execute_result"
  246. },
  247. {
  248. "data": {
  249. "image/png": "(base64-encoded PNG output, truncated in the source)",
  250. "text/plain": [
  251. "<Figure size 1080x360 with 10 Axes>"
  252. ]
  253. },
  254. "metadata": {
  255. "needs_background": "light"
  256. },
  257. "output_type": "display_data"
  258. }
  259. ],
  260. "source": [
  261. "fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(15,5))\n",
  262. "plt.subplots_adjust(hspace = 0.3, wspace=0.3)\n",
  263. "\n",
  264. "for i in range(10):\n",
  265. " kx = i // 5\n",
  266. " ky = i % 5\n",
  267. " axs[kx, ky].set_xlabel(df0.columns[i])\n",
  268. " df0.iloc[:,i].hist(ax = axs[kx, ky], bins = 50, alpha=0.5, density=True, label='h')\n",
  269. " df1.iloc[:,i].hist(ax = axs[kx, ky], bins = 50, alpha=0.5, density=True, label='g')\n",
  270. "\n",
  271. "axs[0, 0].legend()"
  272. ]
  273. },
  274. {
  275. "cell_type": "markdown",
  276. "metadata": {},
  277. "source": [
  278. "#### b) Create training and test data sets. The test data should amount to 50\\% of the total data set."
  279. ]
  280. },
  281. {
  282. "cell_type": "code",
  283. "execution_count": 21,
  284. "metadata": {},
  285. "outputs": [],
  286. "source": [
  287. "y = df['class'].values\n",
  288. "X = df[[col for col in df.columns if col!=\"class\"]]\n",
  289. "\n",
  290. "### YOUR CODE ### \n",
  291. "\n",
  292. "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=True)"
  293. ]
  294. },
  295. {
  296. "cell_type": "markdown",
  297. "metadata": {},
  298. "source": [
  299. "#### c) Define the logistic regressor and fit the training data"
  300. ]
  301. },
  302. {
  303. "cell_type": "code",
  304. "execution_count": 15,
  305. "metadata": {},
  306. "outputs": [],
  307. "source": [
  308. "from sklearn import linear_model\n",
  309. "\n",
  310. "# define logistic regressor\n",
  311. "\n",
  312. "### YOUR CODE ###\n",
  313. "\n",
  314. "logreg=linear_model.LogisticRegression(fit_intercept=True,\n",
  315. " penalty=None, # None replaces the deprecated string 'none' (scikit-learn >= 1.2)\n",
  316. " max_iter=1000,\n",
  317. " tol=1E-5)\n",
  318. "\n",
  319. "# fit training data\n",
  320. "\n",
  321. "### YOUR CODE ###\n",
  322. "\n",
  323. "import time\n",
  324. "start_time = time.time()\n",
  325. "logreg.fit(X_train, y_train)\n",
  326. "run_time = time.time() - start_time"
  327. ]
  328. },
  329. {
  330. "cell_type": "code",
  331. "execution_count": 16,
  332. "metadata": {},
  333. "outputs": [],
  334. "source": [
  335. "y_pred = logreg.predict(X_test)"
  336. ]
  337. },
  338. {
  339. "cell_type": "markdown",
  340. "metadata": {},
  341. "source": [
  342. "#### d) Determine the Model Accuracy, the AUC score and the Run time"
  343. ]
  344. },
  345. {
  346. "cell_type": "code",
  347. "execution_count": 17,
  348. "metadata": {},
  349. "outputs": [
  350. {
  351. "name": "stdout",
  352. "output_type": "stream",
  353. "text": [
  354. "Model Accuracy: 78.99%\n",
  355. "AUC score: 0.74\n",
  356. "Run time: 0.22 sec\n",
  357. "\n",
  358. "\n"
  359. ]
  360. }
  361. ],
  362. "source": [
  363. "from sklearn.metrics import roc_auc_score\n",
  364. "\n",
  365. "### YOUR CODE ###\n",
  366. "\n",
  367. "print(\"Model Accuracy: {:.2f}%\".format(100*logreg.score(X_test, y_test)))\n",
  368. "print(\"AUC score: {:.2f}\".format(roc_auc_score(y_test,y_pred)))\n",
  369. "print(\"Run time: {:.2f} sec\\n\\n\".format(run_time))"
  370. ]
  371. },
  372. {
  373. "cell_type": "markdown",
  374. "metadata": {},
  375. "source": [
  376. "#### e) Plot the ROC curve (background rejection vs. signal efficiency)"
  377. ]
  378. },
  379. {
  380. "cell_type": "code",
  381. "execution_count": 18,
  382. "metadata": {},
  383. "outputs": [
  384. {
  385. "data": {
  386. "image/png": "(base64-encoded PNG output, truncated in the source)",
  387. "text/plain": [
  388. "<Figure size 432x288 with 1 Axes>"
  389. ]
  390. },
  391. "metadata": {
  392. "needs_background": "light"
  393. },
  394. "output_type": "display_data"
  395. }
  396. ],
  397. "source": [
  398. "import matplotlib.pyplot as plt\n",
  399. "from sklearn.metrics import roc_curve\n",
  400. "%matplotlib inline\n",
  401. "\n",
  402. "y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities\n",
  403. "\n",
  404. "### YOUR CODE ###\n",
  405. "\n",
  406. "fpr, tpr, _ = roc_curve(y_test, y_pred_prob[:,1])\n",
  407. "plt.plot(tpr, 1-fpr)\n",
  408. "plt.xlabel('Signal efficiency')\n",
  409. "plt.ylabel('Background Rejection');"
  410. ]
  411. },
  412. {
  413. "cell_type": "markdown",
  414. "metadata": {},
  415. "source": [
  416. "#### f) Plot the Signal efficiency vs. the Background efficiency and compare it to the corresponding plot in the paper"
  417. ]
  418. },
  419. {
  420. "cell_type": "code",
  421. "execution_count": 19,
  422. "metadata": {},
  423. "outputs": [
  424. {
  425. "data": {
  426. "image/png": "(base64-encoded PNG output, truncated in the source)",
  427. "text/plain": [
  428. "<Figure size 432x288 with 1 Axes>"
  429. ]
  430. },
  431. "metadata": {
  432. "needs_background": "light"
  433. },
  434. "output_type": "display_data"
  435. }
  436. ],
  437. "source": [
  438. "### YOUR CODE ###\n",
  439. "plt.plot(fpr, tpr)\n",
  440. "plt.xscale(\"log\")\n",
  441. "plt.xlabel('Background efficiency')\n",
  442. "plt.ylabel('Signal efficiency');"
  443. ]
  444. }
  445. ],
  446. "metadata": {
  447. "kernelspec": {
  448. "display_name": "Python 3 (ipykernel)",
  449. "language": "python",
  450. "name": "python3"
  451. },
  452. "language_info": {
  453. "codemirror_mode": {
  454. "name": "ipython",
  455. "version": 3
  456. },
  457. "file_extension": ".py",
  458. "mimetype": "text/x-python",
  459. "name": "python",
  460. "nbconvert_exporter": "python",
  461. "pygments_lexer": "ipython3",
  462. "version": "3.8.16"
  463. }
  464. },
  465. "nbformat": 4,
  466. "nbformat_minor": 4
  467. }
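
A note on part d): the notebook passes the hard class labels from `logreg.predict` to `roc_auc_score`, while the ROC curve in part e) is built from the predicted probabilities. The following is a minimal pure-Python sketch, using made-up toy scores rather than the MAGIC data, of why the two inputs can give different AUC values. The helper `auc_from_scores` is a hypothetical illustration written for this note, not a scikit-learn function; it uses the rank interpretation of the AUC (the probability that a random signal event is scored above a random background event, with ties counted as one half).

```python
def auc_from_scores(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts, over all (signal, background) pairs, how often the signal
    event has the higher score; a tie contributes one half.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and classifier scores (illustrative numbers only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
proba = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

# Hard class labels from thresholding the scores at 0.5, which is
# roughly what predict() does for a binary logistic regression.
hard = [1 if p > 0.5 else 0 for p in proba]

auc_proba = auc_from_scores(y_true, proba)
auc_hard = auc_from_scores(y_true, hard)
print(f"AUC from probabilities: {auc_proba:.3f}")
print(f"AUC from hard labels:   {auc_hard:.3f}")
```

With hard labels, the ROC curve collapses to a single operating point and the result reduces to the average of signal efficiency and background rejection at that threshold. In the notebook, the analogous probability-based call would presumably be `roc_auc_score(y_test, logreg.predict_proba(X_test)[:, 1])`, which should give a value consistent with the curve plotted in part e).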