ML-Kurs-SS2023/notebooks/03_ml_basics_ex_1_sol_magic.ipynb


2023-04-05 20:07:23 +02:00
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise: Classification of air showers measured with the MAGIC telescope\n",
"\n",
"The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
"\n",
"The task is to separate signal events (gamma showers) and background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
"\n",
"The features of a shower are:\n",
"\n",
" 1. fLength: continuous # major axis of ellipse [mm]\n",
" 2. fWidth: continuous # minor axis of ellipse [mm] \n",
" 3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]\n",
" 4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]\n",
" 5. fConc1: continuous # ratio of highest pixel over fSize [ratio]\n",
" 6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]\n",
" 7. fM3Long: continuous # 3rd root of third moment along major axis [mm] \n",
" 8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]\n",
" 9. fAlpha: continuous # angle of major axis with vector to origin [deg]\n",
" 10. fDist: continuous # distance from origin to center of ellipse [mm]\n",
" 11. class: g,h # gamma (signal), hadron (background)\n",
"\n",
"g = gamma (signal): 12332\n",
"h = hadron (background): 6688\n",
"\n",
"For technical reasons, the number of h events is underestimated.\n",
"In the real data, the h class represents the majority of the events.\n",
"\n",
"You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case studyusing images from a Cherenkov gamma-ray telescope\" NIM A 516 (2004) 511-528) (You need to be within the university network to get free access.) "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
"df = pd.read_csv(filename, engine='python')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# use categories 1 and 0 insted of \"g\" and \"h\"\n",
"df['class'] = df['class'].map({'g': 1, 'h': 0})"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>fLength</th>\n",
" <th>fWidth</th>\n",
" <th>fSize</th>\n",
" <th>fConc</th>\n",
" <th>fConc1</th>\n",
" <th>fAsym</th>\n",
" <th>fM3Long</th>\n",
" <th>fM3Trans</th>\n",
" <th>fAlpha</th>\n",
" <th>fDist</th>\n",
" <th>class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>28.7967</td>\n",
" <td>16.0021</td>\n",
" <td>2.6449</td>\n",
" <td>0.3918</td>\n",
" <td>0.1982</td>\n",
" <td>27.7004</td>\n",
" <td>22.0110</td>\n",
" <td>-8.2027</td>\n",
" <td>40.0920</td>\n",
" <td>81.8828</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>31.6036</td>\n",
" <td>11.7235</td>\n",
" <td>2.5185</td>\n",
" <td>0.5303</td>\n",
" <td>0.3773</td>\n",
" <td>26.2722</td>\n",
" <td>23.8238</td>\n",
" <td>-9.9574</td>\n",
" <td>6.3609</td>\n",
" <td>205.2610</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>162.0520</td>\n",
" <td>136.0310</td>\n",
" <td>4.0612</td>\n",
" <td>0.0374</td>\n",
" <td>0.0187</td>\n",
" <td>116.7410</td>\n",
" <td>-64.8580</td>\n",
" <td>-45.2160</td>\n",
" <td>76.9600</td>\n",
" <td>256.7880</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>23.8172</td>\n",
" <td>9.5728</td>\n",
" <td>2.3385</td>\n",
" <td>0.6147</td>\n",
" <td>0.3922</td>\n",
" <td>27.2107</td>\n",
" <td>-6.4633</td>\n",
" <td>-7.1513</td>\n",
" <td>10.4490</td>\n",
" <td>116.7370</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>75.1362</td>\n",
" <td>30.9205</td>\n",
" <td>3.1611</td>\n",
" <td>0.3168</td>\n",
" <td>0.1832</td>\n",
" <td>-5.5277</td>\n",
" <td>28.5525</td>\n",
" <td>21.8393</td>\n",
" <td>4.6480</td>\n",
" <td>356.4620</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" fLength fWidth fSize fConc fConc1 fAsym fM3Long fM3Trans \\\n",
"0 28.7967 16.0021 2.6449 0.3918 0.1982 27.7004 22.0110 -8.2027 \n",
"1 31.6036 11.7235 2.5185 0.5303 0.3773 26.2722 23.8238 -9.9574 \n",
"2 162.0520 136.0310 4.0612 0.0374 0.0187 116.7410 -64.8580 -45.2160 \n",
"3 23.8172 9.5728 2.3385 0.6147 0.3922 27.2107 -6.4633 -7.1513 \n",
"4 75.1362 30.9205 3.1611 0.3168 0.1832 -5.5277 28.5525 21.8393 \n",
"\n",
" fAlpha fDist class \n",
"0 40.0920 81.8828 1 \n",
"1 6.3609 205.2610 1 \n",
"2 76.9600 256.7880 1 \n",
"3 10.4490 116.7370 1 \n",
"4 4.6480 356.4620 1 "
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### a) Create for each variable a figure with a plot for gammas and hadrons overlayed."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [],
"source": [
"df0 = df[df['class'] == 0] # hadron data set\n",
"df1 = df[df['class'] == 1] # gamma data set"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x122031130>"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAA3YAAAE9CAYAAABHrfALAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAABY40lEQVR4nO3dfbwcZX3//9ebAIkCBgmaYhJMEKgFbwJECNVq0IJoLaEVJVgrttS0VWrVL7XwtUVK9QtYa4qFbzUFKuANWNR+I4I0Ss7Pqgkm3N9EYgwIQWwkoZETTCTw+f0x1x7mbPacs+fszczsvp+Px3mc2Wuunf3M7rWzc811zXUpIjAzMzMzM7Pq2q3oAMzMzMzMzKw1rtiZmZmZmZlVnCt2ZmZmZmZmFeeKnZmZmZmZWcW5YmdmZmZmZlZxrtiZmZmZmZlV3O5FBzAe+++/f8yePXvo8bZt29hrr72KC6iBMsYE5YyrnTHdeuutj0XEC8bzHEknAhcDk4DLIuLCuvWTgauAo4DNwKkR8WBa9wrgs8DzgGeAV0XE9pFeq77s1pTxcxkPx9+6iZTdbhup/NaU4X3Mczxja1dMZS+/ZT5vcCyNdSuWspddqM65g+MZW7tjGrH8RkRl/o466qjIW7FiRZRNGWOKKGdc7YwJWBPjKEtklbkfAwcBewJ3AofV5Xkv8Jm0vAi4Ni3vDtwFvDI9ngZMGu316stuJ96DIjj+1o237BbxN1L5rSnD+5jneMbWrpjKXn7LfN7gWBrrVixlL7tRoXMHxzO2dsc0Uvl1V0zrV0cD6yNiQ0T8CrgGWFiXZyFwZVq+DniDJAEnAHdFxJ0AEbE5Ip7uUtxmZmZmZrtwxc761Qzg4dzjjSmtYZ6I2AlsJWudOxQISTdJuk3Sh7sQr5mZmZnZiCp1j51ZSewOvAZ4FfAk8G1Jt0bEt/OZJC0GFgNMnz6dgYGBXTY0ODjYML0qHL+ZmZlZObRUsZvo4BOSjgaW1rIB50XE11qJxVrz1FNPsXHjRrZvH3H8j7abOnUqa9euHddzpkyZwsyZM9ljjz1afflHgFm5xzNTWqM8GyXtDkwlK8cbge9ExGMAkm4AjgSGVewiYimpnM+bNy8WLFiwSxADAwM0Sq8Kx29mZmZWDhOu2EmaBFwKHE92orta0rKIuC+X7Qzg8Yg4WNIi4CLgVOAeYF5E7JR0AHCnpK+n7m7tseKC7P9x57Rtk71s48aN7LPPPsyePZvsNrLOe+KJJ9hnn32azh8RbN68mY0bNzJnzpxWX341cIikOWQVuEXAO+ryLANOB1YCpwA3R0RIugn4sKTnAr8CXgcsaTWgIS67ZgAsWb5uaPmDxx9aYCRmXeBjvxWlVvbA5a/iWrnHbsKDT0TEk7lK3BQgWojD2mD79u1Mmzata5W6iZDEtGnT2tKqmMrfmcBNwFrgyxFxr6TzJZ2Usl0OTJO0HvgQcHZ67uPAp8gqh3cAt0XEN1oOyszMzMxsglrpitlo8IljRsqTWudqg088JukY4ArgxcAftrW1ziakzJW6mnbGGBE3ADfUpZ2bW94OvG2E534e+HzbgjEzMzMza0Fhg6dExC3A4ZJ+A7hS0o3RYILn0QagGHXgg8HUVa/LAyOUdTCGseKaOnUqTzzxRPcCAp5++ulhr/mTn/yEt7/97dxyyy2jPm/79u2lfI/NzMzGJd8FzsysRa1U7FoZfGJIRKyVNAi8DFhT/yKjDUAx6sAHtYPlgkXN7k9blHUwhrHiWrt27bD73fL3trRDo/tj6u+x23vvvdltt93GvO9uypQpHHHEEW2Nz6xKJF0BvAXYFBEva7BeZANbvZls5NZ3R8Rtad3pwN+krB+LiCvrn1+0dh9/zMysBb7/szJaucduaPAJSXuSDT6xrC5PbfAJGD74xJxU0UPSi4GXAg+2EIv1iKeffpr3vOc9HH744Zxwwgn88pe/LDokszL6HHDiKOvfBByS/hYD/w
IgaT/go2Td5o8GPirp+R2NtE2WLF/nCp+ZmdkoJlyxa2XwCbI5wO6UdAfwNeC9taHjrb/96Ec/4n3vex/33nsv++67L1/5yleKDsmsdCLiO8CWUbIsBK6KzCpg3zQC8RuB5RGxJQ0CtJzRK4hmZtZGkiZJul3S9UXHYr2npXvsJjr4RERcDVzdymtbb5ozZw5z584F4KijjuLBBx8sNB6zimo0uNWMUdLNzKw7/pKsQeR5RQcyKt//WUmFDZ5i1sjkyZOHlidNmtTfXTHdp90KNNrAVfXaMWjUpid2DC2PVtMcGPjpmNsq2yBWZYsHyhlTz2p0guzjel+SNBP4HeDjZD3ZzNrKFTszs94z0uBWjwAL6tIHGm1gtIGr6rVj0Khm7597+4KxJyov2yBWZYsHyhlTz3GLh+3qn4APA6OPElckl9tKc8XOzKz3LAPOlHQN2UApWyPiUUk3Af8nN2DKCYCbDszKJH9i7Za9niGpNpLxrZIWjJJvzN4SbW9xr00RNpYRXrNsPQDKFg90LyZX7KyhRtMTdNrs2bO55557hh6fddZZXY/BrAokfYms5W1/SRvJRrrcAyAiPkN27/ObgfVk0x38UVq3RdLfk41qDHB+RIw2CIuZmbXHq4GTJL0ZmAI8T9LnI+Kd+UzN9JZoe4t7s610I0whVrYeAGWLB7oXkyt2ZmYVExGnjbE+gPeNsO4K4IpOxDUhQycUby00DLOuWnFB1kqyd9GBWLdExDmkHhKpxe6s+kqdWatamcfOzMysY+Y/tJT5Dy0tOgwzM7NK6P0WuxUXuI+6mZmZVY9HR+5JETHACANXlZrv/yy93q/YmZmZmZnZcB4Bs+e4K6aZmRXO3S7NzMxa4xY7MzOrjPx8d0WM3mtmZlZWrtiZmVlpuNXOzMxsYlyxs8ba3e/aN9k2beWGzUPLxx40rcBIzMzKQ9Is4CpgOhDA0oi4uNiozMzKwxU7K5W///u/5/Of/zwveMELmDVrFkcddZQnKjczM4CdwP+KiNsk7QPcKml5RNxXdGAd59EIrWw8YmspuWJnpbF69Wq+8pWvcOedd/LUU09x5JFHctRRRxUdlpmZlUBEPAo8mpafkLQWmAH0fsXOzKwJrthZaXzve99j4cKFTJkyhSlTpvC7v/u7RYdkZmYlJGk2cARwS8GhFMeteGZWxxU7MzMzqwxJewNfAT4QEb9osH4xsBhg+vTpDAwMDK0bHBwc9rjtnvhZ7sGcUbMOPjOZgcFd82zbsXNoea/JDU7TavHnn9viPnX8fRmHMsViTVhxwfCyaIVyxc5K49WvfjV/+qd/yjnnnMPOnTu5/vrrWbx4cdFhmVknjGOApvqRMlcd6ONCv5K0B1ml7gsR8dVGeSJiKbAUYN68ebFgwYKhdQMDA+Qft904yvXA4BwW7P3ALukrN40xgNaCRbu+Vi1tgjr+voxDmWIxqxpX7Kw0XvWqV3HSSSfxile8gunTp/Pyl7+cqVOnFh1W8VZc4G42Ztb3JAm4HFgbEZ8qOh6zymr3yOdWGq7YWWMFVSTOOusszjvvPJ588kle+9rXevAUMzOreTXwh8Ddku5Iaf87Im4oLqQCNDgpX7J8HQAfPP7QbkdjZiXiip2VyuLFi7nvvvvYvn07p59+OkceeWTRIZmZWQlExHcBFR2HWT/Iz6lb47l1y88VOyuVL37xi0WHYGadVNfa0OjkwczMuqt2LJ5w5c2jtJbCbkUHYGZmZmZmZq1xi50NiQiye9PLKyKKDsGscJJOBC4GJgGXRcSFdeuXAMelh88FXhgR+6Z1TwN3p3UPRcRJXQm6A3xfkZVNvgW6UctHfv22qbNYuWnzUL5WWq+HRo5dkV7TLSZmfckVOwNgypQpbN68mWnTppW2chcRbN68mSlTphQdillhJE0CLgWOBzYCqyUti4j7anki4oO5/H9BNpFzzS8jYm6XwjWzMTRToRurwmjWjKGLYensv51d4Vvuymlt4YqdATBz5k
w2btzIz3/+86695vbt28ddSZsyZQozZ85sy+s30eoxGbgKOArYDJwaEQ/m1h8I3AecFxGfbEtQZmM7GlgfERsAJF0
"text/plain": [
"<Figure size 1080x360 with 10 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"fig, axs = plt.subplots(nrows=2, ncols=5, figsize=(15,5))\n",
"plt.subplots_adjust(hspace = 0.3, wspace=0.3)\n",
"\n",
"for i in range(10):\n",
" kx = i // 5\n",
" ky = i % 5\n",
" axs[kx, ky].set_xlabel(df0.columns[i])\n",
" df0.iloc[:,i].hist(ax = axs[kx, ky], bins = 50, alpha=0.5, density=True, label='h')\n",
" df1.iloc[:,i].hist(ax = axs[kx, ky], bins = 50, alpha=0.5, density=True, label='g')\n",
"\n",
"axs[0, 0].legend()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### b) Create training and test data set. The tast data should amount to 50\\% of the total data set."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"y = df['class'].values\n",
"X = df[[col for col in df.columns if col!=\"class\"]]\n",
"\n",
"### YOUR CODE ### \n",
"\n",
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, shuffle=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### c) Define the logistic regressor and fit the training data"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from sklearn import linear_model\n",
"\n",
"# define logistic regressor\n",
"\n",
"### YOUR CODE ###\n",
"\n",
"logreg=linear_model.LogisticRegression(fit_intercept=True,\n",
" penalty='none',\n",
" max_iter=1000,\n",
" tol=1E-5)\n",
"\n",
"# fit training data\n",
"\n",
"### YOUR CODE ###\n",
"\n",
"import time\n",
"start_time = time.time()\n",
"logreg.fit(X_train, y_train)\n",
"run_time = time.time() - start_time"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"y_pred = logreg.predict(X_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### d) Determine the Model Accuracy, the AUC score and the Run time"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model Accuracy: 78.99%\n",
"AUC score: 0.74\n",
"Run time: 0.22 sec\n",
"\n",
"\n"
]
}
],
"source": [
"from sklearn.metrics import roc_auc_score\n",
"\n",
"### YOUR CODE ###\n",
"\n",
"print(\"Model Accuracy: {:.2f}%\".format(100*logreg.score(X_test, y_test)))\n",
"print(\"AUC score: {:.2f}\".format(roc_auc_score(y_test,y_pred)))\n",
"print(\"Run time: {:.2f} sec\\n\\n\".format(run_time))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### e) Plot the ROC curve (Backgropund Rejection vs signal efficiency)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAowElEQVR4nO3deXxV9Z3/8dfnZoUkhCUBkTWsilZcouIOddcW2mpbrbS1taXTamfsNuP8dMbW7na62dpatdbWad26OFAVqAuiVpRYFgFlkX0POyF78vn9cQ40QJYbkpN7b+77+XicB/ec873nfI5IPvme72bujoiIpK9YogMQEZHEUiIQEUlzSgQiImlOiUBEJM0pEYiIpLnMRAfQXkVFRT58+PBEhyEiklLefPPNHe5e3Ny5lEsEw4cPp6ysLNFhiIikFDNb19I5vRoSEUlzSgQiImlOiUBEJM0pEYiIpDklAhGRNBdZIjCzh8xsu5ktaeG8mdk9ZrbKzBab2elRxSIiIi2LskbwMHBFK+evBEaH2zTglxHGIiIiLYhsHIG7zzWz4a0UmQL8zoN5sOeZWW8zG+juW6KIp2ztLuau3EGPrAyO751LbX0jg3r3ICcrxvB+efTLz4nitiIiSS+RA8oGARua7G8Mjx2VCMxsGkGtgaFDhx7Tzd5ct5t7nl/Z4vn8nEwO1NYzsjif0mF9iMWMDDMyYkZNfSNjBuQzrF9PDGNI354UF+TQIyuD7Ew1s4hIakuJkcXufj9wP0BpaekxraTzuYtGcsOEYew+UEttQyPb99XgOKvLD7BhVyXz1uxi1bb9VFTX8+Ly7TQ0Og2Nzu7KuhavGTPIzoxxyqDe3DBhKAMLe9CnZxZ5OZkU5GaSk6lEISLJL5GJYBMwpMn+4PBYZPJzMsnPCR55ZHE+AOeOLGrze+X7a9i2r5r6RmfDrko27K4kOyPGG2t28damvbyxdhdvrN3V4vd75WZSOrwv2RkxLhxTTG19A8P65dG/Vw79C3Ipys/GzDrnIUVE2imRiWA6cIuZPQacDeyNqn2go4oLciguCNoQTh3S+9Dxz1wwAggSxVub9lBb75Tvr6amvpFGd5Zt3kd5RQ219Y288M52AGYu3XrU9fOyM+iRnUFeTiaj+xdQ2COLipo6BvfpyZgB+eRkZjCgVy5987LJzAheWfXNzyY7I0Z2RoxYTElERI5dZInAzB4FJgJFZrYRuBPIAnD3+4BngKuAVUAl8KmoYolacUEO7z1hQJvl9lbWUV3fwLZ91WzcXUX5/hr2VdWxbMs+3tm6n+q6BlZu38+6nZXtun9OZowJI/qRn5tJ/4Icxg4oYERxPkX52RTkZpGfk0lWhpGZoddUInK0KHsNXd/GeQdujur+yaiwZxaFZDGgVy6nDO7dZvldB2rZUVHDpt1V1NQ30tDo1Dc2sm5nJT2zM9i6t5ote6vZsreKl1aUt3m9mMGHTh9MozvD++UxojiPzFiMhkbn+N659MjOYERRvto1RNJMSjQWp6u+edn0zctmzICCuMq7Oxt3V7FqewVVdQ1s3lOFOzS488c3N1JT38Af39wY9/0nji1mWN+e5OdmUpSfQ2ZGjKK8bIaFSSQ3K+NYH01EkogFv5injtLSUtd6BB3T0Ois31VJXUMjNXWNlFdUYxjb9lVTtm43L60oZ0dFDTmZMarrGlu8zpC+PQAYN7AXZ5X04/KTBjC4T8+uegwRaQcze9PdS5s9p0QgralvaKSqroF91fXsraxj854qFm7Yw6Y9VSzdvJcV2yqO+k7psD70yctmRFEeWRkxhvXrSVZGjEZ3Bhb2YECvHIb0DY6JSNdQIpDINDQ6W/ZW8ea63Tz39nYWbdjD+l1BG0ZlbUOr3y3Kz6FXbibXnDGYkcV5nHR8IUP6qkYhEgUlAkmIxkZn+/4a6hoaqaxtYE9lLRt3V7F6RwXrd1Uxe+lWauqbf/X071eMZURRHmeV9KNvXnYXRy
7S/bSWCNRYLJGJxYzjCnMPO3b2EWWq6xpYtb2Cldv3M3vpNp5dEoyzuHvm8sPKnTm8DwW5WRT2yGLi2GIaGp2SojyKC3LIzoxRlJej8RQix0g1Akk6jY3Omp0HeHrxFp55aws5WRnU1DXwztb9bX63V24mOVkZnDakNx8uHcJJx/eib162ejhJ2tOrIekWGhudTXuqOFBbz/Kt+4NxFQ3Ou+UVxGLG+p2VzF1Zzv7q+ma/P6p/PnnZGdTUN3LdmUO44uSBR9VYRLorJQJJO6u2V7Bwwx72VdUxY/Fm+uVls2JbBet3HT1qe2jfntTWNzL2uALGDy5k6jnD6F+gBCHdixKBSBNVtQ089/Y2Zi/bRm5mjDfX72btjgM0HvFPobBHFmeV9KWuoZFLThzAOSP7MaIoTxMESkpSIhCJQ3VdA4/P38Cyzft4bfVOCnIzWbp531Hlxg8u5PMTR3HOiH4U9sxKQKQi7adEINIBtfWNzFm+nSfKNvLc29uOOn/XlJOYevYw9VqSpKZEINKJNu+p4o01u7jvpXcP9WTKjBl98rIZP7iQi08cwOTxx5OXo97ZkjyUCEQisreqjifmb+D3r69jbTPTh59V0pf3DCpk0tj+nDCwgCKtjS0JokQg0kW2769m1tJtLFi/m90Hanlx+eHTg2dnxKhtaORjZw/lg6cN4vShfcjQKyXpAkoEIgnS0Oj8Y/1uVpdXsHZnJS++s/2ogXEfKR3MCcf1YuLYYkaES6iKdDYlApEks35nJQ+8vJpZS7eyfX/NoePFBTlcc/pgLh3XnxOO66V2Buk0SgQiSWxfdR0Pzl3N0s37eD5c2/qgCSP6cteUkxlVnK9eSdIhSgQiKWL3gVoWbtzDjEWbmbtiBzsqag473ys3k7NK+jFxbDEXjC5iWL+8BEUqqUaJQCRFLdqwh78u3kxVXQMzl2xl54Famv6Tzc6IcdKgXlxy4gA+c0EJOZmaXE+ap0Qg0o3sOlDLUws28fj8DVTVNRw2f1JBTibTLhzB9WcPpV9etqbDkEOUCES6sb2VdUxfvJlnFm/htdU7Dx3Pzoxx4nEFDOnbkwvHFHPVewaSr8bntKVEIJIm9lbV8cc3N7J2xwFefXcHq8sPHFXmxnOH86VLxmiepDSjRCCSxlaXV/DIvHX8ft56ahsOXxp0/JDenD+qH1+6ZAyZGbEERShdQYlARACoqKnnz//YyJJNe1mwfg8rt1ccOve5C0fwhUmjKOyhmkJ3pEQgIs2qb2jkjqeW8Nj8DYcdP2VwIacP7cMdV5+omkI3oUQgIq2qqW/gqQWb+Pu7O6msbeBvy/453fbFJ/Tnw6VDmDi2WGs/pzAlAhFpl10Harnn+ZU89/Y2Nu6uOnT8PYMK+eBpg7j85OMY1LtHAiOU9lIiEJFjdqCmnhmLNvP9me+wu7LusHM/+eipTDn1eI1XSAFKBCLSKapqG3jmrS185clFhx0f3KcH9009g3EDe2lOpCSlRCAinW5HRQ0/fW4lj8xbd9jxEcV5zLr1QrLUyJxUWksEkf5NmdkVZrbczFaZ2W3NnB9qZi+a2QIzW2xmV0UZj4h0nqL8HL75gZNZ+72reeU/JvG5i0ZgBqvLDzD69mcpW7sr0SFKnCKrEZhZBrACuBTYCMwHrnf3ZU3K3A8scPdfmtk44Bl3H97adVUjEElu478xm71VQVvCwMJczhzel59ed6raERIsUTWCs4BV7r7a3WuBx4ApR5RxoFf4uRDYHGE8ItIFFt15GU987hzOG9WPLXurmb5oMxO++zzz1+6ioTG1XkWniyhrBNcCV7j7Z8L9jwNnu/stTcoMBGYDfYA84BJ3f7OZa00DpgEMHTr0jHXr1h1ZRESS0K4DtXzukTLmr9196NiI4jz+35Uncsm4AQmMLP0krI0gDtcDD7v7YOAq4BEzOyomd7/f3UvdvbS4uLjLgxSRY9M3L5sn/+VcnvvyRXy0dAgQtCF85ndlXPXTl9m2rzrBEQ
pEmwg2AUOa7A8OjzV1E/AEgLu/BuQCRRHGJCIJMKp/Pt+/9hTWfu9q/n7beynKz2HZln2c/Z3n+cC9r/Lauzvbvoh
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"from sklearn.metrics import roc_curve\n",
"%matplotlib inline\n",
"\n",
"y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities\n",
"\n",
"### YOUR CODE ###\n",
"\n",
"fpr, tpr, _ = roc_curve(y_test, y_pred_prob[:,1])\n",
"plt.plot(tpr, 1-fpr)\n",
"plt.xlabel('Signal efficiency')\n",
"plt.ylabel('Background Rejection');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### f) Plot the Signal efficiency vs. the Background efficiency and compare it to the corresponding plot in the paper"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy86wFpkAAAACXBIWXMAAAsTAAALEwEAmpwYAAAmgUlEQVR4nO3deXwV5dn/8c+VQNhBIYDIroCIiqgRVHDDpWoV2lorqLUulWoLbtXf06ePWmurVevTp7WuuNS1UrcqrRRUFHEBS1grmyCgxIVN9hCyXb8/ZojHmOUEMplzcr7v18tXzixnzjdTeq7cc8/ct7k7IiKSubLiDiAiIvFSIRARyXAqBCIiGU6FQEQkw6kQiIhkOBUCEZEM1yTuAHWVm5vrvXr1ijuGiEhamT179np371jVtrQrBL169SI/Pz/uGCIiacXMPq5umy4NiYhkOBUCEZEMp0IgIpLhVAhERDJcZIXAzB41s7Vm9kE1283M7jaz5Wa2wMwOjyqLiIhUL8oWwWPAaTVsPx3oG/43Brg/wiwiIlKNyG4fdffpZtarhl1GAk94MA72TDPby8y6uPvnUWUSEUkHxaXlrFy/na1FJRSXlVNcWk5JmdN/nzZ0b9+y3j8vzucIugKrE5YLwnXfKARmNoag1UCPHj0aJJyISEP44NPNPPPvT3h/5ZeUlTsr12+vdt/ffudgLjiqZ71nSIsHytx9PDAeIC8vTzPpiEjaW7V+OzdNXMj0D9cBsF/HVnRp15xDu7WjRU4Tuu3dgh7tW9KxTTOaZmeRk51F171bRJIlzkLwKdA9YblbuE5EpFF7Y8kaLnksGCHh24d04aazBtC5bfPY8sRZCCYCY81sAjAE2Kz+ARFp7F6e9ylXTZhHuxZNeerSIRzSrV3ckaIrBGb2DHACkGtmBcCvgKYA7v4AMAk4A1gOFAIXR5VFRCRu7s6dU5Zy/7SPAHjxp8ewf8fWMacKRHnX0Ohatjvws6g+X0QkVRRsLOTMP7/DpsIShvbpwP0XHEHb5k3jjlUhLTqLRUTS1bI1W7n4sVlsKizhh0f15NcjDiIry+KO9TUqBCIiEXk2fzX/7/kFAFx1Ul+uOaVfzImqpkIgIlLPCotL+dnTc3hzaXBr6IM/PIJvHbRPzKmqp0IgIlKP/r3ySy59bBZbd5YytE8HHrjgCNqkUH9AVVQIRETqydTFa/jp03PYWVrO/55zKGcf0S3uSElRIRARqQfvLl/PpY8HD4lNvvpY+u/TNuZEyVMhEBHZQ8vXbuX8h98nJzuLKdccR+/cVnFHqhNNTCMisgfeWLKG7933Hq1ysnlmzFFpVwRALQIRkd2yfWcplz2Rz3sfbSC3dTMevSiPgd32ijvWblEhEBGpA3fnHws+5xcvLKCwuIwzDtmHO84emPJ3BtVEhUBEJEkfrtnKqf83HQCz6OYHaGgqBCIiSZi3ehPnPzST7Czjh0f15NpT+6XUeEF7QoVARKQWX2wu4rywCLz8s6Ec3DX+oaPrkwqBiEgNlq/dynkPvU9hcRnPXHZUoysCoNtHRUSqtaO4jLF/ncvarTt59KI8jt6/Q9yRIqEWgYhIFdZuKWLwbVOBYOTQ4f07x5woOmoRiIhUUlRSxjkPzgBg3PA+KTt8dH1RIRARSVBaVs4PH3mfjzcUMvbEPvz81APijhQ5FQIRkQTjnpnLrFUbObLX3lz3rcZfBEB9BCIiAGwpKuGaCfOYumQtw/t34uEL8+KO1GBUCEQk423bWcrRt01le3EZpwzozH3nH55y8wpHSYVARDLaxu3FHPW7qewsLec7g/blj6MOiztSg1MhEJGMNvqhmRSXlXPjmQO4dFjvuOPEQp3FIpKx/j63gCVfbOVHR/fK2CIAKgQikqHWbinit/9cDMBPT9w/5jTxUiEQkYxTVu6c//D7bNhezOOXDKZTm+ZxR4qVCoGIZJwxT+SzbO02rhzeh+P7dYw7TuxUCEQkozw582
OmLlnLoO57NfqhI5KlQiAiGeODTzdz40sfAHD3qMMwy5xnBWqiQiAiGeG1RWs488/vAPDc5UfTo0PLmBOlDhUCEWn0Fn22hcueyAfg6R8P4che7WNOlFoiLQRmdpqZLTWz5Wb2iyq29zCzN81srpktMLMzoswjIpnnzSVrOePut8kyeOrSIQztkxt3pJQTWSEws2zgXuB0YAAw2swGVNrtBuBZdz8MGAXcF1UeEclMN74c9Ak8fslghvVVEahKlC2CwcByd1/h7sXABGBkpX0caBu+bgd8FmEeEckwL8/7lIKNO/hBXjeO7avbRKsTZSHoCqxOWC4I1yW6GbjAzAqAScC4qg5kZmPMLN/M8tetWxdFVhFpZDYVFnPVhHl0aJXDjWdWvhghieLuLB4NPObu3YAzgCfN7BuZ3H28u+e5e17HjqrqIlKzwuJSTrxrGgA3jziINs2bxhsoxUVZCD4FuicsdwvXJboUeBbA3WcAzQFdxBORPXL+w++zsbCEccP7cObALnHHSXlRFoJZQF8z621mOQSdwRMr7fMJcBKAmR1IUAh07UdEdktJWTmjx89k7ieb6J3bip+feoAeGktCZIXA3UuBscAUYDHB3UELzewWMxsR7vZz4DIzmw88A1zk7h5VJhFp3H43aQkzVmygX+fW/GPcsLjjpI1IJ6Zx90kEncCJ625KeL0IGBplBhHJDK8vWsOj765kSO/2TBhzlFoCdRB3Z7GISL24Y/ISAO47/3AVgTpSIRCRtFZW7oy85x2Wrd3G6MHd6dC6WdyR0o4KgYiktYnzP2V+wWZ6dWjJr0ccHHectKRCICJpa/HnW7jmb/MBmDhuGDlN9JW2O3TWRCQtlZU75zwwA4A7zj6EtnpobLepEIhIWho/fQXbdpZy8dBenHtkj7jjpDUVAhFJO0UlZTwxYxUAVw7vG2+YRiDS5whERKLQ/8bJANx05gD2bpUTc5r0pxaBiKSVh6avAKBTm2ZcMqx3zGkaBxUCEUkbL8wu4NZJiwGYdNWxMadpPFQIRCQtvDingJ8/F9wqeu95h5OrB8fqjfoIRCQt/OG1DwH457hhHNy1XcxpGhe1CEQk5b27fD0FG3fQp1NrFYEI1FoIzKxDQwQREalKeblz/sPvA3D3qMNiTtM4JdMimGlmz5nZGaYh/USkgV3+1GwA+u/ThgH7to05TeOUTCHoB4wHfggsM7PbzKxftLFERILWwKuL1gDw4k+PiTlN41VrIfDAa+4+GrgM+BHwbzN7y8yOjjyhiGSsCx/9NwA3fPtAWubo3pao1Hpmwz6CCwhaBGuAcQRzDw8CngP0RIeI1Ls1W4p4Z/l6AC4Zqq+ZKCVTYmcATwLfcfeChPX5ZvZANLFEJNONGj8TgMcvGUxWlrono5RMITigugnl3f2Oes4jIsLt/1rCyvXb6dKuOcf36xh3nEYvmc7iV81sr10LZra3mU2JLpKIZLrH3lsJwEs/GxpzksyQTCHo6O6bdi24+0agU2SJRCSj/XPBZxSVlHNS/050bts87jgZIZlCUGZmFbM+mFlPoMpLRSIie2JBwSbG/nUuAJefsH/MaTJHMn0E/wO8Y2ZvAQYcC4yJNJWIZBx3Z8Q97wLw81P6cWSv9jEnyhy1FgJ3n2xmhwNHhauudvf10cYSkUzz0rxPARjWJ5dxJ2nWsYaU7BMazYAvw/0HmBnuPj26WCKSaW74+wcA/P6cgTEnyTzJPFB2B3AusBAoD1c7oEIgIvXivmnL2V5cRte9WtClXYu442ScZFoE3yF4lmBnxFlEJAPtKC7jzslLAXh5rG4XjUMydw2tAJpGHUREMtP9b30EwKgju2vWsZgk0yIoBOaZ2VSgolXg7ldGlkpEMkJ5ufPI28Fk9Ld995CY02SuZArBxPC/OjOz04A/AdnAw+5+exX7/AC4maDfYb67n7c7nyUi6edfH3zB9uIyvj2wi8YTilEyt48+bmYtgB7uvjTZA5tZNnAvcApQAM
wys4nuvihhn77AfwND3X2jmemJZZEM8cXmIsY+MweAa0/RFCdxSmaqyrOAecDkcHmQmSXTQhgMLHf3Fe5eDEwARlb
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"### YOUR CODE ###\n",
"plt.plot(fpr, tpr)\n",
"plt.xscale(\"log\")\n",
"plt.xlabel('Background efficiency')\n",
"plt.ylabel('Signal efficiency');"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
}
},
"nbformat": 4,
"nbformat_minor": 4
}