ML-Kurs-SS2023/notebooks/03_ml_basics_ex_1_magic.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise: Classification of air showers measured with the MAGIC telescope\n",
    "\n",
    "The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
    "\n",
    "The task is to separate signal events (gamma showers) and background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
    "\n",
    "The features of a shower are:\n",
    "\n",
    "    1.  fLength:  continuous  # major axis of ellipse [mm]\n",
    "    2.  fWidth:   continuous  # minor axis of ellipse [mm] \n",
    "    3.  fSize:    continuous  # 10-log of sum of content of all pixels [in #phot]\n",
    "    4.  fConc:    continuous  # ratio of sum of two highest pixels over fSize  [ratio]\n",
    "    5.  fConc1:   continuous  # ratio of highest pixel over fSize  [ratio]\n",
    "    6.  fAsym:    continuous  # distance from highest pixel to center, projected onto major axis [mm]\n",
    "    7.  fM3Long:  continuous  # 3rd root of third moment along major axis  [mm] \n",
    "    8.  fM3Trans: continuous  # 3rd root of third moment along minor axis  [mm]\n",
    "    9.  fAlpha:   continuous  # angle of major axis with vector to origin [deg]\n",
    "    10. fDist:    continuous  # distance from origin to center of ellipse [mm]\n",
    "    11. class:    g,h         # gamma (signal), hadron (background)\n",
    "\n",
    "The data set contains 12332 gamma (signal) events (g) and 6688 hadron (background) events (h).\n",
    "\n",
    "For technical reasons, the number of h events is underestimated.\n",
    "In the real data, the h class represents the majority of the events.\n",
    "\n",
    "You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). You need to be within the university network to get free access."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
    "df = pd.read_csv(filename, engine='python')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use categories 1 and 0 instead of \"g\" and \"h\"\n",
    "df['class'] = df['class'].map({'g': 1, 'h': 0})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### a) For each variable, create a figure in which the plots for gammas and hadrons are overlaid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0 = df[df['class'] == 0] # hadron data set\n",
    "df1 = df[df['class'] == 1] # gamma data set\n",
    "\n",
    "print(len(df0),len(df1))\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### b) Create training and test data sets. The test data should amount to 50\% of the total data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = df['class'].values\n",
    "X = df[[col for col in df.columns if col!=\"class\"]]\n",
    "\n",
    "### YOUR CODE ### \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### c) Define the logistic regressor and fit it to the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import linear_model\n",
    "\n",
    "# define logistic regressor\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n",
    "\n",
    "\n",
    "# fit training data\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### d) Determine the model accuracy, the AUC score, and the run time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### e) Plot the ROC curve (background rejection vs. signal efficiency)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from sklearn.metrics import roc_curve\n",
    "%matplotlib inline\n",
    "\n",
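    "# assumes the classifier from c) is named logreg and the test features from b) are named X_test\n",
    "y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities; column 1 is the gamma (signal) class\n",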
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### YOUR CODE ###\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}