{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exercise: Classification of air showers measured with the MAGIC telescope\n", "\n", "The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n", "\n", "The task is to separate signal events (gamma showers) and background events (hadron showers) based on the features of a measured Cherenkov shower.\n", "\n", "The features of a shower are:\n", "\n", " 1. fLength: continuous # major axis of ellipse [mm]\n", " 2. fWidth: continuous # minor axis of ellipse [mm] \n", " 3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]\n", " 4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]\n", " 5. fConc1: continuous # ratio of highest pixel over fSize [ratio]\n", " 6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]\n", " 7. fM3Long: continuous # 3rd root of third moment along major axis [mm] \n", " 8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]\n", " 9. fAlpha: continuous # angle of major axis with vector to origin [deg]\n", " 10. fDist: continuous # distance from origin to center of ellipse [mm]\n", " 11. 
class: g,h # gamma (signal), hadron (background)\n", "\n", "g = gamma (signal): 12332\n", "h = hadron (background): 6688\n", "\n", "For technical reasons, the number of h events is underestimated.\n", "In the real data, the h class represents the majority of the events.\n", "\n", "You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). (You need to be within the university network to get free access.) " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# read the MAGIC dataset from the course server into a DataFrame\n", "filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n", "df = pd.read_csv(filename, engine='python')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# use categories 1 and 0 instead of \"g\" and \"h\"\n", "df['class'] = df['class'].map({'g': 1, 'h': 0})" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | fLength | fWidth | fSize | fConc | fConc1 | fAsym | fM3Long | fM3Trans | fAlpha | fDist | class |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 28.7967 | 16.0021 | 2.6449 | 0.3918 | 0.1982 | 27.7004 | 22.0110 | -8.2027 | 40.0920 | 81.8828 | 1 |
| 1 | 31.6036 | 11.7235 | 2.5185 | 0.5303 | 0.3773 | 26.2722 | 23.8238 | -9.9574 | 6.3609 | 205.2610 | 1 |
| 2 | 162.0520 | 136.0310 | 4.0612 | 0.0374 | 0.0187 | 116.7410 | -64.8580 | -45.2160 | 76.9600 | 256.7880 | 1 |
| 3 | 23.8172 | 9.5728 | 2.3385 | 0.6147 | 0.3922 | 27.2107 | -6.4633 | -7.1513 | 10.4490 | 116.7370 | 1 |
| 4 | 75.1362 | 30.9205 | 3.1611 | 0.3168 | 0.1832 | -5.5277 | 28.5525 | 21.8393 | 4.6480 | 356.4620 | 1 |