Compare commits
No commits in common. "c91b2cdbb7a74d76c25a7959c3acf73d65bb8a8d" and "5f54ce46f29d87cdd3b2f29b575ee4a3d64039d8" have entirely different histories.
c91b2cdbb7
...
5f54ce46f2
@ -1,228 +0,0 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Exercise: Classification of air showers measured with the MAGIC telescope\n",
|
||||
"\n",
|
||||
"The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
|
||||
"\n",
|
||||
"The task is to separate signal events (gamma showers) from background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
|
||||
"\n",
|
||||
"The features of a shower are:\n",
|
||||
"\n",
|
||||
" 1. fLength: continuous # major axis of ellipse [mm]\n",
|
||||
" 2. fWidth: continuous # minor axis of ellipse [mm] \n",
|
||||
" 3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]\n",
|
||||
" 4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]\n",
|
||||
" 5. fConc1: continuous # ratio of highest pixel over fSize [ratio]\n",
|
||||
" 6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]\n",
|
||||
" 7. fM3Long: continuous # 3rd root of third moment along major axis [mm] \n",
|
||||
" 8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]\n",
|
||||
" 9. fAlpha: continuous # angle of major axis with vector to origin [deg]\n",
|
||||
" 10. fDist: continuous # distance from origin to center of ellipse [mm]\n",
|
||||
" 11. class: g,h # gamma (signal), hadron (background)\n",
|
||||
"\n",
|
||||
"g = gamma (signal): 12332\n",
|
||||
"h = hadron (background): 6688\n",
|
||||
"\n",
|
||||
"For technical reasons, the number of h events is underestimated.\n",
|
||||
"In the real data, the h class represents the majority of the events.\n",
|
||||
"\n",
|
||||
"You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). (You need to be within the university network to get free access.) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import pandas as pd\n",
|
||||
"import numpy as np\n",
|
||||
"from sklearn.model_selection import train_test_split"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
|
||||
"df = pd.read_csv(filename, engine='python')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# use categories 1 and 0 instead of \"g\" and \"h\"\n",
|
||||
"df['class'] = df['class'].map({'g': 1, 'h': 0})"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df.head()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### a) For each variable, create a figure with the distributions for gammas and hadrons overlaid."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"df0 = df[df['class'] == 0] # hadron data set\n",
|
||||
"df1 = df[df['class'] == 1] # gamma data set\n",
|
||||
"\n",
|
||||
"print(len(df0),len(df1))\n",
|
||||
"\n",
|
||||
"### YOUR CODE ###\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### b) Create training and test data sets. The test data should amount to 50\\% of the total data set."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y = df['class'].values\n",
|
||||
"X = df[[col for col in df.columns if col!=\"class\"]]\n",
|
||||
"\n",
|
||||
"### YOUR CODE ### \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### c) Define the logistic regressor and fit it to the training data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn import linear_model\n",
|
||||
"\n",
|
||||
"# define logistic regressor\n",
|
||||
"\n",
|
||||
"### YOUR CODE ###\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# fit training data\n",
|
||||
"\n",
|
||||
"### YOUR CODE ###\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### d) Determine the model accuracy, the AUC score and the run time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.metrics import roc_auc_score\n",
|
||||
"\n",
|
||||
"### YOUR CODE ###\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### e) Plot the ROC curve (background rejection vs. signal efficiency)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from sklearn.metrics import roc_curve\n",
|
||||
"%matplotlib inline\n",
|
||||
"\n",
|
||||
"y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities\n",
|
||||
"\n",
|
||||
"### YOUR CODE ###\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"### YOUR CODE ###\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.5"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
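Steps d) and e) of the notebook above ask for the AUC score and for a ROC curve drawn as background rejection versus signal efficiency. The following is a minimal standard-library sketch of what those two quantities are, not the notebook's solution. The function names `roc_points` and `auc` and the toy labels and scores are hypothetical and chosen only for illustration; in the exercise itself one would presumably use `roc_curve` and `roc_auc_score` from `sklearn.metrics`, which the notebook already imports.

```python
def roc_points(y_true, scores):
    """Return (signal_efficiency, background_rejection) pairs, one per threshold.

    y_true: 1 = gamma (signal), 0 = hadron (background)
    scores: classifier output, higher means more signal-like
    """
    n_sig = sum(y_true)
    n_bkg = len(y_true) - n_sig
    # Sort event indices from most to least signal-like.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 1.0)]  # threshold above every score: nothing is selected
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1  # a selected signal event
        else:
            fp += 1  # a selected background event
        # Emit a point only after the last event of a group of tied scores.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((tp / n_sig, 1.0 - fp / n_bkg))
    return points


def auc(points):
    """Trapezoidal area under the rejection-vs-efficiency curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example (made-up numbers, not MAGIC data).
y_toy = [1, 1, 0, 1, 0, 0]
s_toy = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_toy, s_toy)
print(curve)
print(auc(curve))
```

Signal efficiency is the true positive rate and background rejection is one minus the false positive rate, so the area under this curve equals the usual area under the TPR-vs-FPR curve. For the toy input, 8 of the 9 signal-background pairs are ordered correctly, which gives an area of 8/9.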
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@ -601,7 +601,7 @@ i) Display a numpy array as figure of a blue cross. The size should be 200
|
||||
selects the inner part of the circle using the indexing.
|
||||
|
||||
\small
|
||||
[Solution: 01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
|
||||
[Solution: 01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
|
||||
|
||||
ii) Read data which contains pixels from the binary file horse.py into a
|
||||
numpy array. Display the data and the following transformations in 4
|
||||
@ -609,7 +609,7 @@ ii) Read data which contains pixels from the binary file horse.py into a
|
||||
and mirroring.
|
||||
|
||||
\small
|
||||
[Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize
|
||||
[Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize
|
||||
|
||||
|
||||
## Pandas
|
||||
|
Binary file not shown.
Binary file not shown.
@ -15,7 +15,7 @@
|
||||
- Machine learning - decision trees
|
||||
|
||||
* **Day 4**
|
||||
- Machine learning - convolutional networks and graph neural networks
|
||||
- Machine learning - convolutional networks graph neural networks
|
||||
|
||||
* **Organization** and **Objective**
|
||||
- \textcolor{red} {2 ETC: Compulsory attendance is required} \newline
|
||||
|