Compare commits
2 Commits
5f54ce46f2...c91b2cdbb7
Author | SHA1 | Date
---|---|---
 | c91b2cdbb7 |
 | ea8d1af517 |
228
notebooks/03_ml_basics_ex_1_magic.ipynb
Normal file
@@ -0,0 +1,228 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise: Classification of air showers measured with the MAGIC telescope\n",
    "\n",
    "The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
    "\n",
    "The task is to separate signal events (gamma showers) from background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
    "\n",
    "The features of a shower are:\n",
    "\n",
    " 1. fLength: continuous # major axis of ellipse [mm]\n",
    " 2. fWidth: continuous # minor axis of ellipse [mm]\n",
    " 3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]\n",
    " 4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]\n",
    " 5. fConc1: continuous # ratio of highest pixel over fSize [ratio]\n",
    " 6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]\n",
    " 7. fM3Long: continuous # 3rd root of third moment along major axis [mm]\n",
    " 8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]\n",
    " 9. fAlpha: continuous # angle of major axis with vector to origin [deg]\n",
    " 10. fDist: continuous # distance from origin to center of ellipse [mm]\n",
    " 11. class: g,h # gamma (signal), hadron (background)\n",
    "\n",
    "g = gamma (signal): 12332\n",
    "h = hadron (background): 6688\n",
    "\n",
    "For technical reasons, the number of h events is underestimated.\n",
    "In the real data, the h class represents the majority of the events.\n",
    "\n",
    "You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). (You need to be within the university network to get free access.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
    "df = pd.read_csv(filename, engine='python')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use categories 1 and 0 instead of \"g\" and \"h\"\n",
    "df['class'] = df['class'].map({'g': 1, 'h': 0})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### a) For each variable, create a figure with the distributions for gammas and hadrons overlaid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df0 = df[df['class'] == 0] # hadron data set\n",
    "df1 = df[df['class'] == 1] # gamma data set\n",
    "\n",
    "print(len(df0),len(df1))\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### b) Create training and test data sets. The test data should amount to 50\\% of the total data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = df['class'].values\n",
    "X = df[[col for col in df.columns if col!=\"class\"]]\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### c) Define the logistic regressor and fit it to the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import linear_model\n",
    "\n",
    "# define logistic regressor\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n",
    "\n",
    "\n",
    "# fit training data\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### d) Determine the model accuracy, the AUC score, and the run time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### e) Plot the ROC curve (background rejection vs. signal efficiency)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from sklearn.metrics import roc_curve\n",
    "%matplotlib inline\n",
    "\n",
    "y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities\n",
    "\n",
    "### YOUR CODE ###\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "### YOUR CODE ###\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
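
Steps b) to d) of the exercise notebook (50% test split, logistic regression fit, accuracy, AUC, and run time) could be approached roughly as in the sketch below. This is one possible solution outline, not the lecture's reference solution. To stay runnable without a download it uses a synthetic stand-in from `make_classification` (an assumption; in the notebook `X` and `y` come from the MAGIC data frame), and `max_iter=1000`, `stratify=y`, and `random_state=42` are illustrative choices that the notebook does not prescribe.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the MAGIC features (assumption: 10 numeric columns,
# binary label with 1 = gamma, 0 = hadron). Use X, y from the notebook instead.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=42)

# b) hold out 50% of the events as test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

# c) define the logistic regressor and fit it to the training data,
#    timing only the fit itself
logreg = LogisticRegression(max_iter=1000)
start = time.perf_counter()
logreg.fit(X_train, y_train)
run_time = time.perf_counter() - start

# d) accuracy from hard predictions, AUC from the signal-class probability
y_pred = logreg.predict(X_test)
y_pred_prob = logreg.predict_proba(X_test)  # column 1 = P(class 1)
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_prob[:, 1])

print(f"accuracy = {accuracy:.3f}, AUC = {auc:.3f}, fit time = {run_time:.3f} s")
```

The AUC is computed from the predicted probability of the signal class rather than from the hard 0/1 predictions, because the score should reflect the ranking of events over all possible thresholds.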
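
For step e), it may help to see what "background rejection vs. signal efficiency" means numerically. The sketch below scans thresholds over toy scores in plain Python: signal efficiency is the true positive rate and background rejection is one minus the false positive rate. The function name `roc_points` and the toy data are hypothetical. In the notebook the same quantities would follow from `roc_curve`, which returns `fpr`, `tpr`, and `thresholds`.

```python
def roc_points(y_true, scores):
    """Return (signal_efficiency, background_rejection) pairs, one per threshold.

    An event is classified as signal if its score is >= the threshold.
    """
    n_sig = sum(1 for y in y_true if y == 1)
    n_bkg = sum(1 for y in y_true if y == 0)
    points = []
    # scan thresholds from the highest score down to the lowest
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= thr)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= thr)
        sig_eff = tp / n_sig        # true positive rate
        bkg_rej = 1.0 - fp / n_bkg  # 1 - false positive rate
        points.append((sig_eff, bkg_rej))
    return points


# toy labels (1 = gamma, 0 = hadron) and classifier scores
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

points = roc_points(y_true, scores)
for sig_eff, bkg_rej in points:
    print(f"signal efficiency = {sig_eff:.2f}, background rejection = {bkg_rej:.2f}")
```

Plotting the second value against the first gives the curve asked for in e). A perfect classifier would reach the point (1, 1).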
383
notebooks/03_ml_basics_iris_softmax_regression.ipynb
Normal file
File diff suppressed because one or more lines are too long
502
notebooks/03_ml_basics_log_regr_heart_disease.ipynb
Normal file
File diff suppressed because one or more lines are too long
195
notebooks/03_ml_basics_logistic_regression.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -601,7 +601,7 @@ i) Display a numpy array as figure of a blue cross. The size should be 200
 selects the inner part of the circle using the indexing.
 
 \small
-[Solution: 01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
+[Solution: 01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
 
 ii) Read data which contains pixels from the binary file horse.py into a
 numpy array. Display the data and the following transformations in 4
@@ -609,7 +609,7 @@ ii) Read data which contains pixels from the binary file horse.py into a
 and mirroring.
 
 \small
-[Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize
+[Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize
 
 
 ## Pandas
Binary file not shown.
Binary file not shown.
@@ -15,7 +15,7 @@
 - Machine learning - decision trees
 
 * **Day 4**
-- Machine learning - convolutional networks graph neural networks
+- Machine learning - convolutional networks and graph neural networks
 
 * **Organization** and **Objective**
 - \textcolor{red} {2 ETC: Compulsory attendance is required} \newline