add files

updates
2023-04-05 17:35:33 +02:00 · 2023-04-05 17:26:47 +02:00
8 changed files with 1311 additions and 3 deletions
--- a/notebooks/03_ml_basics_ex_1_magic.ipynb
+++ b/notebooks/03_ml_basics_ex_1_magic.ipynb
@ -0,0 +1,228 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Exercise: Classification of air showers measured with the MAGIC telescope\n",
+    "\n",
+    "The [MAGIC telescope](https://en.wikipedia.org/wiki/MAGIC_(telescope)) is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The [MAGIC machine learning dataset](https://archive.ics.uci.edu/ml/datasets/magic+gamma+telescope) can be obtained from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).\n",
+    "\n",
+    "The task is to separate signal events (gamma showers) and background events (hadron showers) based on the features of a measured Cherenkov shower.\n",
+    "\n",
+    "The features of a shower are:\n",
+    "\n",
+    "    1.  fLength:  continuous  # major axis of ellipse [mm]\n",
+    "    2.  fWidth:   continuous  # minor axis of ellipse [mm] \n",
+    "    3.  fSize:    continuous  # 10-log of sum of content of all pixels [in #phot]\n",
+    "    4.  fConc:    continuous  # ratio of sum of two highest pixels over fSize  [ratio]\n",
+    "    5.  fConc1:   continuous  # ratio of highest pixel over fSize  [ratio]\n",
+    "    6.  fAsym:    continuous  # distance from highest pixel to center, projected onto major axis [mm]\n",
+    "    7.  fM3Long:  continuous  # 3rd root of third moment along major axis  [mm] \n",
+    "    8.  fM3Trans: continuous  # 3rd root of third moment along minor axis  [mm]\n",
+    "    9.  fAlpha:   continuous  # angle of major axis with vector to origin [deg]\n",
+    "    10. fDist:    continuous  # distance from origin to center of ellipse [mm]\n",
+    "    11. class:    g,h         # gamma (signal), hadron (background)\n",
+    "\n",
+    "g = gamma (signal):     12332\n",
+    "h = hadron (background): 6688\n",
+    "\n",
+    "For technical reasons, the number of h events is underestimated.\n",
+    "In the real data, the h class represents the majority of the events.\n",
+    "\n",
+    "You can find further information about the MAGIC telescope and the data discrimination studies in the following [paper](https://reader.elsevier.com/reader/sd/pii/S0168900203025051?token=8A02764E2448BDC5E4DD0ED53A301295162A6E9C8F223378E8CF80B187DBFD98BD3B642AB83886944002206EB1688FF4) (R. K. Bock et al., \"Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope\", NIM A 516 (2004) 511-528). You need to be within the university network to get free access."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import train_test_split"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "filename = \"https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt\"\n",
+    "df = pd.read_csv(filename, engine='python')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# use categories 1 and 0 instead of \"g\" and \"h\"\n",
+    "df['class'] = df['class'].map({'g': 1, 'h': 0})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df.head()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### a) For each variable, create a figure with the distributions for gammas and hadrons overlaid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df0 = df[df['class'] == 0] # hadron data set\n",
+    "df1 = df[df['class'] == 1] # gamma data set\n",
+    "\n",
+    "print(len(df0),len(df1))\n",
+    "\n",
+    "### YOUR CODE ###\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### b) Create training and test data sets. The test data should amount to 50\\% of the total data set."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "y = df['class'].values\n",
+    "X = df[[col for col in df.columns if col!=\"class\"]]\n",
+    "\n",
+    "### YOUR CODE ### \n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### c) Define the logistic regressor and fit it to the training data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn import linear_model\n",
+    "\n",
+    "# define logistic regressor\n",
+    "\n",
+    "### YOUR CODE ###\n",
+    "\n",
+    "\n",
+    "\n",
+    "# fit training data\n",
+    "\n",
+    "### YOUR CODE ###\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### d) Determine the model accuracy, the AUC score, and the run time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.metrics import roc_auc_score\n",
+    "\n",
+    "### YOUR CODE ###\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### e) Plot the ROC curve (background rejection vs. signal efficiency)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "from sklearn.metrics import roc_curve\n",
+    "%matplotlib inline\n",
+    "\n",
+    "y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities\n",
+    "\n",
+    "### YOUR CODE ###\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "### YOUR CODE ###\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
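
Note on the notebook above: steps b) to e) of the exercise can be sketched as follows. This is a minimal sketch under stated assumptions, not the course solution. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for the MAGIC table, so it runs without network access. The values `random_state=0` and `max_iter=1000`, the `stratify=y` option, and the variable names other than `logreg`, `X_test` and `y_pred_prob` are illustrative choices that the notebook does not prescribe.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the MAGIC data (assumption): 10 continuous features
# and a binary label, 1 = signal ("g") and 0 = background ("h").
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    random_state=0,
)

# b) Hold out 50% of the events as test data; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# c) Define the logistic regressor and fit it, timing only the fit.
logreg = LogisticRegression(max_iter=1000)
start = time.perf_counter()
logreg.fit(X_train, y_train)
run_time = time.perf_counter() - start

# d) Accuracy uses hard class predictions; the AUC uses the predicted
#    probability of the signal class (column 1 of predict_proba).
y_pred = logreg.predict(X_test)
y_pred_prob = logreg.predict_proba(X_test)
accuracy = accuracy_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_prob[:, 1])

# e) ROC curve: signal efficiency is the true positive rate,
#    background rejection is 1 - false positive rate.
fpr, tpr, thresholds = roc_curve(y_test, y_pred_prob[:, 1])
signal_efficiency = tpr
background_rejection = 1.0 - fpr

print(f"accuracy = {accuracy:.3f}, AUC = {auc:.3f}, fit time = {run_time:.4f} s")
```

To draw the curve requested in e), the two arrays can be passed to matplotlib, for example `plt.plot(signal_efficiency, background_rejection)`. With the real data, `X` and `y` would instead come from the `df` columns as defined in the notebook.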
--- a/notebooks/03_ml_basics_iris_softmax_regression.ipynb
+++ b/notebooks/03_ml_basics_iris_softmax_regression.ipynb
--- a/notebooks/03_ml_basics_log_regr_heart_disease.ipynb
+++ b/notebooks/03_ml_basics_log_regr_heart_disease.ipynb
--- a/notebooks/03_ml_basics_logistic_regression.ipynb
+++ b/notebooks/03_ml_basics_logistic_regression.ipynb
--- a/slides/01_intro_python.md
+++ b/slides/01_intro_python.md
@ -601,7 +601,7 @@ i) Display a numpy array as figure of a blue cross. The size should be 200
     selects the inner part of the circle using the indexing.
     
   \small
-   [Solution:  01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
+   [Solution:  01_intro_ex_1a_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1a_sol.ipynb) \normalsize
   
 ii) Read data which contains pixels from the binary file horse.py into a
    numpy array. Display the data and the following transformations in 4
@ -609,7 +609,7 @@ ii) Read data which contains pixels from the binary file horse.py into a
    and mirroring.
    
    \small
-    [Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2022/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize 
+    [Solution: 01_intro_ex_1b_sol.ipynb](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/solutions/01_intro_ex_1b_sol.ipynb) \normalsize 
      

 ## Pandas
--- a/slides/03_ml_intro.odp
+++ b/slides/03_ml_intro.odp
--- a/slides/03_ml_intro.pdf
+++ b/slides/03_ml_intro.pdf
--- a/slides/introduction.md
+++ b/slides/introduction.md
@ -15,7 +15,7 @@
   - Machine learning - decision trees

 * **Day 4**
-   - Machine learning - convolutional networks graph neural networks
+   - Machine learning - convolutional networks and graph neural networks

 * **Organization** and **Objective**
   -  \textcolor{red} {2 ETC: Compulsory attendance is required} \newline
Author	SHA1	Message	Date
Joerg Marks	c91b2cdbb7	add files	2023-04-05 17:35:33 +02:00
Joerg Marks	ea8d1af517	updates	2023-04-05 17:26:47 +02:00