% Introduction to Data Analysis and Machine Learning in Physics
% Jörg Marks, Klaus Reygers
% 11-14 April 2023 \newline 9:00 - 12:00 and 14:00 - 17:00

## Outline

* **Day 1**
    - Introduction, software and data fitting
* **Day 2**
    - Machine learning - basics
* **Day 3**
    - Machine learning - decision trees
* **Day 4**
    - Machine learning - convolutional networks and graph neural networks
* **Organization** and **Objective**
    - \textcolor{red}{2 ECTS: Attendance is compulsory} \newline
      \textcolor{red}{Active participation in the exercises}
    - \textcolor{blue}{Course in CIP pool in a tutorial style}
    - \textcolor{blue}{Obtain basic knowledge for problem-oriented self-studies}

## Course Information (1)

* Course requirements
    - Python knowledge is required (good C++ knowledge might suffice)
    - User ID for the CIP pool of the Faculty of Physics
* Course structure
    - \textcolor{red}{Course in CIP pool} using the \textcolor{red}{jupyter3 hub}
    - Lectures are interleaved with tutorial/exercise sessions in small groups (up to 5 persons / group)
* Course homepage, which includes and distributes all material

    \small
    [https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/)
    \normalsize

    - /transparencies \textcolor{blue}{Transparencies of the lectures}
    - /examples \textcolor{blue}{iPython files shown in the lectures}
    - /exercises \textcolor{blue}{Exercises to be solved during the course}
    - /solutions \textcolor{blue}{Solutions of the exercises}

## Course Information (2)

`TensorFlow` and `Keras` are now also installed in the CIP jupyter hub.
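As a minimal sketch (not part of the original course material), the following notebook cell checks whether both packages can be located by the current kernel. It assumes the usual import names `tensorflow` and `keras`; the helper `module_available` is a hypothetical name introduced here for illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module with this name can be located."""
    # find_spec only searches for the module; it does not import it.
    return importlib.util.find_spec(name) is not None


# Assumed import names of the two packages mentioned above.
for name in ("tensorflow", "keras"):
    status = "found" if module_available(name) else "missing"
    print(f"{name}: {status}")
```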
In addition, with a Google account you can run Jupyter notebooks on Google Colab:

\vspace{3ex}

[https://colab.research.google.com/](https://colab.research.google.com/)

\vfill

Missing Python libraries can be installed by running the following in a notebook cell (here for the pypng library):

```
!pip install pypng
```

## Course Information (3)

* Your installation at home:
    * \textcolor{blue}{Web Browser to access jupyter3}
    * \textcolor{blue}{Access to the CIP pool via an ssh client on your home PC}
* No requirements for a special operating system
* Software:
    * firefox or similar
    * Cisco AnyConnect
    * ssh client (MobaXterm on Windows, integrated in Linux/Mac)
* Local execution of python / iPython
    * Install ``anaconda3`` and download / run the iPython notebooks (python scripts are also available)
* \textcolor{red}{Hints for software installations and CIP pool access}

\small
[https://www.physi.uni-heidelberg.de/~marks/root_einfuehrung/Folien/CIPpoolAccess.PDF](https://www.physi.uni-heidelberg.de/~marks/root_einfuehrung/Folien/CIPpoolAccess.PDF)
\normalsize

## Course Information (4)

Alternatively, you can install the libraries needed on your local computer.

\vfill

Here are the relevant instructions for macOS using `pip`:

\vfill

Assumption: `homebrew` is installed.

\vfill

Install python3 (see https://docs.python-guide.org/starting/install3/osx/)

\footnotesize

```
$ brew install python
$ python3 --version
Python 3.8.5
```

\normalsize

Make sure pip3 is up-to-date (alternative: conda $\rightarrow$ don't mix conda and pip installations)

\footnotesize

```
$ pip3 install --upgrade pip
```

\normalsize

Install the modules needed:

\footnotesize

```
$ pip3 install --upgrade jupyter matplotlib numpy pandas scipy scikit-learn xgboost iminuit tensorflow tensorflow_datasets Keras
```

\normalsize

## Topics and file name conventions

0. Introduction (this file) \hspace{0.1cm} \footnotesize (\textcolor{gray}{introduction.pdf}) \normalsize
1. Introduction to Python \hspace{0.1cm} \footnotesize (\textcolor{gray}{01\_intro\_python\_*}) \normalsize
2. Data modeling and fitting \hspace{0.1cm} \footnotesize (\textcolor{gray}{02\_fit\_intro\_*}) \normalsize
3. Machine learning basics \hspace{0.1cm} \footnotesize (\textcolor{gray}{03\_ml\_basics\_*}) \normalsize
4. Decision trees \hspace{0.1cm} \footnotesize (\textcolor{gray}{04\_decision\_trees\_*}) \normalsize
5. Neural networks \hspace{0.1cm} \footnotesize (\textcolor{gray}{05\_neural\_networks\_*}) \normalsize

\vspace{3.5cm}

## Program Day 1

* Technicalities
* Summary of NumPy
* Plotting with matplotlib
* Input / output of data
* Summary of pandas
* Fitting with iminuit and PyROOT
* Transparencies with activated links, examples and exercises
    * Software: [\textcolor{violet}{01\_intro\_python.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/01_intro_python.pdf)
    * Fitting: [\textcolor{violet}{02\_fit\_intro.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/02_fit_intro.pdf)

\vspace{2cm}

## Program Day 2

* Introduction to machine learning [\textcolor{violet}{03\_ml\_intro.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/03_ml_intro.pdf)
    * TensorFlow / Keras, datasets
    * Supervised learning
    * Classification

\vspace{0.5cm}

* Multivariate analysis [\textcolor{violet}{03\_ml\_intro\_mva.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/03_ml_intro_mva.pdf)
    * Regression
    * Linear regression
    * Logistic regression
    * Softmax regression (multi-class classification)

\vspace{4cm}

## Program Day 3

* Decision trees
* Bagging and boosting
* Random forest
* XGBoost

\vspace{0.5cm}

[\textcolor{violet}{04\_decision\_trees.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/04_decision_trees.pdf)

\vspace{5cm}

## Program Day 4

* Neural networks
* Convolutional neural networks
* Hand-written digit recognition with
Keras

\vspace{0.5cm}

[\textcolor{violet}{05\_neural\_networks.pdf}](https://www.physi.uni-heidelberg.de/~reygers/lectures/2023/ml/transparencies/05_neural_networks.pdf)

\vspace{5cm}