Evaluate the importance of each of the $n$ features in the training of the XGBoost classifier for the MAGIC data set by dropping one of the features. This gives $n$ different classifiers. Compare the performance of these classifiers using the AUC score.
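A minimal sketch of this comparison, assuming the MAGIC data are already split into `X_train`, `X_test`, `y_train`, `y_test` (all variable names here are placeholders):

\footnotesize
```python
import xgboost as xgb
from sklearn.metrics import roc_auc_score

# Train one classifier per dropped feature and compare the AUC scores.
# X_train, X_test, y_train, y_test are assumed to exist (placeholder names).
for feature in X_train.columns:
    Xtr = X_train.drop(columns=feature)
    Xte = X_test.drop(columns=feature)
    clf = xgb.XGBClassifier()
    clf.fit(Xtr, y_train)
    y_score = clf.predict_proba(Xte)[:, 1]  # probability of the signal class
    print(f"without {feature}: AUC = {roc_auc_score(y_test, y_score):.4f}")
```
\normalsize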
## Exercise 4: Interpret a classifier with SHAP values
SHAP (SHapley Additive exPlanations) is a method to explain the output of any machine learning model. [Shapley values](https://en.wikipedia.org/wiki/Shapley_value) are a concept from cooperative game theory. They are named after Lloyd Shapley, who won the Nobel Prize in Economics in 2012.
\vfill
Use the Python library [`SHAP`](https://shap.readthedocs.io/en/latest/index.html) to quantify the feature importance.
a) Study the documentation at [https://shap.readthedocs.io/en/latest/tabular_examples.html](https://shap.readthedocs.io/en/latest/tabular_examples.html)
b) Create a summary plot of the feature importance in the MAGIC data set with `shap.summary_plot` for the XGBoost classifier of exercise 2 (a minimal sketch of the relevant calls follows below). What are the three most important features?
c) Do the same for the superconductivity data set. What are the three most important features?
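A possible workflow for part b), assuming `model` is the trained XGBoost classifier from exercise 2 and `X_test` holds the test features (placeholder names):

\footnotesize
```python
import shap

# TreeExplainer is the fast, exact explainer for tree ensembles such as XGBoost
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Beeswarm summary: features sorted by mean |SHAP value| (global importance)
shap.summary_plot(shap_values, X_test)
```
\normalsize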
$$h(\vec x) = \begin{cases}1 & \text{if }\ \vec w \cdot \vec x + b > 0,\\0 & \text{otherwise.}\end{cases}$$
\textcolor{gray}{The perceptron was designed for image recognition. It was first implemented in hardware (400 photocells, weights = potentiometer settings).}
\normalsize
## Historical perspective: Perceptron (2)
::: columns
:::: {.column width=60%}
* McCulloch–Pitts (MCP) neuron (1943)
it cannot learn the XOR function \newline
The multi-layer perceptron did not perform well on the superconductivity dataset.

a) Can you find better hyperparameters (number of hidden layers, neurons per layer, loss function, learning rate, ...)? Try this first by hand.

b) Now use [\textcolor{gray}{sklearn.model\_selection.GridSearchCV}](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) to find optimal parameters (a minimal sketch follows below).
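A minimal sketch of such a grid search, assuming the superconductivity task is treated as regression with scikit-learn's `MLPRegressor` and the data are available as `X_train`, `y_train` (placeholder names; the grid values are illustrative):

\footnotesize
```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Illustrative search grid; adjust the ranges to your computing budget
param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "learning_rate_init": [1e-4, 1e-3, 1e-2],
    "alpha": [1e-5, 1e-4, 1e-3],  # L2 regularization strength
}

search = GridSearchCV(MLPRegressor(max_iter=500), param_grid, cv=3, n_jobs=-1)
search.fit(X_train, y_train)  # X_train, y_train: placeholder names
print(search.best_params_, search.best_score_)
```
\normalsize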
## TensorFlow
::: columns
:::: {.column width=70%}
* Powerful open source library with a focus on deep neural networks
* Performs computations using data flow graphs
* Takes care of computing gradients of the defined functions (\textit{automatic differentiation}; see the sketch below)
* Computations in parallel on multiple CPUs or GPUs
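As a small illustration (not from the original slides), TensorFlow's `tf.GradientTape` records operations and differentiates through them automatically:

\footnotesize
```python
import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x**2 + 2.0 * x      # y = x^2 + 2x
grad = tape.gradient(y, x)  # dy/dx = 2x + 2
print(grad.numpy())         # 8.0 at x = 3
```
\normalsize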
Can you summarize the inductive bias that is associated with a convolutional neural network? \newline
\vspace{0.5ex}
ChatGPT Mar 23 Version: \newline
\vspace{0.5ex}
\small "Yes, of course! The inductive bias associated with a convolutional neural network (CNN) is that it assumes that the input data has a grid-like topology, such as images, audio spectrograms or video frames.
CNNs are designed to learn local and translation-invariant features by convolving small filters over the input data. This allows them to identify patterns in the input that are useful for classification or other tasks. The filters can be learned through backpropagation, and the output of each convolutional layer is usually downsampled (e.g., through max pooling) to reduce the dimensionality of the data.
The inductive bias of CNNs allows them to perform well on tasks involving images, audio, and video data, where the local structure of the data is important. However, it may not be as effective for other types of data that do not have a grid-like topology, such as text or graphs. In these cases, other types of neural networks or machine learning algorithms may be more appropriate."
\normalsize
## MNIST classification with a CNN in Keras
\footnotesize
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, MaxPooling2D, Conv2D, Input
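
# --- Hedged sketch (not necessarily the original notebook code): one possible
# CNN, assuming MNIST images reshaped to (28, 28, 1) and one-hot labels ---
model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(32, kernel_size=(3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),  # one output per digit class
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])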
a) Plot training and validation loss as well as training and validation accuracy.
b) Determine the accuracy of the fully trained model.
c) Create a second notebook that reads the trained model (`mnist_keras_model.h5`). Read `your_own_digit.png` and classify it. Create your own $28 \times 28$ pixel digits with a program like GIMP and check how the model performs (a minimal sketch follows below).

d) Try to improve the performance of the network by increasing the number of filters and by adding a second convolutional layer.
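A minimal sketch for part c), assuming a grayscale $28 \times 28$ PNG with a white digit on a black background (the preprocessing may need to be adapted to match the training data):

\footnotesize
```python
import numpy as np
from PIL import Image
from tensorflow import keras

model = keras.models.load_model("mnist_keras_model.h5")

# Load the image, convert to grayscale, and scale pixel values to [0, 1]
img = Image.open("your_own_digit.png").convert("L")
x = np.array(img, dtype="float32") / 255.0
x = x.reshape(1, 28, 28, 1)  # batch of one image

probs = model.predict(x)
print("predicted digit:", np.argmax(probs))
```
\normalsize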
## Exercise 5: Higgs data set (1)
Application of deep neural networks for the separation of signal and background in an exotic Higgs scenario
\small
\color{gray}
In this exercise we want to explore various techniques to optimize the event selection in the search for supersymmetric Higgs bosons at the LHC. In supersymmetry the Higgs sector comprises five Higgs bosons, in contrast to the single Higgs boson of the Standard Model. Here we deal with a heavy Higgs boson that decays into two W bosons and a standard-model Higgs boson ($H^0 \to W^+ W^- h$), which decay further into leptons ($W^\pm \to l^\pm \nu$) and b-quarks ($h \to b \bar{b}$), respectively.

This exercise is based on a [Nature paper](https://www.nature.com/articles/ncomms5308) by Pierre Baldi, Peter Sadowski, and Daniel Whiteson, which contains much more information, such as general background, details about the selection variables, and links to large sets of simulated events. You might also use the paper as inspiration for your solution of this exercise.

## Graph Neural Networks

\vfill
\tiny \textcolor{gray}{slides on GNNs by Martin Kroesen} \normalsize
::: columns
:::: {.column width=65%}
* Graph Neural Networks (GNNs): Neural Networks that operate on graph structured data
* Graph: consists of nodes that can be connected by edges; edges can be directed or undirected
* no grid structure as given for CNNs
* node features and edge features possible
* relations are often represented by an adjacency matrix: $A_{ij}=1$ if there is a link between nodes $i$ and $j$, else 0 (see the sketch below)
* tasks on node level, edge level and graph level
* full lecture: \url{https://web.stanford.edu/class/cs224w/}
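As a small illustration (not part of the original slides), the adjacency matrix of a tiny undirected graph can be built as follows:

\footnotesize
```python
import numpy as np

# Undirected graph on 4 nodes with edges 0-1, 1-2, 2-0, 2-3
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
n_nodes = 4

A = np.zeros((n_nodes, n_nodes), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1  # undirected: the adjacency matrix is symmetric

print(A)
```
\normalsize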
The first introductory notebook demonstrates how graphs are handled, using the Karate Club as an example. Follow and reproduce the first [\textcolor{green}{notebook}](https://colab.research.google.com/drive/1h3-vJGRVloF5zStxL5I0rSy4ZUPNsjy8?usp=sharing). Study and understand the data format.
\vspace{3ex}
At the end, the separation power of the Graph Convolutional Network (GCN) is shown via the node embeddings. You can replace the GCN layers with Graph Attention layers (GAT) and compare the results.
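A minimal sketch of such a replacement, assuming PyTorch Geometric as in the notebook; the layer sizes and number of attention heads are placeholder choices:

\footnotesize
```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import KarateClub
from torch_geometric.nn import GATConv  # attention-based drop-in for GCNConv

dataset = KarateClub()
data = dataset[0]

class GAT(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # 4 attention heads; their outputs are concatenated (8 * 4 features)
        self.conv1 = GATConv(dataset.num_features, 8, heads=4)
        self.conv2 = GATConv(8 * 4, dataset.num_classes, heads=1)

    def forward(self, x, edge_index):
        h = F.elu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GAT()
out = model(data.x, data.edge_index)  # logits per node
```
\normalsize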
## Exercise 4: Classifying molecules
The PyTorch Geometric webpage also provides an example of the classification of molecules:
Study this notebook and then modify it to apply it to the [PROTEINS dataset](https://pytorch-geometric.readthedocs.io/en/latest/cheatsheet/data_cheatsheet.html).
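A minimal sketch for loading the PROTEINS graphs with PyTorch Geometric's `TUDataset` (the `root` path is a placeholder):

\footnotesize
```python
from torch_geometric.datasets import TUDataset

# PROTEINS: graph-level binary classification (enzyme vs. non-enzyme)
dataset = TUDataset(root="data/TUDataset", name="PROTEINS")
print(len(dataset), dataset.num_features, dataset.num_classes)
```
\normalsize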