Merge branch 'master' of https://git.physi.uni-heidelberg.de/Teaching/ML-Kurs-SS2023
100494  03_ml_basics_simple_neural_network.ipynb  (executable file)
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -23,6 +24,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -85,6 +87,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -336,7 +339,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.16"
+   "version": "3.10.9"
   },
   "vscode": {
    "interpreter": {
451  notebooks/04_decision_trees_critical_temp_regression.ipynb  (normal file)
165  slides/05_GNNs.md  (normal file)
@@ -0,0 +1,165 @@
## Graph Neural Networks

::: columns

:::: {.column width=65%}

* Graph Neural Networks (GNNs): neural networks that operate on graph-structured data
* Graph: consists of nodes that can be connected by edges; edges can be directed or undirected
* no grid structure as given for CNNs
* node features and edge features are possible
* relations are often represented by an adjacency matrix: $A_{ij}=1$ if there is a link between node $i$ and node $j$, else 0
* tasks exist on the node level, edge level and graph level
* full lecture: \url{https://web.stanford.edu/class/cs224w/}

::::

:::: {.column width=35%}

\begin{center}
\includegraphics[width=1.1\textwidth]{figures/graph_example.png}
\normalsize
\end{center}

::::

:::
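As a small numpy sketch (graph, node labels and edges invented for illustration), the adjacency matrix of an undirected graph can be built from its edge list like this:

```python
import numpy as np

# Toy undirected graph with 4 nodes and edges (0,1), (0,2), (1,2), (2,3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: A[i, j] = 1 if nodes i and j are linked, else 0
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected graph: matrix is symmetric

print(A)
```

For a directed graph, only `A[i, j]` would be set, and the matrix would in general no longer be symmetric.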

## Simple Example: Zachary's karate club

::: columns

:::: {.column width=60%}

* link: \url{https://en.wikipedia.org/wiki/Zachary's_karate_club}
* 34 nodes: each node represents a member of the karate club
* 4 classes: the community each member belongs to
* task: classify the nodes
* many real-world problems for GNNs exist, e.g.\ social networks, molecules, recommender systems, particle tracks

::::

:::: {.column width=40%}

\begin{center}
\includegraphics[width=1.\textwidth]{figures/karateclub.png}
\normalsize
\end{center}

::::

:::

## From CNN to GNN

\begin{center}
\includegraphics[width=0.8\textwidth]{figures/fromCNNtoGNN.png}
\normalsize
\newline
\tiny (from the Stanford GNN lecture)
\end{center}

\normalsize

* GNN: generalization of the convolutional neural network
* No grid structure; an arbitrary number of neighbors, defined by the adjacency matrix
* Operations pass information from the neighborhood of each node

## Architecture: Graph Convolutional Network

::: columns

:::: {.column width=60%}

* Message passing from connected nodes
* The graph convolution is defined as:

$$ H^{(l+1)} = \sigma \left( \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)} \right)$$

* The adjacency matrix $A$ including self-connections is given by $\tilde{A} = A + I$
* The degree matrix of this adjacency matrix is given by $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$
* The weights of layer $l$ are called $W^{(l)}$
* $H^{(l)}$ is the matrix of activations in layer $l$

::::

:::: {.column width=40%}

\begin{center}
\includegraphics[width=1.1\textwidth]{figures/GCN.png}
\normalsize
\end{center}

\tiny \url{https://arxiv.org/abs/1609.02907}

::::

:::
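One propagation step of this layer can be sketched directly in numpy. The 3-node path graph, the feature values, the weights, and the choice of ReLU for $\sigma$ are all invented for illustration:

```python
import numpy as np

# Toy 3-node path graph (invented for illustration)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_tilde = A + np.eye(3)                       # adjacency with self-connections
d = A_tilde.sum(axis=1)                       # degrees of A_tilde
D_inv_sqrt = np.diag(d ** -0.5)               # D_tilde^(-1/2)

H = np.array([[1., 0.], [0., 1.], [1., 1.]])  # activations H^(l), 2 features
W = np.array([[0.5, -0.2], [0.1, 0.3]])       # layer weights W^(l)

# H^(l+1) = sigma(D^(-1/2) A_tilde D^(-1/2) H W), with sigma = ReLU here
H_next = np.maximum(0.0, D_inv_sqrt @ A_tilde @ D_inv_sqrt @ H @ W)
print(H_next.shape)  # one 2-dim feature vector per node
```

The symmetric normalization keeps the propagation matrix well-conditioned: each message is scaled by the degrees of both sender and receiver, so high-degree nodes do not dominate.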

## Architecture: Graph Attention Network

::: columns

:::: {.column width=50%}

* Calculate the attention coefficients $e_{ij}$ from the features $\vec{h}$ of each node $i$ and its neighbors $j$

$$ e_{ij} = a\left( W\vec{h}_i, W\vec{h}_j \right)$$

$a$: learnable weight vector

* Normalize the attention coefficients

$$ \alpha_{ij} = \text{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})} $$

* Calculate the node features

$$ \vec{h}^{(l+1)}_i = \sigma \left( \sum_{j \in \mathcal{N}(i)} \alpha_{ij} W \vec{h}^{(l)}_j \right)$$

::::

:::: {.column width=50%}

\begin{center}
\includegraphics[width=1.1\textwidth]{figures/GraphAttention.png}
\normalsize
\end{center}

\tiny \url{https://arxiv.org/abs/1710.10903}

::::

:::
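The two attention steps can be sketched for a single node in numpy. All values of $W$, $a$ and the node features are invented toy numbers; $a$ is realized, as in the GAT paper, as a weight vector applied to the concatenated transformed features (the paper additionally applies a LeakyReLU before the softmax, omitted here for brevity):

```python
import numpy as np

# Attention weights of node i over its neighbors j:
# e_ij = a(W h_i, W h_j), then alpha_ij = softmax_j(e_ij)
W = np.array([[0.4, -0.1],
              [0.2,  0.5]])          # shared feature transform (toy values)
a = np.array([0.3, -0.2, 0.1, 0.4])  # learnable weight vector (toy values)

h_i = np.array([1.0, 0.5])                               # features of node i
neighbors = [np.array([0.2, 1.0]), np.array([0.8, 0.3])]  # features of its j

# a(.,.) as dot product with the concatenated transformed features
e = np.array([a @ np.concatenate([W @ h_i, W @ h_j]) for h_j in neighbors])
alpha = np.exp(e) / np.exp(e).sum()   # softmax over the neighborhood
print(alpha.sum())                    # normalized to 1
```

The weighted sum of the transformed neighbor features with these $\alpha_{ij}$ then gives the new feature vector of node $i$.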

## Example: Identification of inelastic interactions in the TRD

::: columns

:::: {.column width=60%}

* Identification of inelastic interactions of light antinuclei in the Transition Radiation Detector (TRD) in ALICE
* Thesis: \url{https://www.physi.uni-heidelberg.de/Publications/Bachelor_Thesis_Maximilian_Hammermann.pdf}
* Construct a nearest-neighbor graph from the signals in the detector
* Use global pooling for graph classification

::::

:::: {.column width=40%}

interaction of an antideuteron:

\begin{center}
\includegraphics[width=0.8\textwidth]{figures/antideuteronsgnMax.png}
\normalsize
\end{center}

::::

:::
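Global pooling turns per-node embeddings of a whole graph into a single fixed-size vector, independent of the number of nodes, which a classifier head can then score. A minimal sketch with mean pooling and invented embedding values:

```python
import numpy as np

# Final-layer node embeddings of one graph (3 nodes, 2 features, toy values)
H = np.array([[0.1, 0.9],
              [0.4, 0.6],
              [0.2, 0.8]])

# Global mean pooling: average over the node dimension
graph_vector = H.mean(axis=0)   # shape (2,), one vector per graph
print(graph_vector)
```

Other reductions such as sum or max pooling work the same way; all of them are permutation-invariant, which is essential since the node ordering of a graph carries no meaning.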

\begin{center}
\includegraphics[width=0.9\textwidth]{figures/GNN_conf.png}
\normalsize
\end{center}

## Example: Google Maps

* link: \url{https://www.deepmind.com/blog/traffic-prediction-with-advanced-graph-neural-networks}
* GNNs are used for traffic predictions and estimated times of arrival (ETAs)

\begin{center}
\includegraphics[width=0.8\textwidth]{figures/GNNgooglemaps.png}
\normalsize
\end{center}

## Example: AlphaFold

* link: \url{https://www.deepmind.com/blog/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology}
* "A folded protein can be thought of as a 'spatial graph', where residues are the nodes and edges connect the residues in close proximity"

\begin{center}
\includegraphics[width=0.9\textwidth]{figures/alphafold.png}
\normalsize
\end{center}

## Exercise 1: Illustration of Graphs and Graph Neural Networks

On the PyTorch Geometric webpage, you can find official examples for the application of Graph Neural Networks:
https://pytorch-geometric.readthedocs.io/en/latest/get_started/colabs.html

\vspace{3ex}

The first introduction notebook shows the functionality of graphs with the example of the Karate Club. Follow and reproduce the first [\textcolor{green}{notebook}](https://colab.research.google.com/drive/1h3-vJGRVloF5zStxL5I0rSy4ZUPNsjy8?usp=sharing). Study and understand the data format.

\vspace{3ex}

At the end, the separation power of Graph Convolutional Networks (GCNs) is shown via the node embeddings. You can replace the GCN layers with Graph Attention layers and compare the results.
BIN  slides/figures/GCN.png  (new file, 90 KiB)
BIN  slides/figures/GNN_conf.png  (new file, 56 KiB)
BIN  slides/figures/GNNgooglemaps.png  (new file, 113 KiB)
BIN  slides/figures/GraphAttention.png  (new file, 114 KiB)
BIN  slides/figures/Zachary_karate_club_social_network.png  (new file, 468 KiB)
BIN  slides/figures/alphafold.png  (new file, 450 KiB)
BIN  slides/figures/antideuteronsgnMax.png  (new file, 16 KiB)
BIN  slides/figures/fromCNNtoGNN.png  (new file, 54 KiB)
BIN  slides/figures/graph_example.png  (new file, 26 KiB)
BIN  slides/figures/karateclub.png  (new file, 47 KiB)