{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A simple neural network with one hidden layer in pure Python\n",
"\n",
"## Introduction\n",
"We consider a simple feed-forward neural network with one hidden layer:"
]
},
{
"attachments": {
"48b1ed6e-8e2b-4883-82ac-a2bbed6e2885.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEsCAYAAAB5fY51AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjYuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy89olMNAAAACXBIWXMAAA9hAAAPYQGoP6dpAABEKklEQVR4nO2ddVxU6ffHD62EgYWBhd2K2F2767prrmLHioqBrWv3rh1rdxeIioGN3WAuusbaIqlISc39/P7gN8/XkYY7DHc879drXvta5855zjzc+5mnzjkGAEAMwzAKwFDXDjAMw6QVFiyGYRQDCxbDMIqBBYthGMXAgsUwjGJgwWIYRjGwYDEMoxhYsBiGUQwsWAzDKAYWLIZhFAMLFsMwioEFi2EYxcCCxTCMYmDBYhhGMbBgMQyjGFiwGIZRDCxYDMMoBhYshmEUAwsWwzCKgQWLYRjFwILFMIxiYMFiGEYxsGAxDKMYWLAYhlEMLFgMwygGFiyGYRQDCxbDMIqBBYthGMXAgsUwjGJgwWIYRjGwYDEMoxhYsBiGUQwsWAzDKAYWLIZhFAMLFsMwioEFi2EYxcCCxTCMYmDBYhhGMbBgMQyjGFiwGIZRDCxYDMMoBhYshmEUAwsWwzCKgQWLYRjFwILFMIxiMNa1A0zqxMTEkL+/P8XHx5OVlRUVKFCADAwMdO2WXsN9nj3hEVY25dGjRzRq1CiqWbMmWVpaUsmSJalMmTJUqFAhKly4MLVr14527NhB0dHRunZVb+A+z/4YAICunWD+x7Nnz2jYsGF05swZKliwILVr147s7e2pTJkyZGxsTB8/fqR79+7RlStX6OLFi2RtbU3Tp0+nESNGkKEh//5kBO5zBQEm27B69WrkzJkTpUuXxp49exATE5Pi9U+fPsWgQYNARGjUqBHevn2bRZ7qD9znyoIFK5swbdo0EBGGDh2KiIiIdH32woULsLW1RfHixfHixQsteah/cJ8rDxasbMD69etBRFiwYEGGbbx9+xZ2dnYoW7YswsPDZfROP+E+VyYsWDrmv//+g7m5OQYPHpxpW8+ePYO5uTmGDh0qg2f6C/e5cmHB0jHt2rVDyZIlZfuFXrlyJYgIt2/flsWePsJ9rlxYsHTIs2fPQETYsmWLbDbj4+NRokQJ9OnTRzab+gT3ubLhPVkdsmXLFsqbNy85OjrKZtPIyIicnZ1p//79FBYWJptdfYH7XNmwYOmQq1evUqtWrShnzpyy2m3Xrh3FxMSQj4+PrHb1Ae5zZcOCpSMkSaI7d+5Q7dq1ZbddoUIFMjc354fnG7jPlQ8Llo6IiIigiIgIKlGihOy2jYyMyNbWlj58+CC7bSXDfa58WLB0BP4/IkpbAbWGhoaiDSYB7nPlw4KlIywsLMjU1JQCAgJktw2AAgICyNraWnbbSob7XPmwYOkIY2Njqlq1qlbWPF69ekUfP36kmjVrym5byXCfKx8WLB1St25dOn/+PKlUKlntenl5kYGBgVYWl5UO97myYcHSIX369KE3b97QiRMnZLMJgP7++28qV66c7Fv3+oC2+nzNmjX0448/UqFChWSzyySG82HpEADk4OBAxsbGdPXqVTIyMsq0zePHj1O7du2IiChv3rw0ZswYcnFxoVy5cmXatj6gzT4/duwY/fzzzzJ4ySSLbg7YM2quXLkCAwMDLFq0KNO2Pn36hCJFiqB69eooV64ciAhEhDx58mDWrFkIDQ2VwWPlo40+//HHHyFJkgzeMSnBgpUNGDt2LExMTHD8+PEM24iKikLLli2RO3duvH37FvHx8dizZw8qVKigIVwzZ87Ep0+f5HNeoWijzxntw4KVDYiNjUX79u1hYmKCzZs3p/uX+t27d2jcuDFy5syJixcvarwXHx+PvXv3olKlSkK4cufOjenTp+Pjx49yfg1FIUefN2nSJMk+Z7QHC1Y2ITY2FgMGDAARoW3btvj3339T/Ux0dDTWr18PS0tL5M
iRAxcuXEj2WpVKhf3796Ny5cpCuHLlyoVp06YhJCREzq+iGDLT57lz54aNjQ2uXLmSBZ4yaliwshlHjhxBkSJFQERo2bIlVq1ahevXryM4OBifP3/Gixcv4O7ujjFjxiBPnjwgIpiZmYGIMHfu3FTtq1QquLm5oUqVKkK4rKysMGXKFAQHB8v+fVQqFaKiohAbGyu7bblIT59bW1uDiFCqVKnvVuh1CQtWNuTLly/YuXMnGjduDBMTEyEsX7+MjY3FKGnz5s0gIpiYmOD+/ftpakOlUuHAgQOoVq2asGlpaYlJkyYhKCgow76rVCqcOXMGTk5OqFmzpob/NjY2aNu2LRYsWICAgIAMt6EN0tLnNjY26NGjB4gIBgYG8PX11bXb3x18rCGbExMTQw8fPqQ3b96QSqUiKysrCg0Npe7du5ORkRGpVCratm0bHTp0iDw8PKhmzZp08+ZNMjExSZN9SZLIw8ODZs2aRffv3yciIktLSxo+fDiNHTuW8ufPn2Zf3dzcaMqUKfTs2TMqV64cNWrUiGrUqEHW1tYUHx9Pz549Ix8fH7pw4QKpVCrq0aMHLVy4kAoWLJihvtEWSfV5tWrVqEiRIkRE1KlTJzp06BA5OjrS3r17deztd4auFZNJP9HR0TA3Nxe//A0aNMCHDx/EdGXWrFnptqlSqXDo0CHUrFlT2LWwsMCECRMQGBiY4mdDQkLQpUsXEBF++eUXXLp0KcVF7JCQECxZsgT58uVD/vz54e7unm5/dcn9+/fFKOuff/7RtTvfFSxYCuWXX34RDw0R4eHDh9i7d6+YLt69ezdDdiVJgoeHB2rVqiWEy9zcHOPGjUtyGhcQEIAqVarA2toarq6u6WorICAAHTt2BBFh9erVGfJXV3Tu3BlEhK5du+rale8KFiyFsmbNGhAR8uXLByLC8OHDIUmSeJCqVauWalHQlJAkCUePHoW9vb2GcI0dOxb+/v4AEkZ69vb2KFSoEB49epThdkaOHAkigpubW4b9zWoePHggfjAePnyoa3e+G1iwFMrLly9BRDA0NBRnqyIjIxEQEID8+fODiDBt2rRMtyNJEo4dOwYHBwchXDlz5sTo0aPh4uICExMT3LlzJ9NtdOnSBdbW1vjw4UOmfc4qfvvtNxARunTpomtXvhtYsBSM+jBowYIFNSrBuLq6gohgZGQEb29vWdqSJAmenp6oW7euEC4DAwPMmTNHFvtBQUEoWLAgunfvLou9rODhw4diSp7W3Vkmc7BgKZixY8eCiMR6U926dcV7Xbt2BRGhSpUqiI6Olq1NSZJw8uRJFCxYEAUKFMjUtPNbVq5cCSMjI7x79042m9pG3c+dOnXStSvfBSxYCubcuXMgIuTPnx9GRkYgIty7dw/A/0YsRITJkyfL2m5ERAQsLCwwY8YMWe1+/vwZFhYWmD17tqx2tYmvr68YZan7ntEenA9LwTRq1IgsLS0pODiYmjdvTkRE69evJyKi/Pnz09q1a4mIaP78+XT79m3Z2r179y5FRkZShw4dZLNJRJQrVy5q1aoVXbp0SVa72qRSpUrUrVs3IiKaNWuWjr3Rf1iwFIypqSm1atWKiIiKFStGRES7du2iiIgIIko44Ni9e3eSJIn69u1L0dHRsrTr7e1NZmZmVLlyZVnsfY29vT15e3srqpjD9OnTycDAgA4dOkT37t3TtTt6DQuWwmnbti0RET158oTKlClD4eHhtG/fPvH+ypUrqVChQvT48WOaOXOmLG2+ffuWSpQokebT9OmhTJkyFBoaSpGRkbLb1hYVK1ak7t27ExHJ1sdM0rBgKZyffvqJiIhu3LhBPXv2JKL/TQuJiPLlyyf+f9GiRXTjxo1Mt6lSqcjY2DjTdpJCnQFU7pzr2mbatGlkaGhIHh4edOfOHV27o7ewYCmcYsWKUdWqVQkA2djYkKmpKXl7e2s8NO3bt6fevXuTJEnUr18/+vLlS6bazJMnDwUHB2fW9SQJCQkhY2NjMj
c314p9bVGhQgUeZWUBLFh6gHpaePXqVerUqRMRaY6yiIhWrFhBhQsXpidPntC0adMy1V716tUpMDCQ/Pz8MmUnKe7c
}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"![nn.png](attachment:48b1ed6e-8e2b-4883-82ac-a2bbed6e2885.png)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this example the input vector of the neural network has two features, i.e., the input is a two-dimensional vector:\n",
"\n",
"$$\n",
"\\mathbf x = (x_0, x_1).\n",
"$$\n",
"\n",
"We consider a set of $m$ vectors as training data. The training data can therefore be written as an $m \\times 2$ matrix where each row represents a feature vector:\n",
"\n",
"$$ \n",
"X = \n",
"\\begin{pmatrix}\n",
"x_{00} & x_{01} \\\\\n",
"x_{10} & x_{11} \\\\\n",
"\\vdots & \\vdots \\\\\n",
"x_{m-1\\,0} & x_{m-1\\,1} \n",
"\\end{pmatrix} $$\n",
"\n",
"The known labels (1 = 'signal', 0 = 'background') are stored in an $m$-dimensional column vector $\\mathbf y$.\n",
"\n",
"In the following, $n_1$ denotes the number of neurons in the hidden layer. The weights for the connections from the input layer (layer 0) to the hidden layer (layer 1) are given by the following $2 \\times n_1$ matrix:\n",
"\n",
"$$\n",
"W^{(1)} = \n",
"\\begin{pmatrix}\n",
"w_{00}^{(1)} & \\dots & w_{0 \\, n_1-1}^{(1)} \\\\\n",
"w_{10}^{(1)} & \\dots & w_{1 \\, n_1-1}^{(1)} \n",
"\\end{pmatrix}\n",
"$$\n",
"\n",
"Each neuron in the hidden layer is assigned a bias; together these form the vector $\\mathbf b^{(1)} = (b^{(1)}_0, \\ldots, b^{(1)}_{n_1-1})$. The weights for the connections from the hidden layer to the single output neuron form the $n_1 \\times 1$ matrix $W^{(2)}$, and the output neuron has the bias $\\mathbf b^{(2)}$. With the sigmoid activation function $\\sigma$ applied element-wise, the output values of the network for the matrix $X$ of input feature vectors are given by\n",
"\n",
"$$\n",
"\\begin{align}\n",
"Z^{(1)} &= X W^{(1)} + \\mathbf b^{(1)} \\\\\n",
"A^{(1)} &= \\sigma(Z^{(1)}) \\\\\n",
"Z^{(2)} &= A^{(1)} W^{(2)} + \\mathbf b^{(2)} \\\\\n",
"A^{(2)} &= \\sigma(Z^{(2)})\n",
"\\end{align}\n",
"$$\n",
"\n",
"The loss function for a given set of weights is given by\n",
"\n",
"$$ L = \\sum_{k=0}^{m-1} (y_{\\mathrm{pred},k} - y_{\\mathrm{true},k})^2 $$\n",
"\n",
"We can now calculate the gradient of the loss function w.r.t. the weights. With the loss of a single training sample $k$, $\\tilde L = (a_k^{(2)} - y_k)^2$, where $a_k^{(2)} = y_{\\mathrm{pred},k}$ and $y_k = y_{\\mathrm{true},k}$, the gradients for the weights from the hidden layer to the output layer are given by: \n",
"\n",
"$$ \\frac{\\partial \\tilde L}{\\partial w_i^{(2)}} = \\frac{\\partial \\tilde L}{\\partial a_k^{(2)}} \\frac{\\partial a_k^{(2)}}{\\partial w_i^{(2)}} = \\frac{\\partial \\tilde L}{ \\partial a_k^{(2)}} \\frac{\\partial a_k^{(2)}}{ \\partial z_k^{(2)}} \\frac{\\partial z_k^{(2)}}{\\partial w_i^{(2)}} = 2 (a_k^{(2)} - y_k) a_k^{(2)} (1 - a_k^{(2)}) a_{k,i}^{(1)} \\equiv \\delta^{(2)}_k a_{k,i}^{(1)}$$\n",
"\n",
"Note that the activation function is assumed to be a sigmoid, whose derivative is\n",
"\n",
"$$ \\sigma'(x) = \\sigma(x) \\cdot (1 - \\sigma(x)) $$\n",
"\n",
"Applying the chain rule further, we obtain the gradient for the weights from the input layer to the hidden layer: \n",
"\n",
"$$ \\frac{\\partial \\tilde L}{\\partial w_{ij}^{(1)}} = \\frac{\\partial \\tilde L}{\\partial a_k^{(2)}} \\frac{\\partial a_k^{(2)}}{\\partial z_k^{(2)}} \\frac{\\partial z_k^{(2)}}{\\partial a_{k,j}^{(1)}} \\frac{\\partial a_{k,j}^{(1)}}{\\partial z_{k,j}^{(1)}} \\frac{\\partial z_{k,j}^{(1)}}{\\partial w_{ij}^{(1)}} = \\delta^{(2)}_k \\, w_j^{(2)} \\, a_{k,j}^{(1)} (1 - a_{k,j}^{(1)}) \\, x_{k,i} $$\n",
"\n"
]
},
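{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two gradient formulas above can be checked numerically for a single training sample. The following cell is a minimal sketch using only the Python standard library: it compares the analytic expressions with central finite differences of $\\tilde L$. The helper names (`forward`, `analytic_gradients`, `numeric_derivative`), the sample values and the random weights are assumptions made for this illustration and are not used elsewhere in the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import random\n",
"\n",
"def sigmoid(z):\n",
"    return 1.0 / (1.0 + math.exp(-z))\n",
"\n",
"def forward(x, W1, b1, W2, b2):\n",
"    # hidden activations a1[j] and network output a2 for a single sample x\n",
"    a1 = [sigmoid(sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j])\n",
"          for j in range(len(b1))]\n",
"    a2 = sigmoid(sum(a1[j] * W2[j] for j in range(len(a1))) + b2)\n",
"    return a1, a2\n",
"\n",
"def loss(x, y, W1, b1, W2, b2):\n",
"    # squared error of a single sample\n",
"    return (forward(x, W1, b1, W2, b2)[1] - y) ** 2\n",
"\n",
"def analytic_gradients(x, y, W1, b1, W2, b2):\n",
"    # gradients from the chain-rule formulas in the text\n",
"    a1, a2 = forward(x, W1, b1, W2, b2)\n",
"    delta2 = 2 * (a2 - y) * a2 * (1 - a2)\n",
"    dW2 = [delta2 * a1[j] for j in range(len(a1))]\n",
"    dW1 = [[delta2 * W2[j] * a1[j] * (1 - a1[j]) * x[i] for j in range(len(a1))]\n",
"           for i in range(len(x))]\n",
"    return dW1, dW2\n",
"\n",
"def numeric_derivative(params, idx, f, eps=1e-6):\n",
"    # central difference w.r.t. params[idx]; params is modified and restored\n",
"    old = params[idx]\n",
"    params[idx] = old + eps\n",
"    up = f()\n",
"    params[idx] = old - eps\n",
"    down = f()\n",
"    params[idx] = old\n",
"    return (up - down) / (2 * eps)\n",
"\n",
"random.seed(0)\n",
"n1 = 3                          # neurons in the hidden layer\n",
"x_k, y_k = [0.5, -1.2], 1.0     # one made-up sample and its label\n",
"W1 = [[random.random() for _ in range(n1)] for _ in range(len(x_k))]\n",
"b1 = [random.random() for _ in range(n1)]\n",
"W2 = [random.random() for _ in range(n1)]\n",
"b2 = random.random()\n",
"\n",
"dW1, dW2 = analytic_gradients(x_k, y_k, W1, b1, W2, b2)\n",
"f = lambda: loss(x_k, y_k, W1, b1, W2, b2)\n",
"\n",
"max_diff = 0.0\n",
"for j in range(n1):\n",
"    max_diff = max(max_diff, abs(numeric_derivative(W2, j, f) - dW2[j]))\n",
"    for i in range(len(x_k)):\n",
"        max_diff = max(max_diff, abs(numeric_derivative(W1[i], j, f) - dW1[i][j]))\n",
"\n",
"print('largest deviation between analytic and numerical gradient:', max_diff)\n"
]
},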
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A simple neural network class"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# A simple feed-forward neural network with one hidden layer\n",
"# see also https://towardsdatascience.com/how-to-build-your-own-neural-network-from-scratch-in-python-68998a08e4f6\n",
"\n",
"import numpy as np\n",
"\n",
"class NeuralNetwork:\n",
"    def __init__(self, x, y):\n",
"        n1 = 3  # number of neurons in the hidden layer\n",
"        self.input = x\n",
"        self.weights1 = np.random.rand(self.input.shape[1], n1)\n",
"        self.bias1 = np.random.rand(n1)\n",
"        self.weights2 = np.random.rand(n1, 1)\n",
"        self.bias2 = np.random.rand(1)\n",
"        self.y = y\n",
"        self.output = np.zeros(y.shape)\n",
"        self.learning_rate = 0.01\n",
"        self.n_train = 0\n",
"        self.loss_history = []\n",
"\n",
"    def sigmoid(self, x):\n",
"        return 1 / (1 + np.exp(-x))\n",
"\n",
"    def sigmoid_derivative(self, a):\n",
"        # a is the sigmoid output sigma(z), not the pre-activation z\n",
"        return a * (1 - a)\n",
"\n",
"    def feedforward(self):\n",
"        self.layer1 = self.sigmoid(self.input @ self.weights1 + self.bias1)\n",
"        self.output = self.sigmoid(self.layer1 @ self.weights2 + self.bias2)\n",
"\n",
"    def backprop(self):\n",
"\n",
"        # delta2: [m, 1], m = number of training data\n",
"        # (y - output) instead of (output - y): delta2 carries the sign of the\n",
"        # negative gradient, so the '+=' updates below perform gradient descent\n",
"        delta2 = 2 * (self.y - self.output) * self.sigmoid_derivative(self.output)\n",
"\n",
"        # negative gradient w.r.t. weights from hidden to output layer: [n1, 1] matrix, n1 = # neurons in hidden layer\n",
"        # self.layer1.T: n1 x m matrix\n",
"        d_weights2 = self.layer1.T @ delta2\n",
"        d_bias2 = np.sum(delta2)\n",
"\n",
"        # shape of delta1: [m, n1], m = number of training data, n1 = # neurons in hidden layer\n",
"        delta1 = (delta2 @ self.weights2.T) * self.sigmoid_derivative(self.layer1)\n",
"        d_weights1 = self.input.T @ delta1\n",
"        d_bias1 = np.ones(delta1.shape[0]) @ delta1\n",
"\n",
"        # update weights and biases\n",
"        self.weights1 += self.learning_rate * d_weights1\n",
"        self.weights2 += self.learning_rate * d_weights2\n",
"\n",
"        self.bias1 += self.learning_rate * d_bias1\n",
"        self.bias2 += self.learning_rate * d_bias2\n",
"\n",
"    def train(self, X, y):\n",
"        # one full-batch gradient descent step; y must have shape [m, 1]\n",
"        self.output = np.zeros(y.shape)\n",
"        self.input = X\n",
"        self.y = y\n",
"        self.feedforward()\n",
"        self.backprop()\n",
"        self.n_train += 1\n",
"        if self.n_train % 1000 == 0:\n",
"            loss = np.sum((self.y - self.output)**2)\n",
"            print(\"loss: \", loss)\n",
"            self.loss_history.append(loss)\n",
"\n",
"    def predict(self, X):\n",
"        # feedforward() overwrites self.output, so no pre-allocation is needed\n",
"        self.input = X\n",
"        self.feedforward()\n",
"        return self.output\n",
"\n",
"    def get_loss_history(self):\n",
"        # renamed: a method called loss_history would be shadowed by the attribute set in __init__\n",
"        return self.loss_history\n"
]
},
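{
"cell_type": "markdown",
"metadata": {},
"source": [
"The matrix products in `backprop()` sum the per-sample gradients from the introduction over all $m$ training samples. The following cell is a small sketch that checks this with explicit loops. The sizes and the random arrays `X`, `A1`, `W2` and `delta2` are made up for the illustration; `A1` is filled with random numbers rather than real sigmoid outputs, which does not affect the identity being checked."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(0)\n",
"m, n_in, n1 = 5, 2, 3              # illustrative sizes\n",
"X = rng.random((m, n_in))          # input features, one row per sample\n",
"A1 = rng.random((m, n1))           # stand-in for the hidden-layer activations\n",
"W2 = rng.random((n1, 1))           # hidden-to-output weights\n",
"delta2 = rng.normal(size=(m, 1))   # one delta per sample\n",
"\n",
"# index notation: sum over all samples k\n",
"grad_w2_loop = np.zeros((n1, 1))\n",
"grad_w1_loop = np.zeros((n_in, n1))\n",
"for k in range(m):\n",
"    for j in range(n1):\n",
"        grad_w2_loop[j, 0] += delta2[k, 0] * A1[k, j]\n",
"        delta1_kj = delta2[k, 0] * W2[j, 0] * A1[k, j] * (1 - A1[k, j])\n",
"        for i in range(n_in):\n",
"            grad_w1_loop[i, j] += delta1_kj * X[k, i]\n",
"\n",
"# matrix notation, as used in backprop()\n",
"grad_w2_matrix = A1.T @ delta2\n",
"delta1 = (delta2 @ W2.T) * A1 * (1 - A1)\n",
"grad_w1_matrix = X.T @ delta1\n",
"\n",
"print(np.allclose(grad_w2_loop, grad_w2_matrix), np.allclose(grad_w1_loop, grad_w1_matrix))\n"
]
},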
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create toy data\n",
"We create three toy data sets:\n",
"1. two moon-like distributions\n",
"2. two concentric circles\n",
"3. a linearly separable data set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py\n",
"import numpy as np\n",
"from sklearn.datasets import make_moons, make_circles, make_classification\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = make_classification(\n",
"    n_features=2, n_redundant=0, n_informative=2, random_state=1, n_clusters_per_class=1\n",
")\n",
"rng = np.random.RandomState(2)\n",
"X += 2 * rng.uniform(size=X.shape)\n",
"linearly_separable = (X, y)\n",
"\n",
"datasets = [\n",
"    make_moons(n_samples=200, noise=0.1, random_state=0),\n",
"    make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=1),\n",
"    linearly_separable,\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create training and test data set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# datasets: 0 = moons, 1 = circles, 2 = linearly separable\n",
"X, y = datasets[1]\n",
"X_train, X_test, y_train, y_test = train_test_split(\n",
"    X, y, test_size=0.4, random_state=42\n",
")\n",
"\n",
"x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5\n",
"y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_train = y_train.reshape(-1, 1)\n",
"\n",
"nn = NeuralNetwork(X_train, y_train)\n",
"\n",
"for i in range(100000):\n",
"    nn.train(X_train, y_train)\n"
]
},
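{
"cell_type": "markdown",
"metadata": {},
"source": [
"Besides the training loss, the fraction of correctly classified samples in the test set is a useful check of the trained model. The following cell is a minimal sketch of such a check. The helper `accuracy` and its `threshold` parameter are assumptions introduced for this illustration (the notebook itself does not define them), and the demo arrays are made-up numbers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def accuracy(y_true, y_score, threshold=0.5):\n",
"    # y_score: network outputs in [0, 1]; outputs above the threshold count as signal (1)\n",
"    y_pred = (np.ravel(y_score) > threshold).astype(int)\n",
"    return np.mean(y_pred == np.ravel(y_true))\n",
"\n",
"# made-up numbers for illustration only\n",
"y_true_demo = np.array([1, 0, 1, 0])\n",
"y_score_demo = np.array([[0.9], [0.2], [0.4], [0.1]])\n",
"acc_demo = accuracy(y_true_demo, y_score_demo)\n",
"print('accuracy:', acc_demo)\n",
"\n",
"# with the objects from this notebook one could call, e.g.,\n",
"# accuracy(y_test, nn.predict(X_test))\n"
]
},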
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the loss vs. the number of epochs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"plt.plot(nn.loss_history)\n",
"plt.xlabel(\"# epochs / 1000\")\n",
"plt.ylabel(\"loss\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plot the decision boundary together with the training and test data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"from matplotlib.colors import ListedColormap\n",
"\n",
"cm = plt.cm.RdBu\n",
"cm_bright = ListedColormap([\"#FF0000\", \"#0000FF\"])\n",
"\n",
"xv = np.linspace(x_min, x_max, 10)\n",
"yv = np.linspace(y_min, y_max, 10)\n",
"Xv, Yv = np.meshgrid(xv, yv)\n",
"XYpairs = np.vstack([ Xv.reshape(-1), Yv.reshape(-1)])\n",
"zv = nn.predict(XYpairs.T)\n",
"Zv = zv.reshape(Xv.shape)\n",
"\n",
"fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(9, 7))\n",
"ax.set_aspect(1)\n",
"cn = ax.contourf(Xv, Yv, Zv, cmap=\"coolwarm_r\", alpha=0.4)\n",
"\n",
"ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors=\"k\")\n",
"\n",
"# Plot the testing points\n",
"ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.4, edgecolors=\"k\")\n",
"\n",
"ax.set_xlim(x_min, x_max)\n",
"ax.set_ylim(y_min, y_max)\n",
"# ax.set_xticks(())\n",
"# ax.set_yticks(())\n",
"\n",
"fig.colorbar(cn)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.16"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}