Finalize backprop lesson
VikParuchuri committed Feb 6, 2023
1 parent 1fc5d04 commit bb8b625
Showing 2 changed files with 37 additions and 18 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -45,10 +45,9 @@ Recurrent neural networks are optimized to process sequences of data. They're u

## 5. Backpropagation in depth

So far, we've taken a somewhat loose look at backpropagation to let us focus on understanding neural network architecture. We'll build a computational graph, and use it to take a deeper look at how backpropagation works.
So far, we've taken a loose look at backpropagation to let us focus on understanding neural network architecture. We'll build a miniature version of PyTorch, and use it to understand backpropagation better.

- Lesson: Read the in-depth backpropagation tutorial (coming soon)
- Implementation: [Notebook](notebooks/comp_graph/comp_graph.ipynb)
- Lesson: Read the [in-depth backpropagation tutorial](explanations/comp_graph.ipynb)

## 6. PyTorch

50 changes: 35 additions & 15 deletions explanations/comp_graph.ipynb
@@ -7,13 +7,13 @@
"\n",
"In the [last lesson](https://github.com/VikParuchuri/zero_to_gpt/blob/master/explanations/rnn.ipynb), we learned how to create a recurrent neural network. We now know how to build several network architectures using components like dense layers, softmax, and recurrent layers.\n",
"\n",
"We've been a bit loose with how we cover backpropagation, so that neural network architecture is easier to understand. Backpropagation is how a neural network calculates how much to change each parameter in the network (the gradient).\n",
"We've been a bit loose with how we cover backpropagation, so that neural network architecture is easier to understand. Backpropagation is how a neural network calculates how much to change each parameter in the network (the gradient). Understanding how it works is important for tuning networks for performance, and writing fused kernels for GPUs.\n",
"\n",
"In this lesson, we'll do a deep dive into how backpropagation works. We'll do this by building a computational graph to keep track of which changes we make to input data.\n",
"\n",
"A computational graph looks like this:\n",
"\n",
"![](comp_graph.png)\n",
"![](images/comp_graph/comp_graph.png)\n",
"\n",
"It shows all the individual operations we performed (like multiplication) to modify the value of `X`, in order. Keeping track of a computational graph is how we know how to reverse our operations to do backpropagation.\n",
"\n",
@@ -206,7 +206,7 @@
"source": [
"Now we can build the forward pass of our staged softmax. The derivative of multiplication is easier to calculate than division, so we'll swap some of our operations to remove the division.\n",
"\n",
"Luckily for us, raising a value `x` to the power `-1` is the same as taking `1/x`. So instead of dividing `Exp/Sum`, we can do `Exp * Sum ^ -1$, leaving us with these operations:\n",
"Luckily for us, raising a value `x` to the power `-1` is the same as taking `1/x`. So instead of dividing `Exp/Sum`, we can do `Exp * Sum ^ -1`, leaving us with these operations:\n",
"\n",
"- Exp\n",
"- Sum\n",
@@ -1181,49 +1181,65 @@
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"cell_type": "markdown",
"source": [
"We just built a computational graph, and used it to do the full forward and backward pass for a neural network! If you want, you can extend this to update the parameters and train the network. You would just need to set a learning rate, then subtract the gradient from each parameter. You would have to set a batch size, and iterate through the data as well.\n",
"\n",
"This has hopefully given you a good look at how backpropagation, works, and how we compute the partial derivatives of each operation, then multiply them out.\n",
"\n",
"Let's do a quick verification to make sure that we did everything correctly. We can implement the network forward and backward pass like we did in an earlier lesson:"
],
"metadata": {
"collapsed": false
}
},
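A minimal sketch of the parameter update described in the cell above, assuming the gradients have already been collected. The dictionaries and the learning rate here are placeholders, not objects defined in the notebook:

```python
import numpy as np

# Placeholder parameters and gradients standing in for the ones the graph produced.
params = {"w1": np.random.randn(3, 4), "b1": np.zeros(4)}
grads = {"w1": np.random.randn(3, 4), "b1": np.random.randn(4)}

lr = 1e-3  # assumed learning rate

# One gradient descent step: subtract the scaled gradient from each parameter.
for name in params:
    params[name] -= lr * grads[name]
```

A full training loop would wrap this step in iterations over mini-batches, re-running the forward and backward passes before each update.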
{
"cell_type": "code",
"execution_count": 119,
"execution_count": 124,
"outputs": [],
"source": [
"# Forward pass\n",
"l1 = train_x @ w1 + b1\n",
"l1_activated = np.maximum(l1, 0)\n",
"l2 = l1_activated @ w2 + b2\n",
"probs = softmax_func(l2)\n",
"\n",
"# Loss\n",
"loss_grad = nll_grad(train_y, probs)\n",
"\n",
"# L2 gradients\n",
"sm_grad = softmax_grad_func(probs, loss_grad)\n",
"l2_w_grad = l1_activated.T @ sm_grad\n",
"l2_b_grad = sm_grad.sum(axis=0)\n",
"\n",
"# L1 gradients\n",
"l1_grad = sm_grad @ w2.T\n",
"l1_grad[l1 < 0] = 0\n",
"\n",
"l1_w_grad = train_x.T @ l1_grad\n",
"l1_b_grad = l1_grad.sum(axis=0)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Then we can verify that our computational graph matches the manual results:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 120,
"execution_count": 125,
"outputs": [
{
"data": {
"text/plain": "True"
},
"execution_count": 120,
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
@@ -1236,10 +1252,14 @@
}
},
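The body of that verification cell is collapsed in this diff. A comparison along these lines (all names below are placeholders, since the real cell is not shown) is the kind of check that would produce the `True` output above:

```python
import numpy as np

# Placeholder gradients standing in for the manual results and the graph's results.
manual_grads = {"w1": np.ones((3, 4)), "b1": np.ones(4)}
graph_grads = {"w1": np.ones((3, 4)), "b1": np.ones(4)}

# True only if every gradient matches to within floating-point tolerance.
all(np.allclose(manual_grads[k], graph_grads[k]) for k in manual_grads)
```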
{
"cell_type": "code",
"execution_count": 120,
"outputs": [],
"source": [],
"cell_type": "markdown",
"source": [
"# Wrap-up\n",
"\n",
"We did a lot in this lesson! We learned how to break apart a derivative into steps, then compute each step separately. Then, we constructed a computational graph and ran the forward and backward passes.\n",
"\n",
"I recommend doing some experimentation with the graph, and making sure you really understand how everything is working. In the next lesson, we'll use PyTorch to automatically construct the graph for us."
],
"metadata": {
"collapsed": false
}
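As a preview of that next lesson, this is standard PyTorch autograd rather than code from this repository: PyTorch records the same kind of graph automatically and fills in `.grad` when `backward()` is called.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)
w = torch.tensor([[0.5], [0.5], [0.5]], requires_grad=True)

# PyTorch builds the computational graph as the operations run.
out = (x @ w).sum()
out.backward()  # walks the graph in reverse, like our manual backward pass

print(x.grad)  # d(out)/dx, equal to w transposed
print(w.grad)  # d(out)/dw, equal to x transposed
```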
