Finalize backprop lesson
VikParuchuri committed Feb 6, 2023
1 parent 1fc5d04 commit bb8b625
Showing 2 changed files with 37 additions and 18 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -45,10 +45,9 @@ Recurrent neural networks are optimized to process sequences of data. They're u

## 5. Backpropagation in depth

So far, we've taken a somewhat loose look at backpropagation to let us focus on understanding neural network architecture. We'll build a computational graph, and use it to take a deeper look at how backpropagation works.
So far, we've taken a loose look at backpropagation to let us focus on understanding neural network architecture. We'll build a miniature version of PyTorch, and use it to understand backpropagation better.

- Lesson: Read the in-depth backpropagation tutorial (coming soon)
- Implementation: [Notebook](notebooks/comp_graph/comp_graph.ipynb)
- Lesson: Read the [in-depth backpropagation tutorial](explanations/comp_graph.ipynb)

## 6. PyTorch

50 changes: 35 additions & 15 deletions explanations/comp_graph.ipynb
@@ -7,13 +7,13 @@
"\n",
"In the [last lesson](https://github.com/VikParuchuri/zero_to_gpt/blob/master/explanations/rnn.ipynb), we learned how to create a recurrent neural network. We now know how to build several network architectures using components like dense layers, softmax, and recurrent layers.\n",
"\n",
"We've been a bit loose with how we cover backpropagation, so that neural network architecture is easier to understand. Backpropagation is how a neural network calculates how much to change each parameter in the network (the gradient).\n",
"We've been a bit loose with how we cover backpropagation, so that neural network architecture is easier to understand. Backpropagation is how a neural network calculates how much to change each parameter in the network (the gradient). Understanding how it works is important for tuning networks for performance, and writing fused kernels for GPUs.\n",
"\n",
"In this lesson, we'll do a deep dive into how backpropagation works. We'll do this by building a computational graph to keep track of which changes we make to input data.\n",
"\n",
"A computational graph looks like this:\n",
"\n",
"![](comp_graph.png)\n",
"![](images/comp_graph/comp_graph.png)\n",
"\n",
"It shows all the individual operations we performed (like multiplication) to modify the value of `X`, in order. Keeping track of a computational graph is how we know how to reverse our operations to do backpropagation.\n",
"\n",
@@ -206,7 +206,7 @@
"source": [
"Now we can build the forward pass of our staged softmax. The derivative of multiplication is easier to calculate than division, so we'll swap some of our operations to remove the division.\n",
"\n",
"Luckily for us, raising a value `x` to the power `-1` is the same as taking `1/x`. So instead of dividing `Exp/Sum`, we can do `Exp * Sum ^ -1$, leaving us with these operations:\n",
"Luckily for us, raising a value `x` to the power `-1` is the same as taking `1/x`. So instead of dividing `Exp/Sum`, we can do `Exp * Sum ^ -1`, leaving us with these operations:\n",
"\n",
"- Exp\n",
"- Sum\n",
@@ -1181,49 +1181,65 @@
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [],
"cell_type": "markdown",
"source": [
"We just built a computational graph, and used it to do the full forward and backward pass for a neural network! If you want, you can extend this to update the parameters and train the network. You would just need to set a learning rate, then subtract the gradient from each parameter. You would have to set a batch size, and iterate through the data as well.\n",
"\n",
"This has hopefully given you a good look at how backpropagation, works, and how we compute the partial derivatives of each operation, then multiply them out.\n",
"\n",
"Let's do a quick verification to make sure that we did everything correctly. We can implement the network forward and backward pass like we did in an earlier lesson:"
],
"metadata": {
"collapsed": false
}
},
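A minimal sketch of the parameter update described in the cell above, assuming the gradients have already been collected. The dictionaries and the learning rate here are placeholders, not objects defined in the notebook:

```python
import numpy as np

# Placeholder parameters and gradients standing in for the ones the graph produced.
params = {"w1": np.random.randn(3, 4), "b1": np.zeros(4)}
grads = {"w1": np.random.randn(3, 4), "b1": np.random.randn(4)}

lr = 1e-3  # assumed learning rate

# One gradient descent step: subtract the scaled gradient from each parameter.
for name in params:
    params[name] -= lr * grads[name]
```

A full training loop would wrap this step in iterations over mini-batches, re-running the forward and backward passes before each update.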
{
"cell_type": "code",
"execution_count": 119,
"execution_count": 124,
"outputs": [],
"source": [
"# Forward pass\n",
"l1 = train_x @ w1 + b1\n",
"l1_activated = np.maximum(l1, 0)\n",
"l2 = l1_activated @ w2 + b2\n",
"probs = softmax_func(l2)\n",
"\n",
"# Loss\n",
"loss_grad = nll_grad(train_y, probs)\n",
"\n",
"# L2 gradients\n",
"sm_grad = softmax_grad_func(probs, loss_grad)\n",
"l2_w_grad = l1_activated.T @ sm_grad\n",
"l2_b_grad = sm_grad.sum(axis=0)\n",
"\n",
"# L1 gradients\n",
"l1_grad = sm_grad @ w2.T\n",
"l1_grad[l1 < 0] = 0\n",
"\n",
"l1_w_grad = train_x.T @ l1_grad\n",
"l1_b_grad = l1_grad.sum(axis=0)"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "markdown",
"source": [
"Then we can verify that our computational graph matches the manual results:"
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": 120,
"execution_count": 125,
"outputs": [
{
"data": {
"text/plain": "True"
},
"execution_count": 120,
"execution_count": 125,
"metadata": {},
"output_type": "execute_result"
}
@@ -1236,10 +1252,14 @@
}
},
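The body of that verification cell is collapsed in this diff. A comparison along these lines (all names below are placeholders, since the real cell is not shown) is the kind of check that would produce the `True` output above:

```python
import numpy as np

# Placeholder gradients standing in for the manual results and the graph's results.
manual_grads = {"w1": np.ones((3, 4)), "b1": np.ones(4)}
graph_grads = {"w1": np.ones((3, 4)), "b1": np.ones(4)}

# True only if every gradient matches to within floating-point tolerance.
all(np.allclose(manual_grads[k], graph_grads[k]) for k in manual_grads)
```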
{
"cell_type": "code",
"execution_count": 120,
"outputs": [],
"source": [],
"cell_type": "markdown",
"source": [
"# Wrap-up\n",
"\n",
"We did a lot in this lesson! We learned how to break apart a derivative into steps, then compute each step separately. Then, we constructed a computational graph and ran the forward and backward passes.\n",
"\n",
"I recommend doing some experimentation with the graph, and making sure you really understand how everything is working. In the next lesson, we'll use PyTorch to automatically construct the graph for us."
],
"metadata": {
"collapsed": false
}
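As a preview of that next lesson, this is standard PyTorch autograd rather than code from this repository: PyTorch records the same kind of graph automatically and fills in `.grad` when `backward()` is called.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0]], requires_grad=True)
w = torch.tensor([[0.5], [0.5], [0.5]], requires_grad=True)

# PyTorch builds the computational graph as the operations run.
out = (x @ w).sum()
out.backward()  # walks the graph in reverse, like our manual backward pass

print(x.grad)  # d(out)/dx, equal to w transposed
print(w.grad)  # d(out)/dw, equal to x transposed
```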
