This course will get you from no knowledge of deep learning to training a GPT model. As AI moves out of the research lab, the world needs more people who can understand and apply it. If you want to be one of them, this course is for you.
We'll start with the fundamentals: how neural networks work and how to tune them. You need some math to understand deep learning, but we won't get bogged down in it.
This course focuses on building intuition rather than covering theory exhaustively. We'll solve real problems, like predicting the weather and translating languages, then move on to more complex topics, like transformers, GPU programming, and distributed training.
To use this course, go through each chapter sequentially. Read the lessons or watch the optional videos. Look through the implementations to solidify your understanding, and try to recreate them on your own.
An overview of the course and topics we'll cover. Includes some math and NumPy fundamentals you'll need for deep learning.
- Lesson: Read the intro
Gradient descent is how neural networks adjust their parameters to fit the data. It's the "learning" part of deep learning.
- Lesson: Read the gradient descent tutorial and watch the optional video
- Implementation: Notebook and class
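To make the idea concrete, here's a minimal sketch of gradient descent fitting a single weight so that y ≈ w * x. All names (`train`, `lr`, `steps`) are illustrative, not from the course notebooks:

```python
# Minimal gradient descent sketch: fit one weight w so that y ≈ w * x
# by repeatedly stepping against the gradient of the squared-error loss.

def train(xs, ys, lr=0.01, steps=200):
    w = 0.0  # start from an arbitrary parameter value
    for _ in range(steps):
        # Gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step opposite the gradient direction
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so w should approach 2
w = train(xs, ys)
```

Each step moves `w` a little closer to the value that minimizes the loss; the learning rate `lr` controls the step size.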
Dense networks are the basic form of a neural network, where every input is connected to an output. These can also be called fully connected networks.
- Lesson: Read the dense network tutorial and watch the optional video
- Implementation: Notebook and class
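At its core, a dense layer is a matrix multiply plus a bias, usually followed by a nonlinearity. A NumPy sketch (shapes and names are illustrative):

```python
import numpy as np

def dense_forward(x, W, b):
    # Every input connects to every output: one weight per (input, output) pair.
    return x @ W + b

def relu(z):
    # A common nonlinearity applied between dense layers
    return np.maximum(z, 0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))           # batch of 4 examples, 3 features each
W = rng.normal(size=(3, 2))           # 3 inputs fully connected to 2 outputs
b = np.zeros(2)
hidden = relu(dense_forward(x, W, b)) # shape (4, 2)
```

Stacking several of these layers, each with its own weights, gives a multi-layer dense network.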
Classification is how we get neural networks to categorize data for us. Language models like GPT use classification to predict the next word in a sequence.
- Lesson: Read the classification tutorial
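For classification, the network's raw outputs (logits) are typically turned into probabilities with a softmax; the predicted class is the one with the highest probability. A small sketch:

```python
import math

def softmax(logits):
    # Convert raw network outputs into probabilities that sum to 1.
    # Subtracting the max is a standard trick for numerical stability.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))  # pick the highest-probability class
```

A language model does the same thing, except the "classes" are every token in its vocabulary.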
Recurrent neural networks are designed to process sequences of data. They're used for tasks like translation and text classification.
- Lesson: Read the recurrent network tutorial
- Implementation: Notebook
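The key idea in an RNN is a hidden state that's carried forward as the sequence is processed one element at a time. A single-unit sketch with illustrative weights (real RNNs use weight matrices and learn them from data):

```python
import math

def rnn_step(h, x, w_h, w_x, b):
    # One recurrent step: mix the previous hidden state with the new input.
    return math.tanh(w_h * h + w_x * x + b)

# Process a sequence element by element, carrying the hidden state forward.
h = 0.0
for x in [0.5, -1.0, 0.25]:
    h = rnn_step(h, x, w_h=0.8, w_x=0.5, b=0.0)
```

After the loop, `h` summarizes the whole sequence, which is what makes RNNs useful for tasks like classification of variable-length text.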
So far, we've treated backpropagation somewhat loosely so we could focus on understanding neural network architecture. Now we'll build a computational graph and use it to take a deeper look at how backpropagation works.
- Lesson: Read the in-depth backpropagation tutorial (coming soon)
- Implementation: Notebook
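As a preview, here's a tiny computational graph for y = (a * b) + c, with backpropagation applying the chain rule node by node. Class and function names are illustrative, not the course's implementation:

```python
# Each Node records how it was computed, so gradients can flow backward.

class Node:
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents    # nodes this one was computed from
        self.grad_fns = grad_fns  # local derivative w.r.t. each parent
        self.grad = 0.0

def mul(a, b):
    # d(a*b)/da = b, d(a*b)/db = a
    return Node(a.value * b.value, (a, b), (lambda: b.value, lambda: a.value))

def add(a, b):
    # d(a+b)/da = d(a+b)/db = 1
    return Node(a.value + b.value, (a, b), (lambda: 1.0, lambda: 1.0))

def backward(node, upstream=1.0):
    # Chain rule: accumulate upstream gradient times each local derivative.
    node.grad += upstream
    for parent, grad_fn in zip(node.parents, node.grad_fns):
        backward(parent, upstream * grad_fn())

a, b, c = Node(2.0), Node(3.0), Node(4.0)
y = add(mul(a, b), c)  # y = a*b + c = 10
backward(y)            # dy/da = b = 3, dy/db = a = 2, dy/dc = 1
```

This is the same bookkeeping that frameworks like PyTorch automate for arbitrary graphs.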
PyTorch is a framework for deep learning that automatically differentiates functions. It's widely used to create cutting-edge models.
- Lesson: Read the PyTorch tutorial (coming soon)
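A small taste of what automatic differentiation looks like in PyTorch (assuming `torch` is installed):

```python
import torch

# PyTorch builds the computational graph for us and differentiates automatically.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x  # y = x^3 + 2x
y.backward()        # autograd applies the chain rule through the graph
# x.grad now holds dy/dx = 3x^2 + 2 = 14 at x = 2
```

The same mechanism scales to models with billions of parameters: define the forward pass, and PyTorch computes every gradient.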
Regularization prevents overfitting to the training set, so the network can generalize well to new data.
- Lesson: Read the regularization tutorial (coming soon)
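One common form is L2 regularization: add a penalty on weight magnitude to the loss, which discourages large weights. A sketch with illustrative numbers:

```python
# L2 regularization sketch: the total loss is the data loss plus a
# penalty proportional to the squared weights.

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

def l2_penalty(weights, lam):
    # lam controls how strongly large weights are penalized
    return lam * sum(w ** 2 for w in weights)

weights = [0.5, -1.5]
preds, ys = [1.0, 2.0], [1.1, 1.8]
loss = mse(preds, ys) + l2_penalty(weights, lam=0.01)
```

Other regularization techniques, like dropout, work differently but share the same goal of better generalization.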
If you want to train a deep learning model, you need data. Gigabytes of it. We'll discuss how you can get this data and process it.
- Lesson: Read the data tutorial (coming soon)
- Implementation: Notebook coming soon
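A minimal sketch of one processing step: mapping raw text to integer ids and slicing it into fixed-length input/target pairs. A real pipeline would use a proper tokenizer and stream data from disk; everything here is illustrative:

```python
# Turn raw text into next-token training batches.

text = "hello deep learning"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}  # character -> integer id
ids = [stoi[ch] for ch in text]

def make_batches(ids, seq_len):
    # Each target sequence is the input shifted by one position:
    # the model learns to predict the next token.
    batches = []
    for start in range(0, len(ids) - seq_len, seq_len):
        x = ids[start:start + seq_len]
        y = ids[start + 1:start + seq_len + 1]
        batches.append((x, y))
    return batches

batches = make_batches(ids, seq_len=4)
```

The same shift-by-one trick is how GPT-style models get their training targets.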
Encoder/decoder networks are used for NLP tasks where the output isn't the same length as the input. For example, if you train on question/answer pairs, an answer may be a different length than its question.
- Lesson: Read the encoder/decoder tutorial (coming soon)
- Implementation: Notebook
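A shape-level sketch of the idea: the encoder compresses a variable-length input into a fixed-size context, and the decoder unrolls from that context until it emits a stop token, so output length is decoupled from input length. Every function and token value here is a toy stand-in for a learned network:

```python
def encode(tokens):
    # Stand-in for an RNN encoder: summarize the sequence as one number.
    return sum(tokens) / len(tokens)

def decode(context, stop_token=0, max_len=5):
    # Stand-in for an RNN decoder: emit tokens until the stop token appears.
    out = []
    state = context
    while len(out) < max_len:
        token = int(state) % 3  # fake "next token" prediction
        if token == stop_token:
            break
        out.append(token)
        state += 1
    return out

answer = decode(encode([4, 5, 6, 7]))  # output length differs from input length
```

The real versions replace these toy functions with recurrent (or transformer) networks, but the control flow is the same.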
Transformers fix the problem of vanishing/exploding gradients in RNNs by using attention. Attention allows the network to process the whole sequence at once, instead of iteratively.
- Lesson: Read the transformer tutorial (coming soon)
- Implementation: Notebook
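The core operation is scaled dot-product attention, which can be sketched in a few lines of NumPy (dimensions here are arbitrary, and real transformers add masking, multiple heads, and learned projections):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: every position attends to every other
    # position at once, so no sequential recurrence is needed.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 sequence positions, 8-dim queries
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = attention(Q, K, V)     # shape (4, 8): one output per position
```

Because every position gets a direct, weighted connection to every other position, gradients don't have to flow through long chains of recurrent steps.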
To train a large neural network, we'll need to use GPUs. PyTorch can automatically use GPUs, but not all operators are fused and optimized. For example, flash attention can speed up transformers by 2x or more. We'll use OpenAI Triton to implement GPU kernels.
- Lesson: Read the GPU programming tutorial (coming soon)
- Implementation: Notebook coming soon
GPT models take a long time to train. Using more GPUs reduces that time, but not everyone has access to a GPU cluster, so we'll instead incorporate some recent advances that make the transformer model more efficient.
- Lesson: Read the efficient transformer tutorial (coming soon)
- Implementation: Notebook
Convolutional neural networks are used for working with images and time series.
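The core operation is a convolution: slide a small kernel over the data and take a dot product at each position. A 1D sketch (the same idea extends to 2D for images); the kernel here is illustrative:

```python
def conv1d(signal, kernel):
    # Slide the kernel over the signal, one dot product per position.
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A [1, -1] kernel responds to changes between neighboring values,
# acting as a simple edge detector for a time series.
edges = conv1d([1, 1, 5, 5, 2], [1, -1])
```

In a convolutional network, the kernel values are learned rather than hand-picked, and many kernels run in parallel.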
Gated recurrent networks let RNNs process long sequences by learning to forget irrelevant information. LSTM and GRU are two popular gated architectures.
- Lesson: Read the GRU tutorial (coming soon)
- Implementation: Notebook
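The essence of gating can be sketched with a single simplified gate. Real GRUs learn their weights and use separate update and reset gates; this toy version just shows how a gate value between 0 and 1 blends old state with new input:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def gated_step(h, x, w_gate, w_in):
    # The gate decides how much of the old state to keep versus overwrite.
    gate = sigmoid(w_gate * x)       # between 0 and 1
    candidate = math.tanh(w_in * x)  # proposed new state from the input
    return gate * h + (1 - gate) * candidate

h = 0.0
for x in [1.0, -2.0, 0.5]:
    h = gated_step(h, x, w_gate=1.0, w_in=0.5)
```

Because the gate can stay close to 1, information can persist across many steps instead of being washed out, which is what lets gated networks handle long sequences.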
If you want to run these notebooks locally, you'll need to install some Python packages.
- Make sure you have Python 3.8 or higher installed.
- Clone this repository.
- Run `pip install -r requirements.txt`.
You can use and adapt this material for your own courses, but not commercially. You must provide attribution to Vik Paruchuri, Dataquest if you use this material.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.