This repository contains code and resources for recreating the Llama 3 and GPT-2 models, inspired by the work of Andrej Karpathy, specifically the repositories and code from his Neural Networks: Zero To Hero video lecture series.
The purpose of this repository is to provide a hands-on implementation of Llama 3 and GPT-2 that follows the methods and coding practices Andrej Karpathy demonstrates. The project aims to recreate the models while keeping the code easy to hack on, explore, and learn from.
- nanoGPT-Lecture
- Code written while following the Neural Networks: Zero To Hero video lecture series, specifically the first lecture on nanoGPT.
- GitHub Repo: nanoGPT model.py
- Note: Model initialization is crucial for good performance. The current code trains and works, but it converges more slowly because the weights start from a suboptimal initialization. Future updates may cover this in more detail.
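
To make the initialization note concrete, here is a minimal, framework-free sketch of one common GPT-2 style scheme: weights drawn from a normal distribution with a small standard deviation, and residual output projections scaled down by the number of residual additions. This is an illustration under stated assumptions, not the code used in this repository. The base value of 0.02, the `2 * n_layer` scaling, and the `c_proj.weight` naming convention are assumptions chosen for the example, and `init_std` and `init_matrix` are hypothetical helpers.

```python
import math
import random


def init_std(param_name: str, n_layer: int, base_std: float = 0.02) -> float:
    """Pick the standard deviation for a weight matrix.

    Assumption: residual output projections are identified by a name ending
    in "c_proj.weight" (a naming convention assumed for this sketch).
    """
    if param_name.endswith("c_proj.weight"):
        # Each block adds to the residual stream twice (attention and MLP),
        # so the scale is reduced by sqrt(2 * n_layer).
        return base_std / math.sqrt(2 * n_layer)
    return base_std


def init_matrix(rows: int, cols: int, std: float, rng: random.Random):
    """Return a rows x cols matrix of zero-mean Gaussian samples."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_layer = 12            # assumed depth, for illustration only

std_proj = init_std("h.0.attn.c_proj.weight", n_layer)
std_other = init_std("h.0.attn.c_attn.weight", n_layer)
w_proj = init_matrix(4, 8, std_proj, rng)
```

In a PyTorch model the same idea would typically be applied with `torch.nn.init.normal_` on each parameter, using the standard deviation selected above.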
This project is licensed under the MIT License.
The repository will be updated periodically as new insights and improvements are made available by the original author, or as time permits.
Feel free to explore, contribute, and learn from this project.