Port of Karpathy's Let's Build GPT tutorial to Candle #1525
This PR adds a link to a port of Karpathy's tutorial to Candle. I also carried over all the comments from his notebook into the code.
You will also see a struggle between following his variable naming and my own standards ;) I will probably make that more consistent soon.
It is meant as a complement to the other tutorial link. It showcases different sides of the Candle API on a toy example, and is aimed especially at people trying to build a model from scratch (as opposed to loading pre-trained weights).
I'm sure not everything in the port is idiomatic Candle, so I'm happy to receive feedback! Just know that it works, and that I often had to dive into the API to find the corresponding functionality. (The only thing I couldn't find or work around was on-device multinomial sampling.)
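For readers wondering what the multinomial sampling step involves, here is a minimal host-side sketch using inverse-CDF sampling. It is not the code from the port and it does not run on-device: it assumes the probabilities have already been copied to the CPU as a slice of `f32` (in Candle, something like `to_vec1::<f32>()` on a 1-D tensor). `XorShift64` and `sample_multinomial` are hypothetical names, and the small generator is only a stand-in so the example needs nothing beyond the standard library.

```rust
/// Tiny xorshift64* generator. A stand-in for a real RNG crate,
/// used here only to keep the sketch dependency-free.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Draw one index with probability proportional to `probs[i]`.
/// Assumes non-negative weights; they do not need to sum to 1.
/// Returns `None` for empty input or a non-positive / non-finite total.
fn sample_multinomial(probs: &[f32], rng: &mut XorShift64) -> Option<usize> {
    let total: f32 = probs.iter().sum();
    if probs.is_empty() || !(total > 0.0) || !total.is_finite() {
        return None;
    }
    // Pick a point in [0, total) and walk the cumulative sum until we pass it.
    let target = rng.next_f32() * total;
    let mut cumulative = 0.0f32;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if target < cumulative {
            return Some(i);
        }
    }
    // Floating-point rounding can leave us just short of `total`;
    // fall back to the last index that has positive weight.
    probs.iter().rposition(|&p| p > 0.0)
}
```

In a generation loop, the softmax output for the last position would be copied to the host, passed to `sample_multinomial`, and the returned index appended to the context as the next token. The cost of this approach is one device-to-host copy per generated token.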