"""
# [labml.ai Annotated PyTorch Paper Implementations](index.html)
This is a collection of simple PyTorch implementations of
neural networks and related algorithms.
[These implementations](https://github.com/labmlai/annotated_deep_learning_paper_implementations) are documented with explanations,
and the [website](index.html)
renders these as side-by-side formatted notes.
We believe these notes will help you understand the algorithms better.
![Screenshot](dqn-light.png)
We actively maintain this repo and add new implementations.
Follow [![Twitter](https://img.shields.io/twitter/follow/labmlai?style=social)](https://twitter.com/labmlai) for updates.
## Translations
### **[English (original)](https://nn.labml.ai)**
### **[Chinese (translated)](https://nn.labml.ai/zh/)**
### **[Japanese (translated)](https://nn.labml.ai/ja/)**
## Paper Implementations
#### ✨ [Transformers](transformers/index.html)
* [Multi-headed attention](transformers/mha.html)
* [Transformer building blocks](transformers/models.html)
* [Transformer XL](transformers/xl/index.html)
* [Relative multi-headed attention](transformers/xl/relative_mha.html)
* [Rotary Positional Embeddings (RoPE)](transformers/rope/index.html)
* [Attention with Linear Biases (ALiBi)](transformers/alibi/index.html)
* [RETRO](transformers/retro/index.html)
* [Compressive Transformer](transformers/compressive/index.html)
* [GPT Architecture](transformers/gpt/index.html)
* [GLU Variants](transformers/glu_variants/simple.html)
* [kNN-LM: Generalization through Memorization](transformers/knn/index.html)
* [Feedback Transformer](transformers/feedback/index.html)
* [Switch Transformer](transformers/switch/index.html)
* [Fast Weights Transformer](transformers/fast_weights/index.html)
* [FNet](transformers/fnet/index.html)
* [Attention Free Transformer](transformers/aft/index.html)
* [Masked Language Model](transformers/mlm/index.html)
* [MLP-Mixer: An all-MLP Architecture for Vision](transformers/mlp_mixer/index.html)
* [Pay Attention to MLPs (gMLP)](transformers/gmlp/index.html)
* [Vision Transformer (ViT)](transformers/vit/index.html)
* [Primer EZ](transformers/primer_ez/index.html)
* [Hourglass](transformers/hour_glass/index.html)
#### ✨ [Eleuther GPT-NeoX](neox/index.html)
* [Generate on a 48GB GPU](neox/samples/generate.html)
* [Finetune on two 48GB GPUs](neox/samples/finetune.html)
* [LLM.int8()](neox/utils/llm_int8.html)
#### ✨ [Diffusion models](diffusion/index.html)
* [Denoising Diffusion Probabilistic Models (DDPM)](diffusion/ddpm/index.html)
* [Denoising Diffusion Implicit Models (DDIM)](diffusion/stable_diffusion/sampler/ddim.html)
* [Latent Diffusion Models](diffusion/stable_diffusion/latent_diffusion.html)
* [Stable Diffusion](diffusion/stable_diffusion/index.html)
#### ✨ [Generative Adversarial Networks](gan/index.html)
* [Original GAN](gan/original/index.html)
* [GAN with deep convolutional network](gan/dcgan/index.html)
* [Cycle GAN](gan/cycle_gan/index.html)
* [Wasserstein GAN](gan/wasserstein/index.html)
* [Wasserstein GAN with Gradient Penalty](gan/wasserstein/gradient_penalty/index.html)
* [StyleGAN 2](gan/stylegan/index.html)
#### ✨ [Recurrent Highway Networks](recurrent_highway_networks/index.html)
#### ✨ [LSTM](lstm/index.html)
#### ✨ [HyperNetworks - HyperLSTM](hypernetworks/hyper_lstm.html)
#### ✨ [ResNet](resnet/index.html)
#### ✨ [ConvMixer](conv_mixer/index.html)
#### ✨ [Capsule Networks](capsule_networks/index.html)
#### ✨ [U-Net](unet/index.html)
#### ✨ [Sketch RNN](sketch_rnn/index.html)
#### ✨ Graph Neural Networks
* [Graph Attention Networks (GAT)](graphs/gat/index.html)
* [Graph Attention Networks v2 (GATv2)](graphs/gatv2/index.html)
#### ✨ [Reinforcement Learning](rl/index.html)
* [Proximal Policy Optimization](rl/ppo/index.html) with
[Generalized Advantage Estimation](rl/ppo/gae.html)
* [Deep Q Networks](rl/dqn/index.html) with
[Dueling Network](rl/dqn/model.html),
[Prioritized Replay](rl/dqn/replay_buffer.html)
and Double Q Network.
#### ✨ [Counterfactual Regret Minimization (CFR)](cfr/index.html)
Solving games with incomplete information, such as poker, with CFR.
* [Kuhn Poker](cfr/kuhn/index.html)
#### ✨ [Optimizers](optimizers/index.html)
* [Adam](optimizers/adam.html)
* [AMSGrad](optimizers/amsgrad.html)
* [Adam Optimizer with warmup](optimizers/adam_warmup.html)
* [Noam Optimizer](optimizers/noam.html)
* [Rectified Adam Optimizer](optimizers/radam.html)
* [AdaBelief Optimizer](optimizers/ada_belief.html)
* [Sophia-G Optimizer](optimizers/sophia.html)
#### ✨ [Normalization Layers](normalization/index.html)
* [Batch Normalization](normalization/batch_norm/index.html)
* [Layer Normalization](normalization/layer_norm/index.html)
* [Instance Normalization](normalization/instance_norm/index.html)
* [Group Normalization](normalization/group_norm/index.html)
* [Weight Standardization](normalization/weight_standardization/index.html)
* [Batch-Channel Normalization](normalization/batch_channel_norm/index.html)
* [DeepNorm](normalization/deep_norm/index.html)
#### ✨ [Distillation](distillation/index.html)
#### ✨ [Adaptive Computation](adaptive_computation/index.html)
* [PonderNet](adaptive_computation/ponder_net/index.html)
#### ✨ [Uncertainty](uncertainty/index.html)
* [Evidential Deep Learning to Quantify Classification Uncertainty](uncertainty/evidence/index.html)
#### ✨ [Activations](activations/index.html)
* [Fuzzy Tiling Activations](activations/fta/index.html)
#### ✨ [Language Model Sampling Techniques](sampling/index.html)
* [Greedy Sampling](sampling/greedy.html)
* [Temperature Sampling](sampling/temperature.html)
* [Top-k Sampling](sampling/top_k.html)
* [Nucleus Sampling](sampling/nucleus.html)
#### ✨ [Scalable Training/Inference](scaling/index.html)
* [ZeRO-3 memory optimizations](scaling/zero3/index.html)
## Highlighted Research Paper PDFs
* [Autoregressive Search Engines: Generating Substrings as Document Identifiers](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2204.10628.pdf)
* [Training Compute-Optimal Large Language Models](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2203.15556.pdf)
* [ZeRO: Memory Optimizations Toward Training Trillion Parameter Models](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/1910.02054.pdf)
* [PaLM: Scaling Language Modeling with Pathways](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2204.02311.pdf)
* [Hierarchical Text-Conditional Image Generation with CLIP Latents](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/dall-e-2.pdf)
* [STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2203.14465.pdf)
* [Improving language models by retrieving from trillions of tokens](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2112.04426.pdf)
* [NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2003.08934.pdf)
* [Attention Is All You Need](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/1706.03762.pdf)
* [Denoising Diffusion Probabilistic Models](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2006.11239.pdf)
* [Primer: Searching for Efficient Transformers for Language Modeling](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2109.08668.pdf)
* [On First-Order Meta-Learning Algorithms](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/1803.02999.pdf)
* [Learning Transferable Visual Models From Natural Language Supervision](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2103.00020.pdf)
* [The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/2109.02869.pdf)
* [Meta-Gradient Reinforcement Learning](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/1805.09801.pdf)
* [ETA Prediction with Graph Neural Networks in Google Maps](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/google_maps_eta.pdf)
* [PonderNet: Learning to Ponder](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/ponder_net.pdf)
* [Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/muzero.pdf)
* [GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/gans_n_roses.pdf)
* [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/vit.pdf)
* [Deep Residual Learning for Image Recognition](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/resnet.pdf)
* [Distilling the Knowledge in a Neural Network](https://github.com/labmlai/annotated_deep_learning_paper_implementations/blob/master/papers/distillation.pdf)
### Installation
```bash
pip install labml-nn
```
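To confirm that the install succeeded, a minimal sketch like the following can help. It assumes the package is importable as `labml_nn`; the helper name `is_installed` is hypothetical and is not part of the library.

```python
import importlib.util


def is_installed(module_name: str = "labml_nn") -> bool:
    # Hypothetical helper (not part of labml-nn).
    # find_spec returns None when a top-level module cannot be located,
    # so this checks availability without importing the package itself.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("labml_nn importable:", is_installed())
```

If this prints `False`, the package was most likely installed into a different Python environment than the one running the script.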
### Citing LabML
If you use this for academic research, please cite it using the following BibTeX entry.
```bibtex
@misc{labml,
author = {Varuna Jayasiri and Nipun Wijerathne},
title = {labml.ai Annotated Paper Implementations},
year = {2020},
url = {},
}
```
"""