diff --git a/README.md b/README.md
index 26bfe806..3e6b60ca 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,12 @@ We implement a bigram character-level language model, which we will further comp
 - [Jupyter notebook files](lectures/makemore/makemore_part1_bigrams.ipynb)
 - [makemore Github repo](https://github.com/karpathy/makemore)
 
+**Supplementary links**
+
+- [Python + Numpy tutorial from CS231n](https://cs231n.github.io/python-numpy-tutorial/)
+- [PyTorch tutorial on Tensors](https://pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html)
+- [Introduction to PyTorch](https://pytorch.org/tutorials/beginner/nlp/pytorch_tutorial.html)
+
 ---
 
 **Lecture 3: Building makemore Part 2: MLP**
@@ -33,6 +39,11 @@ We implement a multilayer perceptron (MLP) character-level language model. In th
 - [Jupyter notebook files](lectures/makemore/makemore_part2_mlp.ipynb)
 - [makemore Github repo](https://github.com/karpathy/makemore)
 
+**Supplementary links**
+
+- [A Neural Probabilistic Language Model](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
+- [PyTorch Internals](http://blog.ezyang.com/2019/05/pytorch-internals/) (blog post)
+
 ---
 
 **Lecture 4: Building makemore Part 3: Activations & Gradients, BatchNorm**
@@ -43,6 +54,13 @@ We dive into some of the internals of MLPs with multiple layers and scrutinize t
 - [Jupyter notebook files](lectures/makemore/makemore_part3_bn.ipynb)
 - [makemore Github repo](https://github.com/karpathy/makemore)
 
+**Supplementary links**
+
+- [Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](https://arxiv.org/abs/1502.01852)
+- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
+- [A Neural Probabilistic Language Model](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
+- [Rethinking "Batch" in BatchNorm](https://arxiv.org/abs/2105.07576)
+
 ---
 
 **Lecture 5: Building makemore Part 4: Becoming a Backprop Ninja**
@@ -55,6 +73,13 @@ I recommend you work through the exercise yourself but work with it in tandem an
 - [Jupyter notebook files](lectures/makemore/makemore_part4_backprop.ipynb)
 - [makemore Github repo](https://github.com/karpathy/makemore)
 
+**Supplementary links**
+
+- [Yes you should understand backprop](https://karpathy.medium.com/yes-you-should-understand-backprop-e2f06eab496b) (blog post)
+- [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167)
+- [Bessel's Correction](https://mathcenter.oxford.emory.edu/site/math117/besselCorrection/)
+- [A Neural Probabilistic Language Model](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
+
 ---
 
 **Lecture 6: Building makemore Part 5: Building WaveNet**
@@ -64,6 +89,11 @@ We take the 2-layer MLP from previous video and make it deeper with a tree-like
 - [YouTube video lecture](https://youtu.be/t3YJ5hKiMQ0)
 - [Jupyter notebook files](lectures/makemore/makemore_part5_cnn1.ipynb)
 
+**Supplementary links**
+
+- [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)
+- [A Neural Probabilistic Language Model](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
+
 ---
 
 
@@ -73,10 +103,16 @@ We build a Generatively Pretrained Transformer (GPT), following the paper "Atten
 - [YouTube video lecture](https://www.youtube.com/watch?v=kCc8FmEb1nY). For all other links see the video description.
 
+**Supplementary links**
+
+- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
+- [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+- [Introducing ChatGPT](https://openai.com/blog/chatgpt) (blog post)
+
 ---
 
 
 Ongoing...
 
 **License**
 
-MIT
\ No newline at end of file
+MIT