karpathy/makemore
An educational tool for training autoregressive character-level language models, with architectures ranging from bigram to Transformer.

Velocity (7d): +2.7 ★/day · Trend: steady
The project implements several neural network architectures for character-level language modeling, following seminal papers such as the Transformer paper of Vaswani et al. (2017). It trains on a text dataset and generates new examples that resemble the training data. Supported models are bigram, MLP, RNN, LSTM, GRU, and Transformer. It is built with PyTorch as its sole dependency and is intended primarily for educational use.
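To make the idea of autoregressive character-level generation concrete, here is a minimal sketch of the simplest model in that list, a bigram model. It is not the repository's code: it uses only the Python standard library rather than PyTorch, it estimates probabilities by counting rather than by gradient descent, and the names `fit_bigram`, `sample`, `END`, and the toy word list are all hypothetical.

```python
import random
from collections import Counter, defaultdict

END = "."  # hypothetical boundary token marking the start and end of a word


def fit_bigram(words):
    """Count how often each character follows another, including boundaries."""
    counts = defaultdict(Counter)
    for w in words:
        chars = [END] + list(w) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts, rng, max_len=20):
    """Generate one string autoregressively.

    Each character is drawn conditioned only on the previous character,
    with probability proportional to the observed bigram count.
    """
    out, prev = [], END
    for _ in range(max_len):
        candidates = list(counts[prev].keys())
        weights = list(counts[prev].values())
        prev = rng.choices(candidates, weights=weights, k=1)[0]
        if prev == END:  # the model chose to end the word
            break
        out.append(prev)
    return "".join(out)


words = ["emma", "olivia", "ava", "mia"]  # toy data, not from the repository
counts = fit_bigram(words)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample(counts, rng) for _ in range(5)]
print(samples)
```

The larger architectures (MLP, RNN, LSTM, GRU, Transformer) keep the same sampling loop in spirit but condition each character on a longer context than the single previous character used here.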