cedrickchee/awesome-transformer-nlp

A reading list for the attention economy

Someone hand-curated the firehose of Transformer papers so you don't have to drown in BERTology alone.

Velocity (7d): +0.4 ★/day · Trend: steady

What it does

This is an awesome-list repo: a manually curated index of papers, articles, tutorials, videos, and code implementations centered on Transformers, attention mechanisms, and the GPT/BERT/LLM family tree. It spans from the original “Attention Is All You Need” era through ChatGPT, Chinchilla, and RETRO.

The interesting bit

The curation is opinionated enough to include Hacker News commentary on XLNet’s masking tricks and a whole “BERTology” video section, but it also tracks the unglamorous infrastructure, like Switch Transformers’ sparse-parameter tradeoffs and Reformer’s memory-compression math. It’s a time capsule of how the field talked to itself.

Key highlights

  • Papers section runs from BERT (2018) through Flan-T5, Chinchilla, and NPM, with inline summaries rather than bare links
  • Organized by topic: attention mechanism, architecture, GPT lineage, LLMs, reinforcement learning, plus task-specific sections (NER, QA, text generation)
  • Includes community implementations across PyTorch, TensorFlow, Keras, and even Chainer
  • Has an “AI Safety” section, which was prescient for a list started pre-ChatGPT
  • Tracks educational resources and books, not just research

Caveats

  • Curation appears to have slowed; the newest papers mentioned are from 2022, and the LLM explosion since then (Llama 2/3, Mistral, etc.) is barely visible
  • Some numbering in the papers list is inconsistent (two item 2s, two item 11s), suggesting maintenance debt
  • “Hand-curated” means hand-curated: gaps are gaps, not bugs

Verdict

Worth bookmarking if you’re doing historical research or need a structured entry point to pre-2023 Transformer literature. Skip it if you need a living, automatically updated index: this is a snapshot, not a feed.
