jax-ml/scaling-book
A blog-style textbook explaining how to scale LLMs on TPUs, covering parallelism strategies for training and inference.

Velocity (7d): +2.2 ★/day · Trend: steady
The book demystifies scaling LLMs on TPUs by explaining TPU architecture, how LLMs run at scale, and how to select parallelism schemes that avoid communication bottlenecks during training and inference. Written by Google DeepMind researchers, it covers topics including roofline analysis, tensor parallelism, and pipeline parallelism.