LLMs without the framework soup
LitGPT re-implements 20+ models from scratch so you can actually read the code.

What it does
LitGPT is a collection of 20+ large language models (Llama, Qwen, Phi, Gemma, Mistral, and more), each implemented from scratch in plain PyTorch. It bundles pretraining, finetuning (LoRA, QLoRA, Adapters), and deployment recipes, plus FSDP and quantization support for running on anything from a single GPU to a cluster.
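LoRA itself is small enough to sketch. The snippet below is a generic, dependency-free illustration of the LoRA idea: a frozen weight W plus a trainable low-rank update scaled by alpha / r. It is not LitGPT's implementation. The class name LoRALinear, the plain-list matrices, and the default r and alpha values are assumptions made for the example.

```python
import random

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Row-major matrix times vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, r: int = 8, alpha: float = 16.0, seed: int = 0) -> None:
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight          # pretrained W (d_out x d_in), kept frozen
        self.scale = alpha / r        # LoRA scaling factor
        # A (r x d_in) gets small random values; B (d_out x r) starts at zero,
        # so the adapted layer initially behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                # W x
        update = matvec(self.B, matvec(self.A, x))   # B (A x); B @ A is never formed
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_params(self) -> int:
        # Only A and B would be updated during finetuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def merged_weight(self) -> Matrix:
        # Fold the adapter into W for deployment: W + scale * (B @ A).
        r = len(self.A)
        return [
            [w + self.scale * sum(self.B[i][k] * self.A[k][j] for k in range(r))
             for j, w in enumerate(row)]
            for i, row in enumerate(self.weight)
        ]
```

With W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], LoRALinear(W, r=2, alpha=4.0).forward([1.0, 2.0, 3.0]) returns [1.0, 2.0], because B starts at zero. The payoff is the parameter count: for a 4096 x 4096 projection with r = 8, the adapter trains 2 * 8 * 4096 = 65,536 values instead of the roughly 16.8 million in W.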
The interesting bit
The selling point is deliberately no selling point: no abstraction layers, no framework magic. Every model lives in a single readable file. If you’ve ever debugged Hugging Face internals and wanted to torch the entire modeling_*.py stack, this is the anti-framework.
Key highlights
- 20+ models, 300M to 405B parameters, all hand-rolled in PyTorch
- Flash Attention, FSDP, mixed precision, and quantization (fp4/8/16/32) built in
- One-liner Python API: LLM.load("microsoft/phi-2") followed by generate()
- YAML recipes for training workflows tested at “enterprise scale” (their claim)
- Apache 2.0, so no licensing hand-wringing for commercial use of the library itself (model weights still carry their own licenses)
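At the low-bit end of that precision list, quantization stores weights in fewer bits plus a scale factor and reconstructs approximate values on the fly. Below is a minimal sketch of the simplest variant, symmetric absmax int8 quantization of a single tensor. It is a generic illustration in plain Python, not the kernels LitGPT actually calls, and the function names are hypothetical.

```python
def quantize_absmax_int8(values: list[float]) -> tuple[list[int], float]:
    """Hypothetical helper: symmetric absmax quantization to signed 8-bit codes."""
    absmax = max((abs(v) for v in values), default=0.0)
    # One float scale per tensor maps [-absmax, absmax] onto [-127, 127].
    scale = absmax / 127 if absmax > 0 else 1.0
    # Round to the nearest code and clamp to the int8 range we use.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Hypothetical helper: recover approximate floats from the codes and the scale."""
    return [qi * scale for qi in q]


weights = [0.42, -1.3, 0.07, 0.9]
codes, scale = quantize_absmax_int8(weights)
approx = dequantize_int8(codes, scale)
```

Each int8 code takes 1 byte versus 4 for fp32 (or 2 for fp16/bf16), and the per-value rounding error stays within scale / 2. Production 4-bit schemes are more elaborate (per-block scales, non-uniform code points), but the store-codes-plus-scale trade-off is the same.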
Caveats
- The README is heavy on Lightning Cloud upsell; GPU rental prices and “vibe train” features clutter the docs
- “No abstractions” is great until you need to swap model architectures quickly—then you’re hand-editing files again
- The README shows no benchmark numbers against transformers or vLLM, so its performance claims can't be checked from the source alone
Verdict
Grab this if you teach, research, or debug LLM internals and want code you can grep. Skip it if you need drop-in ecosystem compatibility (LoRA adapters from Hugging Face, etc.) or just want pipeline() to work yesterday.