Lightning-AI/lit-llama
Independent open-source implementation of the LLaMA language model built on nanoGPT.

Velocity (7d): +5.2 ★/day · Trend: steady
This repository implements LLaMA pretraining, fine-tuning, and inference, released under the Apache 2.0 license. It supports flash attention for efficient computation, Int8 and GPTQ 4-bit quantization for a reduced memory footprint, and adapter-based fine-tuning with LoRA and LLaMA-Adapter. The code builds on nanoGPT and supports both training a model from scratch and fine-tuning existing LLaMA weights.
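To make the LoRA idea mentioned above concrete, here is a minimal sketch in plain Python. It is not code from lit-llama: the function names (`matmul`, `lora_forward`, `lora_param_count`), the matrix layout, and the example values are all assumptions chosen for illustration. The sketch assumes the usual LoRA formulation, in which a frozen weight matrix W is left untouched and a trainable low-rank product A·B, scaled by alpha / rank, is added to its output.

```python
# Illustrative LoRA sketch in plain Python; NOT taken from lit-llama.
# All names and shapes below are hypothetical, chosen for readability.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_forward(x, w, lora_a, lora_b, alpha):
    """Compute x @ W + (alpha / rank) * (x @ A) @ B.

    x:      batch x d_in   input rows
    w:      d_in x d_out   frozen pretrained weight
    lora_a: d_in x rank    trainable down-projection
    lora_b: rank x d_out   trainable up-projection
    """
    rank = len(lora_b)  # number of rows in B
    scale = alpha / rank
    base = matmul(x, w)
    update = matmul(matmul(x, lora_a), lora_b)
    return [
        [b_val + scale * u_val for b_val, u_val in zip(b_row, u_row)]
        for b_row, u_row in zip(base, update)
    ]


def lora_param_count(d_in, d_out, rank):
    """Number of trainable parameters in the two low-rank factors."""
    return rank * (d_in + d_out)


x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight (identity, for clarity)
lora_a = [[1.0], [1.0]]       # rank-1 down-projection

# With B set to zeros, the adapted layer matches the frozen layer exactly.
unchanged = lora_forward(x, w, lora_a, [[0.0, 0.0]], alpha=2.0)

# A non-zero B shifts the output without modifying w.
adapted = lora_forward(x, w, lora_a, [[0.5, -0.5]], alpha=2.0)

print(unchanged)                        # [[1.0, 2.0]]
print(adapted)                          # [[4.0, -1.0]]
print(lora_param_count(4096, 4096, 8))  # 65536
```

In this sketch a 4096 x 4096 layer with rank 8 trains 65,536 adapter parameters instead of the 16,777,216 in the full matrix, which is the kind of saving that makes adapter-based fine-tuning attractive on limited hardware. The layer size and rank are example values, not settings confirmed by the repository.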