karpathy/llama2.c
A minimal, roughly 700-line C inference engine for Llama 2 language models, with an optional PyTorch training pipeline.

Velocity (7d): +19 ★/day · Trend: steady
This project provides a self-contained C implementation of Llama 2 inference with no external dependencies. The repository has two parts: a PyTorch training pipeline (derived from nanoGPT) and a standalone inference engine contained in a single C file. It can load and run both small custom-trained Llama 2 models and Meta’s official Llama 2 weights in fp32 format; quantization support is a work in progress.
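To give a sense of what dependency-free fp32 inference involves, here is a minimal sketch of a matrix-vector product, the kind of operation that accounts for most of the work when a transformer generates one token at a time. The function name `matvec_f32` and its signature are illustrative assumptions and are not taken from the repository.

```c
#include <stddef.h>

/* Hypothetical helper (not from the repository):
 * computes out = W * x in plain fp32.
 *   w   : weight matrix with d rows and n columns, stored row-major
 *   x   : input vector of length n
 *   out : output vector of length d
 */
void matvec_f32(float *out, const float *x, const float *w, size_t n, size_t d) {
    for (size_t i = 0; i < d; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++) {
            acc += w[i * n + j] * x[j];  /* dot product of row i with x */
        }
        out[i] = acc;
    }
}
```

Because the sketch uses nothing beyond the C standard headers, it compiles with any standard C compiler and needs no linked libraries, which is the property the description above refers to.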