karpathy/llama2.c

A minimal, roughly 700-line C inference engine for Llama 2 language models, with an optional PyTorch training pipeline.

★ 19.6k stars · C · Inference · Serving · Language Models

Velocity (7d): +19 ★ / day · Trend: steady

This project is a self-contained C implementation of Llama 2 inference with no external dependencies. The repository pairs a PyTorch training pipeline (derived from nanoGPT) with a standalone inference engine written in a single C file. It can load and run both small custom-trained Llama 2 models and Meta’s official Llama 2 weights in fp32 format; quantization is a work in progress.
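
As a rough illustration of what dependency-free fp32 inference involves, the sketch below shows two building blocks that a single-file engine of this kind typically needs: a matrix-vector product and greedy selection of the next token from the resulting logits. The names `matmul_f32` and `argmax_f32` are hypothetical and chosen for this example; they are not taken from the repository, and the row-major weight layout is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the repository):
 * computes out[i] = sum_j w[i*n + j] * x[j] for a d x n weight matrix
 * stored row-major in plain fp32. Only the C standard library is used. */
void matmul_f32(float *out, const float *x, const float *w, int n, int d) {
    assert(n > 0 && d > 0);
    for (int i = 0; i < d; i++) {
        float acc = 0.0f;
        for (int j = 0; j < n; j++) {
            acc += w[(size_t)i * n + j] * x[j];
        }
        out[i] = acc;
    }
}

/* Hypothetical helper: greedy decoding step that returns the index of
 * the largest logit. Ties resolve to the lowest index. */
int argmax_f32(const float *logits, int size) {
    assert(size > 0);
    int best = 0;
    for (int i = 1; i < size; i++) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

In a real transformer forward pass, products like this are applied once per weight matrix per layer, and the final one maps the hidden state to vocabulary logits. A sampler then picks the next token, of which the argmax above is the simplest variant.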