samuel-vitorino/lm.rs

A minimal Rust implementation for running language model inference locally on CPU without external ML libraries.

★1k stars | Rust | Inference · Serving | Language Models


Velocity (7d): +1.5 ★/day | Trend: steady

This project is a lightweight LLM inference engine written in pure Rust that runs language models entirely on CPU. It began as a learning exercise inspired by Karpathy’s llama2.c and llm.c, and has since added support for several model architectures, including Google’s Gemma 2, Meta’s Llama 3.2, and Microsoft’s Phi-3.5-vision for multimodal (image + text) inference. It supports quantized models (Q4_0 and Q8_0) and includes batch processing to improve throughput.
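
To give a feel for what 8-bit quantization such as Q8_0 involves, here is a minimal sketch of symmetric group-wise quantization. It is not taken from the project's source: the group size of 32, the `QuantGroup` struct, and the function names `quantize_q8` and `dequantize_q8` are all hypothetical, and the project's actual on-disk layout may differ.

```rust
/// Assumed group size for this sketch; real formats fix this per model file.
const GROUP_SIZE: usize = 32;

/// One quantized group: a shared scale plus one signed byte per weight.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub struct QuantGroup {
    pub scale: f32,
    pub values: Vec<i8>,
}

/// Symmetric 8-bit quantization: each group is scaled so that its largest
/// absolute weight maps to 127, then every weight is rounded to an i8.
pub fn quantize_q8(weights: &[f32]) -> Vec<QuantGroup> {
    weights
        .chunks(GROUP_SIZE)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            // An all-zero group gets scale 1.0 to avoid dividing by zero.
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let values = chunk
                .iter()
                .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
                .collect();
            QuantGroup { scale, values }
        })
        .collect()
}

/// Recover approximate f32 weights by multiplying each byte by its group scale.
pub fn dequantize_q8(groups: &[QuantGroup]) -> Vec<f32> {
    groups
        .iter()
        .flat_map(|g| g.values.iter().map(move |&q| q as f32 * g.scale))
        .collect()
}
```

Calling `dequantize_q8(&quantize_q8(&weights))` returns values that differ from the originals by at most about half of each group's scale. Storing one byte per weight plus one scale per group is what shrinks the memory footprint relative to full `f32` weights. A 4-bit scheme such as Q4_0 follows the same idea with a narrower integer range.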