samuel-vitorino/lm.rs

A minimal Rust implementation for running language model inference locally on CPU without external ML libraries.

Velocity (7d): +1.5 ★/day · Trend: steady

This project provides a lightweight LLM inference engine written in pure Rust, designed to run language models entirely on CPU. It started as a learning exercise inspired by Karpathy’s llama2.c and llm.c and has gradually added support for several model architectures, including Google’s Gemma 2, Meta’s Llama 3.2, and Microsoft’s Phi-3.5-vision for multimodal (image + text) inference. The implementation supports quantized models (Q4_0, Q8_0) and includes batch processing to improve throughput.
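
To give a rough sense of what Q8_0-style quantization involves, here is a minimal sketch in Rust. It is not taken from lm.rs: it assumes the common GGML-style convention of 32 values per block sharing one scale, and it stores the scale as `f32` for simplicity where real formats often use `f16`. The names `BLOCK_SIZE`, `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical and exist only for this illustration.

```rust
/// Values per block. 32 follows the GGML Q8_0 convention (an assumption here;
/// the layout lm.rs actually uses may differ).
const BLOCK_SIZE: usize = 32;

/// One quantized block: a shared scale plus signed 8-bit values.
#[derive(Debug, Clone)]
struct BlockQ8 {
    /// Kept as f32 for simplicity; real formats often store an f16.
    scale: f32,
    values: [i8; BLOCK_SIZE],
}

/// Quantize a slice of f32 weights into blocks of 8-bit integers.
/// Panics if the input length is not a multiple of BLOCK_SIZE.
fn quantize_q8(input: &[f32]) -> Vec<BlockQ8> {
    assert!(
        input.len() % BLOCK_SIZE == 0,
        "input length must be a multiple of BLOCK_SIZE"
    );
    input
        .chunks_exact(BLOCK_SIZE)
        .map(|chunk| {
            // The largest magnitude in the block is mapped to 127.
            let max_abs = chunk.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
            let scale = max_abs / 127.0;
            // An all-zero block has scale 0; avoid dividing by it.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let mut values = [0i8; BLOCK_SIZE];
            for (q, &x) in values.iter_mut().zip(chunk) {
                // Float-to-int `as` casts saturate, so this stays in i8 range.
                *q = (x * inv).round() as i8;
            }
            BlockQ8 { scale, values }
        })
        .collect()
}

/// Reconstruct approximate f32 weights from quantized blocks.
fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.values.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Under these assumptions, each weight takes one byte plus a small per-block overhead for the scale, and the round-trip error per value is at most half of that block's scale. Q4_0 follows the same idea with 4-bit values, trading more precision for a smaller footprint.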
