samuel-vitorino/lm.rs
A minimal Rust implementation for running language model inference locally on CPU without external ML libraries.

This project provides a lightweight LLM inference engine written in pure Rust, designed to run language models entirely on CPU. It started as a learning exercise inspired by Karpathy’s llama2.c and llm.c, and has gradually added support for several model architectures, including Google’s Gemma 2, Meta’s Llama 3.2, and Microsoft’s Phi-3.5-vision for multimodal (image+text) inference. The implementation supports quantized models (Q4_0, Q8_0) and includes batch processing to improve throughput.
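
To give a feel for what 8-bit quantization such as Q8_0 involves, here is a minimal sketch of symmetric group-wise quantization. It is an illustration only and is not taken from the lm.rs source: the names `GROUP_SIZE`, `QuantizedGroup`, `quantize_group`, and `dequantize_group` are hypothetical, the group size of 32 follows the block size commonly used for Q8_0 in ggml, and the scale is kept as `f32` for simplicity. The project's actual group size and storage layout may differ.

```rust
/// Hypothetical group size (assumption): 32 values share one scale.
const GROUP_SIZE: usize = 32;

/// One quantized group: a shared scale plus one signed byte per weight.
struct QuantizedGroup {
    scale: f32,
    values: [i8; GROUP_SIZE],
}

/// Quantize a group of weights to i8 using a symmetric, per-group scale.
fn quantize_group(weights: &[f32; GROUP_SIZE]) -> QuantizedGroup {
    // The largest magnitude in the group is mapped to 127.
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight in the group is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };

    let mut values = [0i8; GROUP_SIZE];
    for (q, w) in values.iter_mut().zip(weights.iter()) {
        // Round to the nearest integer step; the result lies in [-127, 127].
        *q = (*w / scale).round() as i8;
    }
    QuantizedGroup { scale, values }
}

/// Recover approximate f32 weights from a quantized group.
fn dequantize_group(group: &QuantizedGroup) -> [f32; GROUP_SIZE] {
    let mut out = [0.0f32; GROUP_SIZE];
    for (o, q) in out.iter_mut().zip(group.values.iter()) {
        *o = f32::from(*q) * group.scale;
    }
    out
}
```

Under these assumptions, each weight is recovered to within half a quantization step (`scale / 2`), while storage drops from four bytes per weight to roughly one byte plus a shared scale per group. A 4-bit scheme such as Q4_0 follows the same idea with a narrower integer range.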