EricLBuehler/mistral.rs

mistral.rs is a Rust library for fast, flexible LLM inference with GPU acceleration, quantization, and agentic runtime support.

★ 7.3k stars · Rust · Inference · Serving · Agents


Velocity (7d): +8.7 ★/day · Trend: steady

mistral.rs provides high-performance LLM inference in Rust, supporting major model architectures with CUDA optimizations for NVIDIA GPUs. It offers multiple quantization formats including MXFP4 for memory efficiency, and includes an agentic runtime enabling web search, local Python code execution with model feedback, and custom tool hooks. The library exposes both OpenAI-compatible and Anthropic-compatible APIs.
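To make the "OpenAI-compatible" claim concrete, here is a minimal sketch of what a client request to such a server could look like, using only the Rust standard library. The `/v1/chat/completions` path and the `model`/`messages` body shape follow the OpenAI chat completions convention. The host `localhost:1234`, the model name `default`, and the helper names `json_escape`, `chat_body`, and `http_request` are assumptions made for illustration and are not taken from the mistral.rs documentation.

```rust
// Hypothetical helper: escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Build an OpenAI-style chat completions body with a single user message.
// The model name is whatever the server was started with (assumed here).
fn chat_body(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        json_escape(model),
        json_escape(prompt)
    )
}

// Wrap the body in a raw HTTP/1.1 POST request.
// Content-Length is the body length in bytes, which is what `str::len` returns.
fn http_request(host: &str, body: &str) -> String {
    format!(
        "POST /v1/chat/completions HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

fn main() {
    // Assumed host, port, and model name; adjust to the actual server setup.
    let body = chat_body("default", "Say \"hi\" in one word.");
    println!("{}", http_request("localhost:1234", &body));
}
```

The sketch only prints the request text. Sending it would mean writing the string to a `std::net::TcpStream` connected to the server, or using an HTTP client crate. Because the body matches the OpenAI wire format, an existing OpenAI client library pointed at the server's base URL should work the same way, assuming the server implements that endpoint.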