
Michael-A-Kuykendall/shimmy

A Python-free Rust inference server that provides an OpenAI-compatible API for running GGUF and SafeTensors language models locally.

5.3k stars · Rust · Inference · Serving
Velocity (7d): +19 ★/day · Trend: steady

Shimmy is a lightweight inference server written in Rust that serves large language models through an OpenAI-compatible REST API. It supports the GGUF and SafeTensors model formats, integrates with llama.cpp and Hugging Face transformers, and offers hot model swapping and automatic model discovery. It ships as a single standalone binary with no Python dependencies.
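"OpenAI-compatible" means existing OpenAI-style clients can talk to the server by pointing them at a local address. As a rough illustration of what such a client sends, the std-only Rust sketch below builds the raw HTTP request for an OpenAI-style `/v1/chat/completions` call. The host, port, and model name are hypothetical placeholders, and the helper names `json_escape` and `build_chat_request` are invented for this example; check shimmy's own documentation for the routes and defaults it actually exposes. The snippet only formats the request and does not contact a server.

```rust
/// Hypothetical helper: escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining ASCII control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build a raw HTTP/1.1 POST in the OpenAI chat-completions
/// shape (one user message). Content-Length counts UTF-8 bytes of the body.
fn build_chat_request(host: &str, model: &str, user_msg: &str) -> String {
    let body = format!(
        "{{\"model\":\"{}\",\"messages\":[{{\"role\":\"user\",\"content\":\"{}\"}}]}}",
        json_escape(model),
        json_escape(user_msg)
    );
    format!(
        "POST /v1/chat/completions HTTP/1.1\r\nHost: {}\r\nContent-Type: application/json\r\nContent-Length: {}\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

fn main() {
    // Assumed address and model name, for illustration only.
    let req = build_chat_request("127.0.0.1:8080", "example-model", "Say \"hi\"");
    println!("{}", req);
}
```

In practice a client would send this over a TCP connection (or use an HTTP library or an OpenAI SDK with a custom base URL); building the text by hand here just makes the request shape visible.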
