Michael-A-Kuykendall/shimmy
A Python-free Rust inference server that provides an OpenAI-compatible API for running GGUF and SafeTensors language models locally.

Velocity · 7d
+19
★ / day
Trend
→steady
star history
Shimmy is a lightweight inference server written in Rust that serves large language models through an OpenAI-compatible REST API. It supports GGUF and SafeTensors model formats, integrates with llama.cpp and Hugging Face transformers, and offers features like hot model swapping, automatic model discovery, and runs as a single standalone binary without Python dependencies.