
mostlygeek/llama-swap

A Go-based proxy server that routes requests to multiple local LLM inference backends and enables hot-swapping between models on demand.

4.4k stars · Go · Inference · Serving
Velocity (7d): +7.2 ★/day · Trend: steady

llama-swap acts as an API gateway for local AI model servers, exposing OpenAI-compatible and Anthropic-compatible API endpoints. Users define multiple backends (llama.cpp, vLLM, stable-diffusion.cpp, and others), and llama-swap routes each request to the appropriate model based on its configuration. It handles completions, chat completions, embeddings, audio, and image generation endpoints, providing a unified interface across different inference servers.
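
To make the routing idea concrete, here is a minimal Go sketch of configuration-based model routing. It is not llama-swap's actual code or configuration format: the `routes` map, the model names, the ports, and the `upstreamFor` helper are all hypothetical, and it assumes the backend is chosen from the `model` field of an OpenAI-style JSON request body.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// routes maps a model name to the base URL of an upstream inference server.
// Assumption: these names and ports are illustrative only and do not come
// from llama-swap's real configuration.
var routes = map[string]string{
	"llama-8b":    "http://127.0.0.1:9001",
	"embed-small": "http://127.0.0.1:9002",
}

// upstreamFor reads the "model" field of an OpenAI-style request body and
// returns the upstream base URL configured for that model.
func upstreamFor(body []byte) (string, error) {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("invalid JSON body: %w", err)
	}
	if req.Model == "" {
		return "", errors.New("request has no model field")
	}
	url, ok := routes[req.Model]
	if !ok {
		return "", fmt.Errorf("no backend configured for model %q", req.Model)
	}
	return url, nil
}

func main() {
	body := []byte(`{"model": "llama-8b", "messages": []}`)
	url, err := upstreamFor(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url) // prints "http://127.0.0.1:9001"
}
```

A real proxy would also need to start or swap the backend process before forwarding the request; this sketch covers only the lookup step.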
