mostlygeek/llama-swap
A Go-based proxy server that routes requests to multiple local LLM inference backends and enables hot-swapping between models on demand.

Llama-swap acts as an API gateway for local AI model servers, supporting OpenAI and Anthropic API compatible endpoints. It allows users to define multiple backends (llama.cpp, vllm, stable-diffusion.cpp, and others) and routes requests to the appropriate model based on configuration. The tool handles completions, chat completions, embeddings, audio, and image generation endpoints, providing a unified interface across different inference servers.