lemonade-sdk/lemonade
A local AI server that runs open-source LLMs (Llama, Mistral, Qwen) on user-owned GPUs and NPUs with OpenAI API compatibility.

Lemonade is a local AI inference server that enables running LLMs entirely on the user’s own hardware. It supports a range of open-source models including Llama, Mistral, and Qwen for tasks like chat, coding, speech, and image generation. The system optimizes inference for AMD and NVIDIA GPUs as well as NPUs like Ryzen AI using engines such as ONNX Runtime and vLLM, providing OpenAI-compatible APIs so existing applications can connect without code changes.