intentee/paddler
An open-source load balancer and serving platform for running LLMs and VLMs on your own infrastructure using the llama.cpp engine.

Paddler is an LLM load balancer and serving platform for self-hosted inference, deployment, and scaling of large language models. It includes a built-in llama.cpp engine for inference, LLM-specific load balancing, request buffering for scale-from-zero, dynamic model swapping, and a web admin panel for management and monitoring. Organizations can use it to keep data private, control costs, and stay independent of closed-source model providers while running LLMs on CPU or GPU.
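
To make "request buffering for scale-from-zero" and slot-aware balancing more concrete, here is a minimal conceptual sketch in Python. It is not Paddler's implementation or API: the `Agent` and `Balancer` classes, the `max_buffered` limit, and the "most idle slots wins" rule are all assumptions chosen for illustration. The idea shown is that requests arriving while no worker is available are held in a queue, then dispatched as soon as a worker with free slots registers.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical inference worker with a fixed number of slots."""
    name: str
    slots_total: int
    slots_busy: int = 0

    @property
    def slots_idle(self) -> int:
        return self.slots_total - self.slots_busy


@dataclass
class Balancer:
    """Toy balancer: route to the agent with the most idle slots, else buffer."""
    agents: list = field(default_factory=list)
    buffer: deque = field(default_factory=deque)
    max_buffered: int = 8  # assumed limit for this sketch, not a Paddler default

    def submit(self, request: str) -> str:
        # Pick the agent with the most idle slots (None if there are no agents).
        agent = max(self.agents, key=lambda a: a.slots_idle, default=None)
        if agent is not None and agent.slots_idle > 0:
            agent.slots_busy += 1
            return f"dispatched:{agent.name}"
        # No capacity right now: hold the request instead of failing it.
        if len(self.buffer) < self.max_buffered:
            self.buffer.append(request)
            return "buffered"
        return "rejected"

    def register(self, agent: Agent) -> list:
        """Add an agent, then drain buffered requests onto idle slots."""
        self.agents.append(agent)
        results = []
        while self.buffer and any(a.slots_idle > 0 for a in self.agents):
            results.append(self.submit(self.buffer.popleft()))
        return results


balancer = Balancer()
first = balancer.submit("prompt-a")   # no agents yet, so the request is held
second = balancer.submit("prompt-b")
drained = balancer.register(Agent("gpu-0", slots_total=1))
```

In this sketch the first buffered request is dispatched when `gpu-0` registers, and the second stays queued until another slot frees up. A real deployment would also need timeouts for buffered requests and a way to release slots when generation finishes, both of which are omitted here.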