
predibase/lorax

A framework for serving thousands of LoRA-adapted fine-tuned language models on a single GPU with dynamic adapter loading.

Velocity (7d): +3.9 ★/day · Trend: steady

LoRAX enables efficient multi-tenant inference by serving thousands of fine-tuned LoRA adapters on a single GPU, loading each adapter dynamically as requests arrive. It builds on PyTorch and Hugging Face Transformers to manage adapter weights per request, and it can pull adapters from the Hugging Face Hub, Predibase, or the local filesystem. Adapters are loaded just-in-time, so a request for a new adapter does not block concurrent requests, and multiple adapters can be merged per request to form an ensemble. For inference, it exposes a REST API, a Python client, and an OpenAI-compatible interface.
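The just-in-time loading idea described above can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not LoRAX's implementation: `AdapterCache`, `handle_request`, and `demo` are hypothetical names, adapter weights are faked as strings, and the load is simulated with a short sleep. It shows two properties: concurrent requests for the same adapter share a single load, and requests for different adapters do not wait on each other's loads.

```python
import asyncio
from collections import OrderedDict


class AdapterCache:
    """Toy LRU cache that loads adapters on first use (hypothetical, not LoRAX code)."""

    def __init__(self, capacity: int = 2, load_delay: float = 0.01):
        self.capacity = capacity
        self.load_delay = load_delay
        self.loaded = OrderedDict()  # adapter_id -> fake weights, least recently used first
        self.pending = {}            # adapter_id -> in-flight load task
        self.load_count = 0          # number of simulated loads performed

    async def _load(self, adapter_id: str) -> str:
        # Stand-in for downloading or reading adapter weights.
        await asyncio.sleep(self.load_delay)
        self.load_count += 1
        return f"weights:{adapter_id}"

    async def get(self, adapter_id: str) -> str:
        # Fast path: adapter already resident, just refresh its LRU position.
        if adapter_id in self.loaded:
            self.loaded.move_to_end(adapter_id)
            return self.loaded[adapter_id]
        # Share one in-flight load among all requests for the same adapter.
        task = self.pending.get(adapter_id)
        if task is None:
            task = asyncio.create_task(self._load(adapter_id))
            self.pending[adapter_id] = task
        weights = await task
        self.pending.pop(adapter_id, None)
        self.loaded[adapter_id] = weights
        self.loaded.move_to_end(adapter_id)
        # Evict least recently used adapters beyond capacity.
        while len(self.loaded) > self.capacity:
            self.loaded.popitem(last=False)
        return weights


async def handle_request(cache: AdapterCache, adapter_id: str, prompt: str) -> str:
    # A real server would run the base model with the adapter applied here.
    weights = await cache.get(adapter_id)
    return f"[{weights}] {prompt}"


async def demo():
    cache = AdapterCache(capacity=2)
    # Three concurrent requests across two adapters.
    results = await asyncio.gather(
        handle_request(cache, "a", "hi"),
        handle_request(cache, "a", "hello"),
        handle_request(cache, "b", "hey"),
    )
    return cache, results
```

Running `asyncio.run(demo())` issues three concurrent requests but performs only two simulated loads, since both requests for adapter `a` await the same task. A real server would additionally bound GPU memory and batch requests across adapters, which this sketch does not attempt.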
