xorbitsai/inference
A unified inference API for deploying and serving open-source LLMs, speech, and multimodal models locally or in production.

Velocity · 7d
+8.6
★ / day
Trend
→steady
star history
Xinference provides a single API to run open-source language models, speech recognition models, and multimodal models on cloud, on-prem, or laptop environments. It supports multiple inference backends including vLLM, llama.cpp, GGML, and PyTorch, and offers OpenAI API-compatible endpoints. The library aims to simplify model deployment by allowing users to swap between different LLMs with minimal code changes.