← all repositories

xorbitsai/inference

A unified inference API for deploying and serving open-source LLMs, speech, and multimodal models locally or in production.

inference
Velocity · 7d
+8.6
★ / day
Trend
steady
star history

Xinference provides a single API to run open-source language models, speech recognition models, and multimodal models on cloud, on-prem, or laptop environments. It supports multiple inference backends including vLLM, llama.cpp, GGML, and PyTorch, and offers OpenAI API-compatible endpoints. The library aims to simplify model deployment by allowing users to swap between different LLMs with minimal code changes.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.