Is inference open source?

Yes — xorbitsai/inference is open source, released under the Apache-2.0 license.

What language is inference written in?

xorbitsai/inference is primarily written in Python.

How popular is inference?

xorbitsai/inference has 9.4k stars on GitHub.

Where can I find inference?

xorbitsai/inference is on GitHub at https://github.com/xorbitsai/inference.

← all repositories

xorbitsai/inference

One API to Run LLMs, Speech, and Multimodal Models Anywhere

It exists so you can swap GPT for open-source, speech, and multimodal models by changing a single line of client code.

★9.4k stars Python Inference · Serving Language Models

View on GitHub ↗ Homepage ↗

Not currently ranked — collecting fresh signals.

star history

What it does

Xinference is a model serving layer that exposes language, speech, and multimodal models through a unified, OpenAI-compatible RESTful API. You point existing clients at it and swap models without touching your application code. Under the hood it handles hardware scheduling, engine selection, and distributed workers.

The interesting bit

Most serving stacks pick a lane—text-only, GPU-only, or single-node. Xinference is aggressively pluralistic: it bundles vLLM, GGML, and TensorRT, runs on GPUs, CPUs, and Metal, and scales from a laptop to a multi-node cluster. The README maintains a self-reported comparison table showing competitors like FastChat and OpenLLM missing most of these combinations.

Key highlights

OpenAI-compatible API including function calling, so existing clients drop in with a URL change.
Supports heterogeneous hardware (GPU, CPU, Metal) and multiple inference engines (vLLM, GGML, TensorRT).
Distributed inference across workers, with auto-batching and shared KV cache for throughput.
Built-in model zoo covering recent releases (DeepSeek V4, GLM-5.1, Qwen3.6, Gemma-4, etc.).
Integrations with agent frameworks (Xagent) and LLMOps platforms (Dify, FastGPT, RAGFlow, MaxKB).

Verdict

Worth a look if you need OpenAI API semantics but want to run open-source, multimodal, or speech models on your own hardware. Skip it if you’re already satisfied with a single-engine, single-node stack like raw vLLM.

Frequently asked

What is xorbitsai/inference?: It exists so you can swap GPT for open-source, speech, and multimodal models by changing a single line of client code.
Is inference open source?: Yes — xorbitsai/inference is open source, released under the Apache-2.0 license.
What language is inference written in?: xorbitsai/inference is primarily written in Python.
How popular is inference?: xorbitsai/inference has 9.4k stars on GitHub.
Where can I find inference?: xorbitsai/inference is on GitHub at https://github.com/xorbitsai/inference.