One OpenAI-shaped hole for 100+ LLMs
LiteLLM is the adapter layer that stops your codebase from fracturing across a dozen provider SDKs.

What it does LiteLLM gives you a single OpenAI-compatible interface to call over 100 LLM providers — Anthropic, Bedrock, Azure, Gemini, Vertex, and the rest — either as a Python SDK or a self-hosted proxy server. You write one client; LiteLLM handles auth, request translation, and response normalization behind the scenes.
The interesting bit The proxy mode turns LiteLLM into an actual AI Gateway: virtual API keys, spend tracking, guardrails, load balancing, and an admin dashboard all ship out of the box. It also speaks newer protocols — A2A agents and MCP tools — so you can route not just model calls but agent-to-agent traffic and tool use through the same chokepoint.
Key highlights
- Drop-in OpenAI compatibility: swap
gpt-4oforanthropic/claude-sonnet-4-20250514without touching client code - 8ms P95 latency at 1k RPS, per their benchmarks
- Supports
/chat/completions,/embeddings,/images,/audio,/batches,/rerank,/a2a,/messages, and more - MCP gateway mode lets Cursor IDE and other clients consume MCP servers through the proxy
- OSS adopters include Stripe, Netflix, and OpenAI’s own Agents SDK
Caveats
- The 8ms latency claim is self-reported; verify against your own load patterns
- MCP client support is marked
experimental_mcp_client— API may shift - Enterprise features (hosted proxy, SSO, etc.) sit behind a paid tier; the README funnels you toward it aggressively
Verdict If you’re running multi-provider LLM workloads in production and tired of maintaining N different client libraries, this is close to a default choice. Skip it if you’re locked to a single provider and don’t need gateway features — it’s overkill for a one-model shop.