diegosouzapw/OmniRoute

A load balancer for your AI bills

One localhost endpoint that swaps between 177 LLM providers so your coding agent never stops when a quota dries up.

5.8k stars · TypeScript · Inference · Serving · Coding Assistants
Velocity (7d): +51 ★/day · Trend: steady

What it does

OmniRoute sits between your IDE or CLI and a small army of AI providers. You point Claude Code, Cursor, Cline, Copilot, or Codex at http://localhost:20128/v1, and OmniRoute handles request translation, fallback, and token accounting. When your paid tier hits a limit, it silently slides down a four-tier chain (subscription → API key → cheap → free) without dropping the request.
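To make the four-tier chain concrete, here is a minimal sketch of tier-ordered fallback. It is not OmniRoute's code: the `Provider` shape, the `remainingQuota` field, the provider names, and `pickProvider` are all hypothetical, chosen only to illustrate the idea of trying subscription first and free last.

```typescript
// Tier names follow the chain described above; everything else is illustrative.
type Tier = "subscription" | "api-key" | "cheap" | "free";

interface Provider {
  name: string;           // hypothetical label, not a real OmniRoute identifier
  tier: Tier;
  remainingQuota: number; // assumed unit: requests left in the current window
}

// Highest-priority tier first.
const TIER_ORDER: readonly Tier[] = ["subscription", "api-key", "cheap", "free"];

// Return the first provider, in tier order, that still has quota left.
// Returns undefined when every tier is exhausted.
function pickProvider(providers: Provider[]): Provider | undefined {
  for (const tier of TIER_ORDER) {
    const candidate = providers.find(
      (p) => p.tier === tier && p.remainingQuota > 0,
    );
    if (candidate) return candidate;
  }
  return undefined;
}

// Made-up state: both paid tiers are out of quota.
const providers: Provider[] = [
  { name: "paid-plan", tier: "subscription", remainingQuota: 0 },
  { name: "metered-key", tier: "api-key", remainingQuota: 0 },
  { name: "budget-model", tier: "cheap", remainingQuota: 120 },
  { name: "community-model", tier: "free", remainingQuota: 1000 },
];

console.log(pickProvider(providers)?.name); // prints "budget-model"
```

A real router would also have to retry the in-flight request on the next tier, which this sketch leaves out.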

The interesting bit

The “combo” system is the core mechanic: you can chain models into a priority list, or just set auto and let it score providers live by latency, cost, or quota headroom. The README also pitches “RTK + Caveman” stacked compression, which claims 15–95% token savings on tool-heavy sessions like git diff or log dumps. That is where the budget math actually happens.
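The README does not spell out how auto scores providers, so treat the following as one plausible sketch rather than the project's formula. It assumes a weighted sum over normalized latency, cost, and quota headroom; `ProviderStats`, `Weights`, `rankProviders`, and all the numbers are invented for illustration.

```typescript
interface ProviderStats {
  name: string;          // hypothetical label
  latencyMs: number;     // recent observed latency
  costPerMTok: number;   // assumed unit: USD per million tokens
  quotaHeadroom: number; // fraction of quota still available, 0..1
}

interface Weights {
  latency: number;
  cost: number;
  headroom: number;
}

// Rank providers best-first. Latency and cost are normalized against the
// worst candidate so that lower values earn a higher score.
function rankProviders(stats: ProviderStats[], w: Weights): ProviderStats[] {
  const maxLatency = Math.max(...stats.map((s) => s.latencyMs), 1);
  const maxCost = Math.max(...stats.map((s) => s.costPerMTok), 1e-9);
  const scoreOf = (s: ProviderStats): number =>
    w.latency * (1 - s.latencyMs / maxLatency) +
    w.cost * (1 - s.costPerMTok / maxCost) +
    w.headroom * s.quotaHeadroom;
  return [...stats].sort((a, b) => scoreOf(b) - scoreOf(a));
}

// Made-up measurements for three imaginary providers.
const stats: ProviderStats[] = [
  { name: "fast-paid", latencyMs: 200, costPerMTok: 10, quotaHeadroom: 0.05 },
  { name: "slow-free", latencyMs: 800, costPerMTok: 0, quotaHeadroom: 0.9 },
  { name: "mid-cheap", latencyMs: 400, costPerMTok: 1, quotaHeadroom: 0.6 },
];

const balanced = rankProviders(stats, { latency: 1, cost: 1, headroom: 1 });
const latencyOnly = rankProviders(stats, { latency: 1, cost: 0, headroom: 0 });

console.log(balanced[0].name);    // prints "mid-cheap"
console.log(latencyOnly[0].name); // prints "fast-paid"
```

The point of the weights is that the same live measurements can yield different winners depending on whether you care most about speed, spend, or not running dry.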

Key highlights

  • 177 providers configured, 50+ with free tiers, 11 advertised as free-forever
  • 14 routing strategies including round-robin, weighted, cost-optimized, and least-used
  • MCP (37 tools) and A2A support for agent-to-agent workflows
  • Desktop app, PWA, Docker image, and npm package (omniroute)
  • Claims 4,690+ tests, circuit breakers, and TLS fingerprint stealth for region-blocked users
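Three of the listed strategy names map onto well-known selection patterns. The sketch below shows generic versions of round-robin, least-used, and weighted selection; the `Target` shape and function names are hypothetical, and OmniRoute's own implementations may differ.

```typescript
interface Target {
  name: string;   // hypothetical label
  weight: number; // relative share of traffic for weighted selection
  used: number;   // requests already sent to this target
}

// Round-robin: cycle through targets in order. Assumes a non-empty list.
function makeRoundRobin<T>(targets: T[]): () => T {
  let i = 0;
  return () => {
    const t = targets[i % targets.length];
    i += 1;
    return t;
  };
}

// Least-used: pick the target with the fewest requests so far.
// Assumes a non-empty list; ties go to the earliest target.
function leastUsed(targets: Target[]): Target {
  return targets.reduce((best, t) => (t.used < best.used ? t : best));
}

// Weighted: walk the cumulative weights until `r` is covered.
// `r` is a number in [0, 1); pass Math.random() in real use.
function weightedPick(targets: Target[], r: number): Target {
  const total = targets.reduce((sum, t) => sum + t.weight, 0);
  let threshold = r * total;
  for (const t of targets) {
    threshold -= t.weight;
    if (threshold < 0) return t;
  }
  return targets[targets.length - 1]; // guard against rounding at the edge
}

const targets: Target[] = [
  { name: "a", weight: 1, used: 5 },
  { name: "b", weight: 3, used: 2 },
  { name: "c", weight: 1, used: 9 },
];

const next = makeRoundRobin(targets);
const order = [next().name, next().name, next().name, next().name];
console.log(order.join(","));                // prints "a,b,c,a"
console.log(leastUsed(targets).name);        // prints "b"
console.log(weightedPick(targets, 0.5).name); // prints "b"
```

Taking `r` as a parameter instead of calling `Math.random()` inside keeps the weighted picker deterministic and easy to test.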

Caveats

  • The 15–95% compression figure is self-reported with no independent benchmark cited
  • “177 providers” likely counts every configured endpoint, not all actively maintained or equally reliable
  • The README is heavy on emoji and light on architecture docs; you will need to dig for setup edge cases
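Since the compression figure is self-reported, the practical move is to measure it on your own sessions. Here is a small sketch of the arithmetic, assuming you can get token counts before and after compression; the function names, the 40,000-token example, and the $3 per million tokens price are all made up for illustration.

```typescript
// Fraction of tokens saved by compression, between 0 and 1.
function tokenSavings(tokensBefore: number, tokensAfter: number): number {
  if (tokensBefore <= 0) throw new RangeError("tokensBefore must be positive");
  return (tokensBefore - tokensAfter) / tokensBefore;
}

// Cost in USD for a token count at a given price per million tokens.
function costUsd(tokens: number, pricePerMTok: number): number {
  return (tokens / 1_000_000) * pricePerMTok;
}

// Hypothetical session: a 40,000-token git diff compressed to 10,000 tokens.
const before = 40_000;
const after = 10_000;
const hypotheticalPrice = 3; // USD per million input tokens, an assumption

const saved = tokenSavings(before, after);
const dollarsSaved =
  costUsd(before, hypotheticalPrice) - costUsd(after, hypotheticalPrice);

console.log(`${(saved * 100).toFixed(0)}% fewer tokens`); // prints "75% fewer tokens"
console.log(`$${dollarsSaved.toFixed(2)} saved`);         // prints "$0.09 saved"
```

Running this over a week of real tool output would show where in the 15–95% range your own workload lands.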

Verdict

Worth a look if you are running multiple AI coding agents and tired of manually swapping API keys when quotas expire. Skip it if you already have a single provider contract that covers your usage: the complexity only pays off when you are juggling tiers.
