Self-hosted AI platform that actually ships the boring parts
Onyx wraps chat, RAG, agents, and enterprise governance into one deployable stack with a one-line installer.

What it does
Onyx is a self-hostable “application layer for LLMs”: essentially a full-stack AI workbench you run yourself. It bundles chat, retrieval-augmented generation, custom agents, web search, code execution, voice, and image generation behind a single interface. The project also offers a slimmed-down “Lite” mode (under 1 GB of RAM) for teams that just want chat and agents without the indexing infrastructure.
The interesting bit
The deployment story is unusually practical. One curl | bash gets you running, and the stack scales up to Kubernetes/Helm/Terraform with Redis, MinIO, and background job workers. The README is refreshingly direct about the split: Community Edition is MIT-licensed and covers core features, while Enterprise Edition adds SSO, SCIM, RBAC, analytics, and whitelabeling. No hand-waving about “open core” — the boundary is explicit.
Key highlights
- 50+ built-in connectors for indexing external data, plus MCP support for custom integrations
- “Agentic RAG” with hybrid vector + keyword indexing; the README claims the top spot on a deep-research benchmark as of Feb 2026 (the benchmark repo is linked, but the methodology is not detailed)
- Works with a broad range of LLM backends: Ollama, vLLM, LiteLLM, Anthropic, OpenAI, Gemini, etc.
- Code execution in sandboxed environments, artifact generation, and web crawling via in-house crawler or Firecrawl/Exa
- Enterprise features: SAML/OIDC SSO, SCIM provisioning, query audit history, and custom PII-filtering code hooks
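For readers unfamiliar with the hybrid vector + keyword retrieval mentioned in the highlights, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is an illustration of the general idea only. The README does not say how Onyx combines its results, and the function name, document IDs, and the constant `k=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector search and a keyword search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Rank-based fusion like this avoids having to normalize similarity scores from two very different retrievers, which is one reason it is a popular default for hybrid search.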
Caveats
- The “best in class” RAG claim rests on a benchmark described as “to release soon”, so it cannot currently be verified
- Deep research benchmark leadership is self-reported from Feb 2026; no independent confirmation in the README
- Standard mode requires significant infrastructure (vector DB, job queues, inference servers) — the Lite/Standard split is real and not just marketing
Verdict
Teams that need a governed, self-hosted alternative to ChatGPT Enterprise or Copilot should evaluate this seriously. Solo developers or those already happy with a simple Ollama + Open WebUI setup will find the full stack overkill unless they need the connector ecosystem.