OpenAI's airline bot demo shows how agent handoffs actually work
A runnable reference implementation for routing customer service requests across specialized AI agents with guardrails and a visual UI.

What it does
This is OpenAI’s official demo of a multi-agent customer service system for airlines. A Python backend orchestrates six specialist agents—triage, flight info, booking, seat services, FAQ, and refunds—using the OpenAI Agents SDK. A Next.js frontend visualizes the handoffs and provides a chat interface via ChatKit.
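To make the triage-then-specialist pattern concrete, here is a minimal plain-Python sketch. It does not use the OpenAI Agents SDK and is not the demo's code: the `Agent` class, the `route` function, and the keyword lists are hypothetical, and keyword matching stands in for the model's handoff decision.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an SDK agent: a name, trigger keywords, and handoff targets."""
    name: str
    keywords: tuple[str, ...] = ()
    handoffs: list["Agent"] = field(default_factory=list)


def route(agent: Agent, message: str) -> list[str]:
    """Follow handoffs starting at `agent` and return the names of the agents visited.

    Keyword matching is an assumption made for illustration; in the real demo
    the model decides which handoff to take.
    """
    text = message.lower()
    path = [agent.name]
    for target in agent.handoffs:
        if any(word in text for word in target.keywords):
            path.extend(route(target, message))
            break  # hand off to the first matching specialist only
    return path


# Six agents mirroring the roles listed above (keywords are made up).
flight_info = Agent("flight_info", ("status", "delay", "gate"))
booking = Agent("booking", ("book", "cancel", "rebook"))
seat_services = Agent("seat_services", ("seat", "meal"))
faq = Agent("faq", ("baggage", "wifi"))
refunds = Agent("refunds", ("refund", "compensation"))
triage = Agent("triage", handoffs=[flight_info, booking, seat_services, faq, refunds])

print(route(triage, "Can I change my seat?"))  # ['triage', 'seat_services']
```

The returned path is the kind of information a handoff visualizer would display: which agent received the request and where it went next.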
The interesting bit
The demo is deliberately designed to fail well. It ships with explicit guardrail triggers: a relevance check that blocks off-topic requests (such as a poem about strawberries) and a jailbreak check that catches prompt-injection attempts, so you can watch the safety layer trip in real time. The irregular-operations flow (delayed connection, automatic rebooking, compensation) shows how multi-step agent chaining handles edge cases that break single-bot designs.
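The guardrail idea can be sketched as a set of checks that run on the user's message before any agent sees it. This is an illustration under stated assumptions, not the demo's implementation: the keyword lists and the `check_input` helper are hypothetical, and simple substring checks stand in for whatever classification the real guardrails perform.

```python
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    name: str
    tripped: bool


# Hypothetical term lists, chosen only to make the sketch runnable.
AIRLINE_TERMS = ("flight", "seat", "baggage", "booking", "refund", "delay")
JAILBREAK_PHRASES = ("ignore previous instructions", "system prompt")


def relevance_guardrail(message: str) -> GuardrailResult:
    """Trip when the message mentions no airline-related term."""
    text = message.lower()
    return GuardrailResult("relevance", not any(t in text for t in AIRLINE_TERMS))


def jailbreak_guardrail(message: str) -> GuardrailResult:
    """Trip when the message contains a known prompt-injection phrase."""
    text = message.lower()
    return GuardrailResult("jailbreak", any(p in text for p in JAILBREAK_PHRASES))


def check_input(message: str) -> list[str]:
    """Return the names of tripped guardrails; an empty list means the message may proceed."""
    results = [relevance_guardrail(message), jailbreak_guardrail(message)]
    return [r.name for r in results if r.tripped]


print(check_input("Write a poem about strawberries"))  # ['relevance']
print(check_input("Can I change my seat?"))            # []
```

Returning the names of the tripped checks, rather than a single boolean, is what lets a UI show which guardrail fired.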
Key highlights
- Six specialized agents with defined handoff rules, not one generalist model pretending to know everything
- Visual UI shows routing decisions as they happen; useful for debugging why an agent was chosen
- Includes working guardrail examples (relevance + jailbreak) with visible failure states
- Mock data supports two complete scenarios: routine requests and a full disruption/rebooking flow
- Backend can run standalone (FastAPI/uvicorn) if you want to bring your own frontend
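The disruption scenario and the routing trace mentioned in the list above can be pictured as a chain of steps that share one context object and log each decision. The sketch below is a rough illustration only: `TripContext`, the step functions, the flight numbers, and the compensation values are all made up and do not come from the demo's mock data.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TripContext:
    """Shared state handed from one step to the next (all values are invented)."""
    missed_flight: str
    rebooked_flight: Optional[str] = None
    compensation: Optional[str] = None
    trace: list[str] = field(default_factory=list)


# Made-up mock schedule: missed connection -> next available flight.
MOCK_ALTERNATIVES = {"XX100": "XX200"}


def flight_info_step(ctx: TripContext) -> TripContext:
    ctx.trace.append(f"flight_info: connection {ctx.missed_flight} missed")
    return ctx


def booking_step(ctx: TripContext) -> TripContext:
    # Look up a replacement flight; None if the mock schedule has no alternative.
    ctx.rebooked_flight = MOCK_ALTERNATIVES.get(ctx.missed_flight)
    ctx.trace.append(f"booking: rebooked on {ctx.rebooked_flight}")
    return ctx


def refunds_step(ctx: TripContext) -> TripContext:
    # Hypothetical policy: voucher if rebooked, refund otherwise.
    ctx.compensation = "meal voucher" if ctx.rebooked_flight else "full refund"
    ctx.trace.append(f"refunds: issued {ctx.compensation}")
    return ctx


def run_chain(ctx: TripContext, steps: list[Callable[[TripContext], TripContext]]) -> TripContext:
    """Run each step in order, passing the shared context along."""
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_chain(TripContext("XX100"), [flight_info_step, booking_step, refunds_step])
print("\n".join(result.trace))
```

Because every step appends to `trace`, the sequence of decisions can be replayed afterwards, which is the same property that makes a visual handoff log useful for debugging.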
Caveats
- Explicitly “designed for demonstration purposes”—all flight data is mocked, not connected to live systems
- Contributing note warns that PRs may not be reviewed, so don’t expect active maintenance
Verdict
Worth cloning if you’re building multi-agent systems and need a concrete reference for handoff logic and guardrail placement. Skip it if you want production airline integrations; the value is in the architecture pattern, not the domain implementation.