Alibaba's sandbox for AI agents that won't run rm -rf / on your laptop
A general-purpose sandbox platform with multi-language SDKs and a Kubernetes-native runtime for isolating coding agents, browser automation, and RL training workloads.
What it does
OpenSandbox spins up isolated environments for AI agents to run code, browse the web, or train models without touching your host system. It provides SDKs in Python, Java/Kotlin, TypeScript, C#/.NET, and Go, plus a CLI (osb) and MCP server for integration with tools like Claude Code and Cursor. Under the hood it orchestrates Docker or Kubernetes runtimes, with optional hardening via gVisor, Kata Containers, or Firecracker.
The interesting bit
The project treats “sandbox” as a protocol, not just a container. It defines lifecycle and execution APIs (OpenAPI specs in specs/) so you can plug in custom runtimes. The built-in examples are ambitiously specific: run Claude Code inside a sandbox, launch a VNC desktop environment, or do DQN CartPole training with checkpoint persistence.
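To make the "protocol, not just a container" idea concrete, here is a minimal Python sketch of what a pluggable runtime contract could look like. Everything in it is an assumption for illustration: `SandboxRuntime`, `ExecResult`, `LocalProcessRuntime`, and `run_once` are hypothetical names, not taken from the OpenSandbox specs or SDKs, and the toy runtime provides no real isolation.

```python
import subprocess
import sys
import tempfile
import uuid
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ExecResult:
    exit_code: int
    stdout: str
    stderr: str


class SandboxRuntime(Protocol):
    """Hypothetical lifecycle + execution contract (not the OpenSandbox spec)."""

    def create(self) -> str: ...
    def run(self, sandbox_id: str, code: str) -> ExecResult: ...
    def delete(self, sandbox_id: str) -> None: ...


class LocalProcessRuntime:
    """Toy runtime: a temp directory plus a subprocess. NOT real isolation."""

    def __init__(self) -> None:
        self._dirs: dict[str, tempfile.TemporaryDirectory] = {}

    def create(self) -> str:
        sandbox_id = uuid.uuid4().hex
        self._dirs[sandbox_id] = tempfile.TemporaryDirectory()
        return sandbox_id

    def run(self, sandbox_id: str, code: str) -> ExecResult:
        # Run the snippet with the sandbox's scratch directory as cwd.
        proc = subprocess.run(
            [sys.executable, "-c", code],
            cwd=self._dirs[sandbox_id].name,
            capture_output=True,
            text=True,
            timeout=30,
        )
        return ExecResult(proc.returncode, proc.stdout, proc.stderr)

    def delete(self, sandbox_id: str) -> None:
        # Remove the scratch directory and forget the sandbox.
        self._dirs.pop(sandbox_id).cleanup()


def run_once(runtime: SandboxRuntime, code: str) -> ExecResult:
    """Create a sandbox, run one snippet, and always clean up."""
    sandbox_id = runtime.create()
    try:
        return runtime.run(sandbox_id, code)
    finally:
        runtime.delete(sandbox_id)
```

Because `run_once` depends only on the contract, a Docker- or Kubernetes-backed class with the same three methods could be swapped in without touching the calling code, which is the appeal of defining the API before the runtime.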
Key highlights
- Multi-language SDKs with async Python support and typed file/command operations
- Kubernetes-native runtime for distributed scheduling, not just local Docker
- Built-in ingress gateway and per-sandbox egress controls for network policy
- Secure container runtime options: gVisor, Kata, Firecracker microVMs
- MCP server for direct integration with Cursor and Claude Code over stdio transport
- CNCF Landscape listed; Apache 2.0 licensed
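As a rough illustration of what per-sandbox egress control means in practice, the sketch below models a deny-by-default host allowlist in Python. It is an assumption-laden toy: `EgressPolicy`, its fields, and the wildcard matching are hypothetical and do not reflect how OpenSandbox configures or enforces network policy.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class EgressPolicy:
    """Hypothetical per-sandbox egress policy; names are illustrative only."""

    allowed_hosts: list[str] = field(default_factory=list)
    default_allow: bool = False  # deny anything not explicitly listed

    def permits(self, host: str) -> bool:
        # Normalize case and a trailing dot before matching shell-style patterns.
        host = host.lower().rstrip(".")
        if any(fnmatchcase(host, pattern.lower()) for pattern in self.allowed_hosts):
            return True
        return self.default_allow


# Example: a sandbox that may only reach Python package hosts.
policy = EgressPolicy(allowed_hosts=["pypi.org", "*.pythonhosted.org"])
```

With this policy, `policy.permits("files.pythonhosted.org")` is allowed while `policy.permits("example.com")` is denied. A real gateway would enforce the decision at the network layer rather than in application code.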
Caveats
- Local execution requires Docker and Python 3.10+; the server is Python/FastAPI-based
- README mentions “high-performance Kubernetes runtime” but provides no latency or throughput numbers
- Some example paths (e.g., kubernetes-sigs/agent-sandbox) reference external repos not yet fully integrated
Verdict
Worth evaluating if you’re building agent infrastructure and need isolation guarantees beyond “hope the LLM doesn’t generate dangerous shell commands.” Skip if you just need a quick Python subprocess wrapper; this is heavier machinery for multi-tenant or production deployments.