trycua/cua

A full OS in a lunchbox: infrastructure for agents that actually click

Open-source sandboxes, drivers, and benchmarks for training AI agents to control real desktops across macOS, Linux, and Windows.

Velocity (7d): +36 ★/day · Trend: steady

What it does

cua is a stack for building computer-use agents: the kind that see screens, move mice, and press keys. It bundles a Python SDK for spawning ephemeral VMs or containers, a background driver that automates native desktop apps without stealing your cursor, a benchmarking suite (OSWorld, ScreenSpot, Windows Arena), and Lume, a macOS/Linux virtualization tool built on Apple’s Virtualization.framework. There’s even cuabot, a CLI that drops sandboxed agent windows onto your actual desktop via H.265 streaming.

The interesting bit

The SDK abstracts away the runtime entirely: Sandbox.ephemeral(Image.linux()) and Sandbox.ephemeral(Image.macos()) share the same API, so your agent code doesn’t care whether it’s talking to a local QEMU VM, a cloud instance, or a container. The background driver is the rarer piece — it lets agents drive native apps on macOS and Windows without hijacking the user’s session, which is the difference between a helpful coworker and an annoying roommate.
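To make the "same API, different runtime" idea concrete, here is a minimal, self-contained sketch. It does not import cua: the `Image` and `Sandbox` classes below are hypothetical in-memory stand-ins that only borrow the `Sandbox.ephemeral(Image.linux())` call shape quoted above. The method names `click` and `type_text`, and the action log, are assumptions made for illustration, not the library's documented interface.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for an image descriptor; not the real cua class."""
    os: str

    @classmethod
    def linux(cls) -> "Image":
        return cls("linux")

    @classmethod
    def macos(cls) -> "Image":
        return cls("macos")


@dataclass
class Sandbox:
    """Hypothetical in-memory sandbox that only records the actions it receives."""
    image: Image
    log: list = field(default_factory=list)

    @classmethod
    def ephemeral(cls, image: Image) -> "Sandbox":
        return cls(image)

    async def __aenter__(self) -> "Sandbox":
        # A real runtime would boot a VM or container here.
        self.log.append(f"boot {self.image.os}")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Ephemeral means the environment is torn down on exit.
        self.log.append("destroy")

    async def click(self, x: int, y: int) -> None:
        self.log.append(f"click {x},{y}")

    async def type_text(self, text: str) -> None:
        self.log.append(f"type {text}")


async def run_agent(image: Image) -> list:
    # The agent logic is identical for every image; only the argument changes.
    async with Sandbox.ephemeral(image) as sb:
        await sb.click(100, 200)
        await sb.type_text("hello")
    return sb.log


def demo() -> dict:
    return {
        "linux": asyncio.run(run_agent(Image.linux())),
        "macos": asyncio.run(run_agent(Image.macos())),
    }
```

Calling `demo()` returns one action log per OS. Apart from the boot entry, the two logs are identical, which is the property the shared API is meant to give agent code.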

Key highlights

  • One async Python API for Linux, macOS, Windows, and Android sandboxes (local or cloud)
  • Background desktop automation on macOS/Windows via CLI and MCP server; Linux is pre-release
  • Built-in benchmarking with trajectory export for RL training
  • Lume spins up near-native Apple Silicon VMs from the command line
  • cuabot wraps agents in native-feeling desktop windows with shared clipboard and audio

Caveats

  • Linux background driver is pre-release and still being tested
  • Cloud BYOI (bring your own image: .qcow2, .iso) support is marked “soon” rather than shipped
  • Optional cua-agent[omni] pulls in ultralytics under AGPL-3.0, which may complicate commercial use
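If the licensing caveat matters to you, one rough way to see whether the AGPL-licensed dependency has landed in an environment is to query installed package metadata with the standard library. This is a sketch, not a compliance check: the helper name `installed_license` is hypothetical, and the `License` metadata field is filled in by package authors, so it may be empty or imprecise.

```python
from importlib import metadata


def installed_license(package: str):
    """Return (version, declared license) for an installed package, or None.

    Hypothetical helper: it only reports what the package's own metadata
    declares, which may be missing or incomplete.
    """
    try:
        meta = metadata.metadata(package)
    except metadata.PackageNotFoundError:
        return None
    declared = meta.get("License") or meta.get("License-Expression")
    return meta.get("Version"), declared


if __name__ == "__main__":
    # None means ultralytics is not installed in this environment.
    print(installed_license("ultralytics"))
```

A `None` result only says the package is absent from the current environment; it says nothing about what an optional extra would pull in on a later install.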

Verdict

Worth a look if you’re training or evaluating computer-use agents and need cross-platform sandboxes without building your own virtualization glue. Skip it if you just want to automate a single browser: this is heavier than Playwright or Selenium by design.
