fork(2) semantics for VMs, because AI agents need siblings
A Rust runtime that spawns 100 KVM-isolated microVM children in ~100 ms by copy-on-write forking a warmed parent snapshot.

What it does
forkd boots a Firecracker parent VM once, loads your runtime (Python heap, JIT-warmed JVM, model weights), then pauses it and snapshots its memory to disk. Each child is a separate Firecracker process that mmaps the parent’s memory image with MAP_PRIVATE; the kernel handles copy-on-write at page granularity. Children share resident pages until they write, giving you per-sandbox KVM isolation without the per-sandbox cold-boot tax.
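The MAP_PRIVATE behaviour described above can be seen on a small scale without any VM. The sketch below is an illustration only, not forkd code: a temporary file stands in for the parent's memory image, and the function name and the two "child" mappings are hypothetical. It maps the same file twice with MAP_PRIVATE, writes through one mapping, and shows that neither the other mapping nor the file on disk changes.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def demo_private_mappings():
    """Map one file twice with MAP_PRIVATE and write through one mapping."""
    # Stand-in for the parent's memory image: two pages filled with b"P".
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"P" * (2 * PAGE))
        path = f.name

    fd = os.open(path, os.O_RDWR)
    try:
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        # Each hypothetical "child" gets its own private view of the same file.
        child_a = mmap.mmap(fd, 2 * PAGE, flags=mmap.MAP_PRIVATE, prot=prot)
        child_b = mmap.mmap(fd, 2 * PAGE, flags=mmap.MAP_PRIVATE, prot=prot)
        try:
            # The first write to a page makes the kernel copy that page
            # for child_a only; the second page stays shared.
            child_a[0:5] = b"child"
            seen_by_a = bytes(child_a[0:5])
            seen_by_b = bytes(child_b[0:5])
        finally:
            child_a.close()
            child_b.close()
        # Private writes are never written back to the underlying file.
        with open(path, "rb") as f:
            on_disk = f.read(5)
    finally:
        os.close(fd)
        os.unlink(path)
    return seen_by_a, seen_by_b, on_disk


if __name__ == "__main__":
    print(demo_private_mappings())
```

Only the page that was written is duplicated, which is why untouched pages cost no extra host memory per child.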
The interesting bit
The BRANCH primitive lets you pause a running sandbox, snapshot its in-flight state, and resume. The pause was originally ~150 ms and is down to 56 ms p50 in v0.4 live mode, which means an agent can fork mid-thought, not just at warm-up. The v0.4 trick: move the memory copy out of the critical path using memfd-backed RAM and UFFD_WP (userfaultfd write-protect), so the source VM pauses only briefly and the copy finishes asynchronously after it resumes.
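The general idea behind write-protect-assisted snapshots can be modelled in a few lines. This is a toy model in pure Python, not forkd's implementation, and it does not call userfaultfd: the class and method names are hypothetical, and a set of page indices stands in for kernel write protection. Every page starts protected, a background copier drains pages one at a time, and a guest write to a still-protected page saves the old contents first.

```python
class LiveSnapshot:
    """Toy model of copy-before-write snapshotting over a list of pages."""

    def __init__(self, pages):
        self.pages = pages                        # live guest memory (mutable)
        self.snapshot = [None] * len(pages)       # filled in lazily
        self.protected = set(range(len(pages)))   # pages not yet copied

    def _copy_page(self, i):
        # Preserve the page as it was at snapshot time, then unprotect it.
        if i in self.protected:
            self.snapshot[i] = bytes(self.pages[i])
            self.protected.discard(i)

    def guest_write(self, i, data):
        # A write to a protected page "faults": copy the old contents first.
        self._copy_page(i)
        self.pages[i] = bytearray(data)

    def background_step(self):
        # Copy one remaining page; return False once nothing is left to copy.
        if not self.protected:
            return False
        self._copy_page(min(self.protected))
        return True


if __name__ == "__main__":
    memory = [bytearray(b"a"), bytearray(b"b"), bytearray(b"c")]
    snap = LiveSnapshot(memory)       # the "pause" is only this setup step
    snap.guest_write(2, b"C")         # guest keeps running and dirties page 2
    while snap.background_step():     # copier finishes off the critical path
        pass
    print(snap.snapshot, memory)
```

In this model the snapshot always reflects memory as it was when the snapshot started, even though the guest keeps writing while the copy is still in progress.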
Key highlights
- 101 ms to spawn 100 sandboxes vs. 759 ms for raw Firecracker cold-boot, 122 s for Docker, 288 s for gVisor (measured on Ubuntu 24.04, 20 vCPU, 30 GiB host).
- 0.12 MiB host memory delta per sandbox after spawn — children share the parent’s pages.
- Per-child network namespaces, cgroup v2 memory limits, and vmgenid-reseeded /dev/urandom for multi-tenant use.
- REST API, Python/TypeScript/MCP SDKs, Prometheus metrics, systemd unit: operable, not just a demo.
- Apache 2.0, no vendor SDK lock-in.
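To put the spawn numbers on a per-sandbox footing, here is a small arithmetic sketch. It assumes every quoted time is the total for spawning 100 sandboxes, which the benchmark line implies but does not state outright; the variable and function names are illustrative.

```python
SANDBOXES = 100

# Total spawn times quoted above, in milliseconds (assumed: for 100 sandboxes).
spawn_ms = {
    "forkd": 101,
    "Firecracker cold boot": 759,
    "Docker": 122_000,
    "gVisor": 288_000,
}


def per_sandbox_ms(total_ms, n=SANDBOXES):
    """Average spawn time per sandbox."""
    return total_ms / n


def slowdown_vs_forkd(name):
    """How many times longer than forkd the named runtime takes."""
    return spawn_ms[name] / spawn_ms["forkd"]


# Host memory added by all children together, from the 0.12 MiB figure.
total_delta_mib = 0.12 * SANDBOXES

if __name__ == "__main__":
    for name, total in spawn_ms.items():
        print(f"{name}: {per_sandbox_ms(total):.2f} ms per sandbox, "
              f"{slowdown_vs_forkd(name):.1f}x forkd")
    print(f"memory delta for {SANDBOXES} children: {total_delta_mib:.0f} MiB")
```

Under that assumption, forkd averages about 1 ms per sandbox, Firecracker cold boot takes roughly 7.5 times as long, and 100 children add about 12 MiB of host memory in total.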
Caveats
- Requires Linux ≥ 5.7, vm.unprivileged_userfaultfd=1 (or CAP_SYS_PTRACE), and a vendored Firecracker fork; forkd doctor checks your host.
- v0.4 live BRANCH needs the source booted with live_fork=True; CLI spawn and daemon-side spawn don’t compose yet (issue #209).
- The 3.3 s BRANCH pause in the coding-agent demo with a 50 MiB binary is much slower than the 56 ms headline; the number depends heavily on workload and whether you use live mode.
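Two of the host requirements listed above can be checked by hand before installing anything. The sketch below is not what forkd doctor does internally (its checks are not described here); it is an independent, minimal check of the kernel version and the vm.unprivileged_userfaultfd sysctl, with hypothetical helper names. It does not test for CAP_SYS_PTRACE or the vendored Firecracker fork.

```python
import platform
import re
from pathlib import Path

MIN_KERNEL = (5, 7)
UFFD_SYSCTL = Path("/proc/sys/vm/unprivileged_userfaultfd")


def parse_kernel(release):
    """Extract (major, minor) from a release string like '6.8.0-31-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_ok(release, minimum=MIN_KERNEL):
    """True if the release string is at least the minimum kernel version."""
    return parse_kernel(release) >= minimum


def unprivileged_uffd_enabled(path=UFFD_SYSCTL):
    """True if the sysctl file exists and contains 1."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return False


def check_host():
    """Report both checks for the machine this runs on."""
    on_linux = platform.system() == "Linux"
    return {
        "Linux kernel >= 5.7": on_linux and kernel_ok(platform.release()),
        "vm.unprivileged_userfaultfd=1": unprivileged_uffd_enabled(),
    }


if __name__ == "__main__":
    for name, ok in check_host().items():
        print(f"{'ok  ' if ok else 'FAIL'} {name}")
```

A failed sysctl check is not necessarily fatal, since the requirement can also be met with CAP_SYS_PTRACE.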
Verdict
Worth a look if you’re running AI agent fan-outs (code interpreters, tool-use sandboxes, evaluation rollouts) and currently eating the cold-start cost per request. Skip it if you’re on macOS, Windows, or an older kernel, or if your workloads are long-lived and stateless enough that boot time doesn’t matter.