deeplethe/forkd

fork(2) semantics for VMs, because AI agents need siblings

A Rust runtime that spawns 100 KVM-isolated microVM children in ~100 ms by copy-on-write forking a warmed parent snapshot.

1.8k stars · Rust · Agents · Other AI
Velocity (7d): +67 ★/day · Trend: steady

What it does

forkd boots a Firecracker parent VM once, loads your runtime (Python heap, JIT-warmed JVM, model weights), then pauses it and snapshots it to disk. Each child is a separate Firecracker process that mmaps the parent’s memory image with MAP_PRIVATE, so the kernel handles copy-on-write at page granularity. Children share the parent’s resident pages until they write to them, which gives you per-sandbox KVM isolation without the per-sandbox cold-boot tax.
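The MAP_PRIVATE behavior described above can be seen in miniature with an ordinary file. This is a minimal sketch, not forkd's code: the temporary file stands in for the parent's memory image, and `private_view` and `demo` are hypothetical names chosen for illustration. It assumes a Unix host, since the `flags` argument to Python's `mmap` is Unix-only.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def private_view(path: str) -> mmap.mmap:
    """Map a file copy-on-write: writes stay private to this mapping."""
    # MAP_PRIVATE only needs read access to the file, even with PROT_WRITE.
    fd = os.open(path, os.O_RDONLY)
    try:
        return mmap.mmap(
            fd,
            0,  # length 0 maps the whole file
            flags=mmap.MAP_PRIVATE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE,
        )
    finally:
        os.close(fd)  # mmap keeps its own duplicate of the descriptor


def demo() -> dict:
    # Stand-in for the parent's memory image: four pages of b"P".
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"P" * (4 * PAGE))
        path = f.name
    try:
        child_a = private_view(path)
        child_b = private_view(path)
        # The first write to a page triggers a private copy for child_a only.
        child_a[0:5] = b"child"
        with open(path, "rb") as image:
            on_disk = image.read(5)
        result = {
            "child_a": bytes(child_a[0:5]),
            "child_b": bytes(child_b[0:5]),
            "on_disk": on_disk,
        }
        child_a.close()
        child_b.close()
        return result
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Only the page that `child_a` wrote is duplicated. The other mapping and the backing file still see the original bytes, which is the property that lets many children share one image.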

The interesting bit

The BRANCH primitive lets you pause a running sandbox, snapshot its in-flight state, and resume. The pause was originally ~150 ms and is now down to 56 ms p50 in v0.4 live mode, which means an agent can fork mid-thought, not just at warm-up. The v0.4 trick is to move the memory copy off the critical path using memfd-backed RAM and UFFD_WP (userfaultfd write-protect): the source VM pauses only briefly, and the copy finishes asynchronously.

Key highlights

  • 101 ms to spawn 100 sandboxes vs. 759 ms for raw Firecracker cold-boot, 122 s for Docker, 288 s for gVisor (measured on Ubuntu 24.04, 20 vCPU, 30 GiB host).
  • 0.12 MiB host memory delta per sandbox after spawn — children share the parent’s pages.
  • Per-child network namespaces, cgroup v2 memory limits, and vmgenid-reseeded /dev/urandom for multi-tenant use.
  • REST API, Python/TypeScript/MCP SDKs, Prometheus metrics, systemd unit — operable, not just a demo.
  • Apache 2.0, no vendor SDK lock-in.
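To put the headline figures on a common scale, the arithmetic can be worked through directly. This sketch uses only the numbers quoted in the highlights above. The dictionary keys and function names are labels chosen for illustration.

```python
# Time to spawn 100 sandboxes, in milliseconds, as quoted above.
SPAWN_100_MS = {
    "forkd": 101,
    "firecracker_cold_boot": 759,
    "docker": 122_000,
    "gvisor": 288_000,
}
SANDBOXES = 100
MEM_DELTA_MIB = 0.12  # host memory delta per sandbox after spawn


def per_sandbox_ms(total_ms: float, n: int = SANDBOXES) -> float:
    """Average spawn time per sandbox."""
    return total_ms / n


def slowdown_vs_forkd(name: str) -> float:
    """How many times longer than forkd the named baseline took."""
    return SPAWN_100_MS[name] / SPAWN_100_MS["forkd"]


for name, total in SPAWN_100_MS.items():
    print(f"{name:>22}: {per_sandbox_ms(total):9.2f} ms/sandbox, "
          f"{slowdown_vs_forkd(name):8.1f}x forkd")

print(f"memory delta for {SANDBOXES} sandboxes: "
      f"{MEM_DELTA_MIB * SANDBOXES:.0f} MiB")
```

On these figures forkd averages about 1 ms per sandbox, raw Firecracker cold-boot is roughly 7.5x slower, Docker roughly 1,200x, and gVisor roughly 2,850x. The memory delta for 100 children comes to about 12 MiB.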

Caveats

  • Requires Linux ≥ 5.7, vm.unprivileged_userfaultfd=1 (or CAP_SYS_PTRACE), and a vendored Firecracker fork — forkd doctor checks your host.
  • v0.4 live BRANCH needs the source booted with live_fork=True; CLI spawn and daemon-side spawn don’t compose yet (issue #209).
  • The 3.3 s BRANCH pause in the coding-agent demo with a 50 MiB binary is much slower than the 56 ms headline — the number depends heavily on workload and whether you use live mode.
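Two of the host requirements above can be checked by hand. The following is a rough sketch of such a check, not what forkd doctor actually does: the function names are hypothetical, and it covers only the kernel version and the vm.unprivileged_userfaultfd sysctl, not capabilities or the Firecracker fork.

```python
import platform
import re
from pathlib import Path
from typing import Optional, Tuple

MIN_KERNEL = (5, 7)
UFFD_SYSCTL = Path("/proc/sys/vm/unprivileged_userfaultfd")


def parse_kernel(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def kernel_ok(release: str) -> bool:
    """True if the kernel is at least 5.7."""
    return parse_kernel(release) >= MIN_KERNEL


def unprivileged_uffd_enabled(path: Path = UFFD_SYSCTL) -> Optional[bool]:
    """True/False from the sysctl, or None if it cannot be read."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


def check_host() -> dict:
    """Collect both checks for the machine this runs on."""
    release = platform.release()
    return {
        "kernel": release,
        "kernel_ok": kernel_ok(release),
        "unprivileged_userfaultfd": unprivileged_uffd_enabled(),
    }
```

Calling `check_host()` on a Linux machine returns a small dictionary of results. A `None` for the sysctl means the file was missing or unreadable, which is itself worth investigating before trying live BRANCH.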

Verdict

Worth a look if you’re running AI agent fan-outs (code interpreters, tool-use sandboxes, evaluation rollouts) and currently eating the cold-start cost per request. Skip it if you’re on macOS, Windows, or an older kernel, or if your workloads are long-lived and stateless enough that boot time doesn’t matter.
