Turn your Mac pile into an AI supercomputer (seriously)
exo automatically clusters your Apple devices to run frontier models that won't fit on one machine, using Thunderbolt like a datacenter backplane.

What it does
exo turns every device you own into a single distributed inference cluster. It discovers peers automatically, shards models across them, and exposes standard APIs (OpenAI, Claude, Ollama) so your existing tools just work. The pitch is simple: pool your hardware, run bigger models.
The interesting bit
The project treats Thunderbolt 5 like a datacenter RDMA fabric — because on macOS 26.2, it basically is. exo claims a 99% latency reduction between devices and published benchmarks running DeepSeek v3.1 671B and Qwen3-235B across four M3 Ultra Mac Studios. That’s not hobbyist territory; that’s using consumer cables to do what normally requires InfiniBand.
Key highlights
- Zero-config clustering: Devices find each other automatically; no manual topology files or IP lists.
- RDMA over Thunderbolt 5: Ships with day-0 support, but requires macOS 26.2 and a trip to Recovery mode to run
rdma_ctl enable. - Topology-aware sharding: Splits models based on realtime device resources and link latency/bandwidth, not naive round-robin.
- MLX backend: Built on Apple’s MLX and MLX distributed; tensor parallelism claims 1.8× on 2 devices, 3.2× on 4.
- Dashboard + multi-API: Built-in web UI at
:52415, plus compatibility with Chat Completions, Messages, Responses, and Ollama APIs.
Caveats
- Mac-first, Linux-later: Linux currently runs CPU-only; GPU support is explicitly “under development.”
- Setup friction: macOS source builds need Xcode, Rust nightly, a pinned
macmonfork, and Node for the dashboard — or you can use Nix to skip most of it. - RDMA hardware gate: Thunderbolt 5 only; limited to M4 Pro/Max machines and M3 Ultra Mac Studio.
- macOS app requires Tahoe 26.2: The standalone app won’t run on older macOS versions and needs network profile permissions.
Verdict
If you have multiple Apple Silicon machines and want to run 400B+ parameter models without renting GPUs, exo is the most credible attempt at consumer-grade AI clustering. Everyone else — especially Linux users with NVIDIA cards — should wait for broader hardware support.