← all repositories
exo-explore/exo

Turn your Mac pile into an AI supercomputer (seriously)

exo automatically clusters your Apple devices to run frontier models that won't fit on one machine, using Thunderbolt like a datacenter backplane.

45.2k stars Python Inference · ServingML Frameworks
exo
Velocity · 7d
+63
★ / day
Trend
steady
star history

What it does

exo turns every device you own into a single distributed inference cluster. It discovers peers automatically, shards models across them, and exposes standard APIs (OpenAI, Claude, Ollama) so your existing tools just work. The pitch is simple: pool your hardware, run bigger models.

The interesting bit

The project treats Thunderbolt 5 like a datacenter RDMA fabric — because on macOS 26.2, it basically is. exo claims a 99% latency reduction between devices and published benchmarks running DeepSeek v3.1 671B and Qwen3-235B across four M3 Ultra Mac Studios. That’s not hobbyist territory; that’s using consumer cables to do what normally requires InfiniBand.

Key highlights

  • Zero-config clustering: Devices find each other automatically; no manual topology files or IP lists.
  • RDMA over Thunderbolt 5: Ships with day-0 support, but requires macOS 26.2 and a trip to Recovery mode to run rdma_ctl enable.
  • Topology-aware sharding: Splits models based on realtime device resources and link latency/bandwidth, not naive round-robin.
  • MLX backend: Built on Apple’s MLX and MLX distributed; tensor parallelism claims 1.8× on 2 devices, 3.2× on 4.
  • Dashboard + multi-API: Built-in web UI at :52415, plus compatibility with Chat Completions, Messages, Responses, and Ollama APIs.

Caveats

  • Mac-first, Linux-later: Linux currently runs CPU-only; GPU support is explicitly “under development.”
  • Setup friction: macOS source builds need Xcode, Rust nightly, a pinned macmon fork, and Node for the dashboard — or you can use Nix to skip most of it.
  • RDMA hardware gate: Thunderbolt 5 only; limited to M4 Pro/Max machines and M3 Ultra Mac Studio.
  • macOS app requires Tahoe 26.2: The standalone app won’t run on older macOS versions and needs network profile permissions.

Verdict

If you have multiple Apple Silicon machines and want to run 400B+ parameter models without renting GPUs, exo is the most credible attempt at consumer-grade AI clustering. Everyone else — especially Linux users with NVIDIA cards — should wait for broader hardware support.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.