openlake-project/openlake

An object store that treats the CPU as overhead

OpenLake wants storage to bypass the host entirely and land straight in GPU memory.

Velocity (7d): +30 ★/day · Trend: steady

What it does

OpenLake is a distributed S3-compatible object store written in Rust, built for the specific misery of feeding GPUs during LLM training and inference. It uses io_uring, pins one runtime per core with no work stealing, and runs the HTTP frontend and storage engine on the same thread, so requests never cross core boundaries on the hot path.
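The ownership model behind "one runtime per core, no work stealing" can be sketched with nothing but the standard library. This is an illustration, not OpenLake's code: it uses plain std threads and channels rather than compio or io_uring, it does not pin threads to cores (std has no affinity API), and the names `ShardedStore` and `Request` are hypothetical. Each worker exclusively owns one shard of a key-value map, and a hash of the key decides which worker serves it, so no lock guards the data and no idle thread can take over another's work.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// A request routed to the worker that owns the key's shard (hypothetical type).
enum Request {
    Put(String, Vec<u8>),
    Get(String, Sender<Option<Vec<u8>>>),
}

/// Shard-per-thread store: each worker exclusively owns its HashMap, so no
/// locks are needed and no other thread ever touches (or steals) its work.
/// Note: threads are NOT pinned to cores in this sketch.
struct ShardedStore {
    senders: Vec<Sender<Request>>,
    workers: Vec<JoinHandle<()>>,
}

impl ShardedStore {
    fn new(shards: usize) -> Self {
        let mut senders = Vec::with_capacity(shards);
        let mut workers = Vec::with_capacity(shards);
        for _ in 0..shards {
            let (tx, rx) = channel::<Request>();
            senders.push(tx);
            workers.push(thread::spawn(move || {
                // This map lives only on this worker thread.
                let mut data: HashMap<String, Vec<u8>> = HashMap::new();
                for req in rx {
                    match req {
                        Request::Put(k, v) => {
                            data.insert(k, v);
                        }
                        Request::Get(k, reply) => {
                            let _ = reply.send(data.get(&k).cloned());
                        }
                    }
                }
            }));
        }
        ShardedStore { senders, workers }
    }

    /// Deterministic key -> shard mapping: a key always lands on the same worker.
    fn shard_for(&self, key: &str) -> usize {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        (h.finish() % self.senders.len() as u64) as usize
    }

    fn put(&self, key: &str, value: Vec<u8>) {
        let s = self.shard_for(key);
        self.senders[s].send(Request::Put(key.to_string(), value)).unwrap();
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let (tx, rx) = channel();
        let s = self.shard_for(key);
        self.senders[s].send(Request::Get(key.to_string(), tx)).unwrap();
        rx.recv().unwrap()
    }

    /// Close every channel, then wait for the workers to drain and exit.
    fn shutdown(self) {
        drop(self.senders);
        for w in self.workers {
            w.join().unwrap();
        }
    }
}

fn main() {
    let store = ShardedStore::new(4);
    store.put("weights/shard-0", vec![1, 2, 3]);
    assert_eq!(store.get("weights/shard-0"), Some(vec![1, 2, 3]));
    assert_eq!(store.get("missing"), None);
    store.shutdown();
}
```

The channel hop inside `put` and `get` is exactly the cross-thread handoff the design above is built to avoid; OpenLake sidesteps it by running the HTTP frontend on the same thread as the storage engine.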

The interesting bit

The real bet is zero-copy I/O: GPUDirect Storage and RDMA move data from a peer's NIC straight into GPU VRAM, skipping host memory and the page cache entirely. The README also claims a novel congestion-control algorithm called “PacedRDMA” and SIMD-accelerated Reed-Solomon erasure coding. Whether the 6× throughput claim holds outside the project's own benchmarks is, naturally, for them to prove.
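The README doesn't say which Reed-Solomon construction OpenLake uses, so the following only illustrates the field arithmetic such SIMD kernels accelerate: a scalar sketch of RAID-6-style P+Q parity, which is a Reed-Solomon code with two parity shards over GF(2^8). It assumes the reducing polynomial 0x11d and generator 2 (the common RAID-6 choice); `gf_mul`, `encode`, and the `recover_*` helpers are hypothetical names. It rebuilds a single lost data shard from either parity shard; recovering two lost shards at once means solving a 2×2 system over the field and is left out.

```rust
/// Multiply two elements of GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11d).
fn gf_mul(mut a: u8, mut b: u8) -> u8 {
    let mut product = 0u8;
    while b != 0 {
        if b & 1 != 0 {
            product ^= a; // addition in GF(2^8) is XOR
        }
        let overflow = a & 0x80 != 0;
        a <<= 1;
        if overflow {
            a ^= 0x1d; // x^8 = x^4 + x^3 + x^2 + 1
        }
        b >>= 1;
    }
    product
}

/// base^exp by repeated multiplication (fine for a sketch).
fn gf_pow(base: u8, exp: u32) -> u8 {
    (0..exp).fold(1, |acc, _| gf_mul(acc, base))
}

/// Inverse of a nonzero element: a^255 = 1, so a^254 = a^-1.
fn gf_inv(a: u8) -> u8 {
    assert_ne!(a, 0, "zero has no inverse");
    gf_pow(a, 254)
}

/// P = XOR of all data shards; Q = sum of 2^i * D_i over GF(2^8).
/// Assumes every shard has the same length.
fn encode(data: &[Vec<u8>]) -> (Vec<u8>, Vec<u8>) {
    let len = data[0].len();
    let (mut p, mut q) = (vec![0u8; len], vec![0u8; len]);
    for (i, shard) in data.iter().enumerate() {
        let coef = gf_pow(2, i as u32);
        for ((pj, qj), &d) in p.iter_mut().zip(q.iter_mut()).zip(shard) {
            *pj ^= d;
            *qj ^= gf_mul(coef, d);
        }
    }
    (p, q)
}

/// Rebuild data shard `lost` as P XOR (all surviving data shards).
fn recover_from_p(data: &[Option<Vec<u8>>], p: &[u8], lost: usize) -> Vec<u8> {
    let mut out = p.to_vec();
    for (i, shard) in data.iter().enumerate() {
        if i == lost {
            continue;
        }
        let s = shard.as_ref().expect("only one data shard may be missing");
        for (o, &b) in out.iter_mut().zip(s) {
            *o ^= b;
        }
    }
    out
}

/// Rebuild data shard `lost`: strip surviving terms from Q, leaving 2^lost * D_lost,
/// then multiply by the inverse of 2^lost.
fn recover_from_q(data: &[Option<Vec<u8>>], q: &[u8], lost: usize) -> Vec<u8> {
    let mut out = q.to_vec();
    for (i, shard) in data.iter().enumerate() {
        if i == lost {
            continue;
        }
        let s = shard.as_ref().expect("only one data shard may be missing");
        let coef = gf_pow(2, i as u32);
        for (o, &b) in out.iter_mut().zip(s) {
            *o ^= gf_mul(coef, b);
        }
    }
    let inv = gf_inv(gf_pow(2, lost as u32));
    out.into_iter().map(|b| gf_mul(b, inv)).collect()
}

fn main() {
    let data = vec![b"GPU0".to_vec(), b"GPU1".to_vec(), b"GPU2".to_vec()];
    let (p, q) = encode(&data);
    let mut damaged: Vec<Option<Vec<u8>>> = data.iter().cloned().map(Some).collect();
    damaged[1] = None; // lose the middle shard
    assert_eq!(recover_from_p(&damaged, &p, 1), data[1]);
    assert_eq!(recover_from_q(&damaged, &q, 1), data[1]);
}
```

Production encoders typically replace the bit-by-bit `gf_mul` loop with lookup tables, for example split 4-bit tables that a byte-shuffle instruction such as x86 `PSHUFB` evaluates across a whole vector register at once. That is the usual meaning of "SIMD-accelerated" erasure coding, though the README does not spell out OpenLake's approach.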

Key highlights

  • S3-compatible API; works with the standard aws CLI
  • Built on the compio completion-based runtime, Rust 1.91+
  • io_uring on Linux, kqueue fallback on macOS for dev builds
  • Includes a local benchmark CLI (openlake bench) for quick smoke testing
  • Cluster config is plain TOML, one file per node

Caveats

  • The benchmark chart in the README is self-reported against MinIO and RustFS (another Rust-based, S3-compatible object store), with no independent validation visible
  • “Million+ IOPS within 1ms” and “6× higher throughput” are marketing-adjacent claims without disclosed test conditions
  • GPUDirect Storage and RDMA require specific NVIDIA hardware and network setup; this is not a drop-in replacement for a standard S3 endpoint on commodity cloud instances
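One quick way to sanity-check the IOPS claim is Little's law: sustained throughput λ times mean latency W gives L, the number of requests that must be in flight at once. Taking the README's figures at face value, and assuming 1 ms is the mean latency (the README doesn't say whether it is a mean or a tail bound):

```latex
L = \lambda W
  = \left(10^{6}\ \tfrac{\text{ops}}{\text{s}}\right) \times \left(10^{-3}\ \text{s}\right)
  = 10^{3}\ \text{ops in flight}
```

So the benchmark needs roughly a thousand concurrent requests spread across clients and nodes (fewer if the mean latency is below 1 ms). Queue depth, client count, and object size are exactly the test conditions the README leaves undisclosed.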

Verdict

Worth watching if you’re running multi-node GPU clusters with InfiniBand or RoCE and your storage layer is actually the bottleneck. For a single-node setup or generic object storage needs, it’s overkill with hardware lock-in.
