Lagrange-Labs/deep-prove

ZK proofs for LLM inference that don't take a geological epoch

DeepProve generates cryptographic proofs of transformer forward passes in minutes rather than hours, using sumcheck-based techniques instead of circuit compilation.

3.4k stars · Rust · Inference · Serving · Other AI
Velocity (7d): +5.8 ★/day · Trend: steady

What it does

DeepProve proves that a neural network inference ran correctly, from token embeddings through argmax, and produces a compact zero-knowledge proof anyone can verify. It handles GPT-2, Gemma 3, and Llama 2 end-to-end, plus MLPs and CNNs. The catch: proving a 512-token sequence still takes 7–19 minutes on a 24-core machine with 504 GB of RAM, so this is for trust, not real-time serving.

The interesting bit

The speedup comes from refusing to compile models into circuits at all. DeepProve uses sumchecks and logup GKR, which the authors say gives proving time sublinear in model size and a 10–30× speedup over prior work like zkGPT. The 12-bit quantization keeps ≥99.6% cosine similarity to float baselines, which is the boring part that actually matters for utility.
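
To make the sumcheck idea concrete, here is a minimal textbook sketch in Rust. It is not DeepProve's or ceno's code: the toy field (integers mod 2^31 - 1), the function names, and the caller-supplied challenges are all assumptions chosen for illustration, and a real system would draw challenges from a verifier or a Fiat-Shamir transcript. The prover claims the sum of a multilinear polynomial over the boolean hypercube. Each round it sends two values, the verifier checks them against the running claim, and one variable is bound to a challenge, so that only a single evaluation of the polynomial is left to check at the end.

```rust
// Toy prime field: integers mod 2^31 - 1 (an assumption for illustration,
// not the field DeepProve uses). Products of two elements fit in a u64.
const P: u64 = 2_147_483_647;

fn add(a: u64, b: u64) -> u64 { (a + b) % P }
fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Bind the first variable of a multilinear polynomial to `r`.
/// `table` holds the evaluations on {0,1}^n, with the first variable
/// stored in the highest bit of the index.
fn bind_first_var(table: &[u64], r: u64) -> Vec<u64> {
    let half = table.len() / 2;
    (0..half)
        .map(|j| add(table[j], mul(r, sub(table[j + half], table[j]))))
        .collect()
}

/// Evaluate the multilinear extension of `table` at `point`.
fn eval_mle(table: &[u64], point: &[u64]) -> u64 {
    let mut t = table.to_vec();
    for &r in point {
        t = bind_first_var(&t, r);
    }
    t[0]
}

/// Run an honest prover against the verifier's checks in one loop.
/// Returns true if the verifier accepts `claimed_sum` as the sum of the
/// table. `challenges` stands in for verifier randomness (hypothetical
/// fixed values here; never fixed in a real protocol).
fn sumcheck(table: &[u64], claimed_sum: u64, challenges: &[u64]) -> bool {
    assert_eq!(table.len(), 1usize << challenges.len());
    assert!(table.iter().chain(challenges).all(|&v| v < P));
    let mut claim = claimed_sum % P;
    let mut t = table.to_vec();
    for &r in challenges {
        let half = t.len() / 2;
        // Prover: the round polynomial g is linear, so g(0) and g(1)
        // determine it completely.
        let g0 = t[..half].iter().fold(0, |acc, &v| add(acc, v));
        let g1 = t[half..].iter().fold(0, |acc, &v| add(acc, v));
        // Verifier: g(0) + g(1) must equal the running claim.
        if add(g0, g1) != claim {
            return false;
        }
        // The new claim is g(r); the prover binds the variable to r.
        claim = add(g0, mul(r, sub(g1, g0)));
        t = bind_first_var(&t, r);
    }
    // Final check: one evaluation of the polynomial at the challenge point.
    claim == eval_mle(table, challenges)
}
```

With the table `[3, 1, 4, 1, 5, 9, 2, 6]` and any three challenges below the modulus, `sumcheck` accepts the claim 31 and rejects 30. The verifier never re-sums the table; it does a few field operations per round plus one final evaluation, and that asymmetry is what sumcheck-based provers build on.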

Key highlights

  • Proves full transformer stacks end-to-end, not just isolated layers
  • 1.12 tokens/s proving throughput for GPT-2, 0.45 tokens/s for Gemma 3 — slow, but prior art was ~0.05 tokens/s
  • Rust workspace with clean separation: zkml for proving, deep-prove for client/server job distribution, tenstore for tensor persistence
  • GPU acceleration and horizontal distribution are supported now; clustered GPU workers are promised but not yet available
  • Builds on scroll-tech/ceno’s sumcheck/GKR implementation rather than reinventing
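
The throughput figures above and the 7–19 minute range quoted under What it does describe the same measurements, and the conversion is plain arithmetic. The sketch below uses hypothetical helper names and takes only the numbers quoted on this page as inputs.

```rust
/// Proving throughput in tokens per second for a sequence of `tokens`
/// tokens that took `minutes` to prove.
fn tokens_per_second(tokens: u32, minutes: f64) -> f64 {
    tokens as f64 / (minutes * 60.0)
}

/// Minutes needed to prove `tokens` tokens at `tokens_per_sec`.
fn minutes_to_prove(tokens: u32, tokens_per_sec: f64) -> f64 {
    tokens as f64 / tokens_per_sec / 60.0
}
```

`minutes_to_prove(512, 1.12)` comes to about 7.6 minutes and `minutes_to_prove(512, 0.45)` to about 19.0, matching the quoted range. At the ~0.05 tokens/s cited for prior art, the same 512 tokens would take roughly 171 minutes, which is consistent with the "minutes rather than hours" framing.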

Caveats

  • The “DeepProve paper” is cited but not yet linked; methodology details are pending
  • Lagrange License is custom, not standard open-source — check terms before shipping anything
  • 504 GB RAM for CPU proving is a serious hardware bar; GPU path exists but details are in the secondary README

Verdict

Worth a look if you’re building verifiable AI pipelines, model marketplaces, or compliance tooling where “trust me, I ran this” isn’t enough. Skip it if you just need fast inference — the proving overhead is still measured in minutes per sequence.
