mlcommons/inference

The benchmark suite that keeps AI hardware honest

Reference implementations for MLPerf Inference, the industry-standard test for measuring how fast systems actually run models in production scenarios.

Velocity (7d): +0.6 ★ / day · Trend: steady

What it does

MLPerf Inference provides reference implementations for standardized ML benchmarks across vision, language, speech, recommendation, and multimodal tasks. It measures inference performance in two categories: “edge” (resource-constrained environments) and “datacenter” (server-class deployments). The suite ranges from ResNet-50 and YOLO to Llama 3.1-405B, DeepSeek-R1, and GPT-OSS 120B.

The interesting bit

The benchmark is deliberately adversarial: submitters can use any framework or optimization they want, but must hit the same accuracy targets on the same datasets. This creates a rare apples-to-apples comparison in an industry drowning in cherry-picked marketing numbers. The reference implementations here are the baseline that vendors try to beat.
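The "same accuracy target, any optimization" rule can be pictured as a gate followed by a ranking. The sketch below is illustrative only: the `Submission` type, the 99% ratio, and every number in it are assumptions made for this example, not values taken from the repository or the MLPerf rules.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str
    accuracy: float    # measured on the shared dataset, in percent
    throughput: float  # samples per second


def meets_target(sub, reference_accuracy, ratio=0.99):
    """True if the submission reaches the required fraction of the reference accuracy.

    `ratio` is a hypothetical default chosen for illustration.
    """
    return sub.accuracy >= ratio * reference_accuracy


def rank_valid(subs, reference_accuracy, ratio=0.99):
    """Drop submissions below the accuracy target, then rank the rest by throughput."""
    valid = [s for s in subs if meets_target(s, reference_accuracy, ratio)]
    return sorted(valid, key=lambda s: s.throughput, reverse=True)


# Hypothetical results: B is fastest but misses the accuracy gate.
subs = [
    Submission("A", accuracy=75.9, throughput=1000.0),
    Submission("B", accuracy=74.0, throughput=5000.0),
    Submission("C", accuracy=75.3, throughput=2000.0),
]
ranking = [s.name for s in rank_valid(subs, reference_accuracy=76.0)]
```

The point of the gate is that a fast but inaccurate result never reaches the ranking, which is what makes the remaining numbers comparable.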

Key highlights

  • 15+ model benchmarks in v6.0, including text-to-video (Wan2.2) and vision-language models (Qwen3-VL)
  • Strict versioning with seed releases and submission deadlines; v6.0 deadline is February 13, 2026
  • Power submissions require SPEC PTD 1.11.1 and special repository access
  • Reference implementations span TensorFlow, PyTorch, ONNX, TVM, and ncnn
  • Separate edge and datacenter categories with different latency and throughput constraints
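As a rough illustration of what a latency constraint means in practice, the sketch below checks whether a tail-latency percentile stays under a bound. The nearest-rank percentile method, the 99th-percentile default, and the sample latencies are all assumptions for this example. The actual per-benchmark constraints are defined by the MLPerf rules, not here.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def within_latency_bound(latencies_ms, bound_ms, pct=99.0):
    """True if the pct-th percentile latency does not exceed bound_ms (hypothetical bound)."""
    return percentile(latencies_ms, pct) <= bound_ms


# Hypothetical per-query latencies of 1 ms, 2 ms, ..., 100 ms.
latencies = [float(ms) for ms in range(1, 101)]
p99 = percentile(latencies, 99.0)
```

A run that passes a loose bound can fail a tighter one with identical hardware, which is why a throughput figure only means something alongside the constraint it was measured under.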

Caveats

  • README is essentially a changelog of benchmark tables; setup and running instructions live elsewhere
  • Power measurement requires external SPEC tooling with restricted access
  • The v5.1 seed release link is broken (empty parentheses in the markdown)

Verdict

Essential if you’re submitting MLPerf results or evaluating hardware claims. Skip it if you just want to run models quickly: this is compliance and competition infrastructure, not convenience tooling.
