MLPerf's duct tape: Python scripts that bench anything
A community-built automation framework trying to make ML benchmarking reproducible across the chaos of GPUs, containers, and constantly shifting software stacks.

What it does
CK/CM/CMX (the naming is a journey) is a Python-based automation framework for running MLPerf benchmarks and other AI workloads across diverse hardware and software. It wraps experiments in portable, file-based “artifacts” with JSON/YAML metadata, then chains them into reusable workflows via a common CLI and Python API. Think of it as make for ML benchmarking, if make had to handle CUDA versions, container images, and paper reproducibility requirements.
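To make the "file-based artifact" idea concrete, here is a minimal sketch of a directory-per-artifact layout with JSON metadata and tag-based lookup. It is an illustration under assumptions, not CM's actual implementation: the _meta.json file name, the add_artifact and find_artifacts helpers, and the artifact aliases are all hypothetical.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

META_FILE = "_meta.json"  # hypothetical file name, chosen for this sketch


def add_artifact(repo: Path, alias: str, tags: list[str]) -> Path:
    """Create an artifact directory holding a JSON metadata file."""
    path = repo / alias
    path.mkdir(parents=True)
    (path / META_FILE).write_text(json.dumps({"alias": alias, "tags": tags}))
    return path


def find_artifacts(repo: Path, wanted: set[str]) -> list[str]:
    """Return aliases of artifacts whose tags include every wanted tag."""
    found = []
    for meta_path in sorted(repo.glob(f"*/{META_FILE}")):
        meta = json.loads(meta_path.read_text())
        if wanted <= set(meta["tags"]):
            found.append(meta["alias"])
    return found


def demo() -> list[str]:
    """Build a throwaway repository and look up artifacts by tag."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        add_artifact(repo, "get-imagenet-val", ["get", "dataset", "imagenet"])
        add_artifact(repo, "get-resnet50", ["get", "model", "resnet50"])
        add_artifact(repo, "run-benchmark", ["run", "benchmark"])
        return find_artifacts(repo, {"get", "dataset"})


if __name__ == "__main__":
    print(demo())
```

Because the metadata sits next to the payload as an ordinary file, an artifact can be copied, versioned, or shared like any other directory.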
The interesting bit
The core insight is non-intrusive modularization: you don’t rewrite your project, you annotate it. CM scripts extend cmake’s concept with Python automations and JSON/YAML metadata, letting the community continuously extend support for new models, datasets, and hardware without central bottlenecks. The newer CMX interface (2025+) promises simpler commands while maintaining backward compatibility: the cm command becomes cmx, mlcr becomes cmlcr, and so on.
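One way to picture how annotated scripts chain into a workflow is dependency resolution driven purely by metadata. The sketch below assumes each script declares a "deps" list; the script names, the "deps" key, and the resolve helper are hypothetical and only illustrate the general technique (a depth-first topological ordering), not how CM resolves dependencies internally.

```python
from __future__ import annotations

# Hypothetical script metadata: each entry names the scripts it depends on.
SCRIPTS = {
    "run-benchmark": {"deps": ["get-model", "get-dataset"]},
    "get-model": {"deps": ["detect-python"]},
    "get-dataset": {"deps": ["detect-python"]},
    "detect-python": {"deps": []},
}


def resolve(name, scripts, order=None, visiting=None):
    """Return an execution order in which every dependency precedes its user."""
    order = [] if order is None else order
    visiting = set() if visiting is None else visiting
    if name in order:
        return order  # already scheduled by an earlier branch
    if name in visiting:
        raise ValueError(f"dependency cycle at {name!r}")
    visiting.add(name)
    for dep in scripts[name]["deps"]:
        resolve(dep, scripts, order, visiting)
    visiting.discard(name)
    order.append(name)
    return order


if __name__ == "__main__":
    for step in resolve("run-benchmark", SCRIPTS):
        print("would run:", step)
```

The project code itself is never touched here: adding a new model or dataset means adding another metadata entry, which is the sense in which the approach is non-intrusive.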
Key highlights
- Automates MLPerf inference benchmarks across Ubuntu, macOS, Windows, RHEL, Debian, Amazon Linux, cloud, and containers
- Two core automation types: script (portable execution recipes) and cache (reusable artifact storage)
- Online catalogs index community automations: CK Playground and MLCommons docs
- Supports artifact evaluation for academic conferences and hosts reproducibility challenges
- Installable via pip install cmind (includes both legacy CM and newer CMX); pip install cmx4mlperf for MLPerf-specific automations
- Apache 2.0, originally created by Grigori Fursin, donated to MLCommons by cTuning foundation and OctoML
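The split between the script and cache automation types listed above can be sketched as a recipe runner that stores each result under the recipe's tags and reuses it on later runs. This is a toy model under assumptions: the ScriptCache class, its run method, and the example tags and path are hypothetical, and the real cache persists artifacts on disk rather than in memory.

```python
from __future__ import annotations


class ScriptCache:
    """Toy in-memory cache: run a recipe once per tag set, then reuse the result."""

    def __init__(self):
        self._entries = {}
        self.executions = 0  # counts how many recipes actually ran

    def run(self, tags, recipe):
        key = frozenset(tags)  # tag order does not matter
        if key not in self._entries:
            self.executions += 1
            self._entries[key] = recipe()
        return self._entries[key]


def fetch_model():
    # Stand-in for an expensive step such as downloading model weights.
    return {"path": "/tmp/resnet50.onnx"}  # hypothetical path


if __name__ == "__main__":
    cache = ScriptCache()
    first = cache.run(["get", "model", "resnet50"], fetch_model)
    second = cache.run(["resnet50", "model", "get"], fetch_model)
    print(first == second, cache.executions)
```

Keying the cache by tags rather than by file path is what lets a later workflow reuse a downloaded model or dataset without knowing where an earlier run stored it.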
Caveats
- The project is explicitly labeled “legacy” by MLCommons; the original author has moved next-generation development to cTuning.ai
- Multiple overlapping names and APIs (CK, CM, CMX, CM-MLOps, CM4MLOps, CMX4MLOps, MLC, mlcr…) create genuine confusion about what to use when
- README is heavy on vision and light on concrete getting-started examples
Verdict
Worth exploring if you’re submitting to MLPerf or need to make experiments reproducible across heterogeneous environments. Skip if you want a polished, single-vendor MLOps platform—this is community-driven glue code with academic roots, not a product.