Is DeepBench open source?

Yes — baidu-research/DeepBench is open source, released under the Apache-2.0 license.

What language is DeepBench written in?

baidu-research/DeepBench is primarily written in C++.

How popular is DeepBench?

baidu-research/DeepBench has 1.1k stars on GitHub.

Where can I find DeepBench?

baidu-research/DeepBench is on GitHub at https://github.com/baidu-research/DeepBench.

baidu-research/DeepBench

Baidu's low-level stress test for AI hardware

A benchmarking suite that measures the raw ingredients of deep learning—GEMMs, convolutions, and all-reduce—rather than full models.

★1.1k stars C++ LLMOps · Eval

What it does

DeepBench benchmarks the fundamental operations that underpin deep learning (dense matrix multiplies, convolutions, recurrent layers, and all-reduce communication) across different hardware platforms. It deliberately ignores frameworks and end-to-end models, focusing instead on the low-level kernels that hardware vendors and simulator builders actually need to optimize.

The interesting bit

The project treats deep learning performance as a decomposable problem. Rather than benchmarking “ResNet on GPU A vs GPU B,” it asks: given specific matrix dimensions and convolution parameters, which hardware and library combination wins? The README includes detailed topology diagrams for multi-GPU systems and specifies exact precision requirements, making it usable as a hardware simulator input.
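
To make the "given specific matrix dimensions" idea concrete, here is a minimal C++ sketch of timing one GEMM problem size. It is not DeepBench's code: the `GemmProblem` struct and the `gemm_flops`, `naive_gemm`, and `time_gemm_seconds` functions are hypothetical names, and the naive triple loop is only a stand-in for the vendor library call a real benchmark would time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical problem descriptor: C (m x n) = A (m x k) * B (k x n).
struct GemmProblem {
    std::size_t m, n, k;
};

// A dense GEMM does one multiply and one add per (i, j, x) triple.
double gemm_flops(const GemmProblem& p) {
    return 2.0 * static_cast<double>(p.m) * static_cast<double>(p.n) *
           static_cast<double>(p.k);
}

// Naive row-major reference kernel; a stand-in for a vendor library call.
void naive_gemm(const GemmProblem& p, const std::vector<float>& a,
                const std::vector<float>& b, std::vector<float>& c) {
    assert(a.size() == p.m * p.k);
    assert(b.size() == p.k * p.n);
    assert(c.size() == p.m * p.n);
    for (std::size_t i = 0; i < p.m; ++i) {
        for (std::size_t j = 0; j < p.n; ++j) {
            float acc = 0.0f;
            for (std::size_t x = 0; x < p.k; ++x) {
                acc += a[i * p.k + x] * b[x * p.n + j];
            }
            c[i * p.n + j] = acc;
        }
    }
}

// Runs one untimed warm-up, then returns the mean seconds per timed run.
double time_gemm_seconds(const GemmProblem& p, int repeats) {
    assert(repeats > 0);
    std::vector<float> a(p.m * p.k, 1.0f);
    std::vector<float> b(p.k * p.n, 1.0f);
    std::vector<float> c(p.m * p.n, 0.0f);
    naive_gemm(p, a, b, c);  // warm-up run, not timed
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        naive_gemm(p, a, b, c);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Dividing `gemm_flops(p)` by `time_gemm_seconds(p, repeats)` gives achieved FLOP/s for that one shape. Repeating this over a list of shapes and swapping the kernel for each library under test is the general pattern this kind of benchmark follows.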

Key highlights

Covers training and inference with separate Excel spreadsheets defining all problem sizes (DeepBenchKernels_train.xlsx and DeepBenchKernels_inference.xlsx)
Tests seven training platforms (NVIDIA TitanX through P100, plus Intel Knights Landing) and six inference platforms including mobile (iPhone 6/7, Raspberry Pi 3)
Evaluates all-reduce using four different communication libraries (NCCL, OSU, Baidu’s own allreduce, Intel MLSL) and reports the best latency per configuration
Uses only vendor-supplied libraries, accepting that published faster implementations exist but aren’t what most users actually run
Includes detailed hardware topology schematics for 8-GPU and 10-GPU NVIDIA systems
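
The "best latency per configuration" reporting in the all-reduce highlight above can be sketched as a simple reduction over measurements. This is an illustration only: the `AllReduceSample` record, the choice of (GPU count, element count) as the configuration key, and the `best_per_config` function are assumptions, not names or structures taken from DeepBench.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one all-reduce measurement.
struct AllReduceSample {
    std::string library;   // which communication library produced it
    int gpus;              // number of participating GPUs
    std::size_t elements;  // size of the reduced array
    double latency_ms;     // measured latency
};

// Assumed configuration key: (GPU count, element count).
using Config = std::pair<int, std::size_t>;

struct Best {
    std::string library;
    double latency_ms;
};

// Keeps, for each configuration, the sample with the lowest latency.
std::map<Config, Best> best_per_config(
    const std::vector<AllReduceSample>& samples) {
    std::map<Config, Best> best;
    for (const auto& s : samples) {
        assert(s.latency_ms >= 0.0);
        const Config key{s.gpus, s.elements};
        const auto it = best.find(key);
        if (it == best.end() || s.latency_ms < it->second.latency_ms) {
            best[key] = Best{s.library, s.latency_ms};
        }
    }
    return best;
}
```

Reporting only the winner keeps the results table compact, at the cost of hiding how far behind the other libraries were for a given configuration.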

Caveats

The README is truncated mid-sentence in the 10-GPU system topology section, leaving that documentation incomplete
Recurrent layer benchmarks explicitly exclude input-to-hidden calculations and input gradients, so they measure only a subset of real recurrent layer work
No support for asynchronous distributed training methods in the all-reduce benchmark
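
To put the recurrent-layer caveat in perspective, the sketch below estimates how much of a layer's forward matrix-multiply work is hidden-to-hidden. It assumes a vanilla RNN and counts only the two forward GEMMs, ignoring biases, nonlinearities, and the backward pass. The `RnnShape` struct and all function names are hypothetical, not part of DeepBench.

```cpp
#include <cassert>

// Hypothetical vanilla RNN shape (doubles avoid integer overflow in products).
struct RnnShape {
    double hidden;  // hidden state size H
    double input;   // input feature size D
    double batch;   // batch size N
    double steps;   // number of time steps T
};

// Hidden-to-hidden GEMM: (H x H) times (H x N) at every time step.
double recurrent_gemm_flops(const RnnShape& s) {
    return 2.0 * s.hidden * s.hidden * s.batch * s.steps;
}

// Input-to-hidden GEMM: (H x D) times (D x N) at every time step.
double input_gemm_flops(const RnnShape& s) {
    return 2.0 * s.hidden * s.input * s.batch * s.steps;
}

// Share of forward GEMM work that is hidden-to-hidden; equals H / (H + D).
double recurrent_fraction(const RnnShape& s) {
    assert(s.hidden > 0.0 && s.input > 0.0 && s.batch > 0.0 && s.steps > 0.0);
    const double rec = recurrent_gemm_flops(s);
    return rec / (rec + input_gemm_flops(s));
}
```

Under these assumptions, a layer whose input size equals its hidden size spends half of its forward GEMM work on the hidden-to-hidden multiply, so a timing that covers only that part would capture roughly half of the forward matrix-multiply cost.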

Verdict

Hardware engineers, simulator developers, and anyone building custom AI silicon should bookmark this. If you’re choosing between cloud GPU instances based on end-to-end training cost, look elsewhere: this won’t tell you how fast your actual model trains.
