dmlc/mshadow

The tensor library MXNet absorbed and forgot to mention

A 2010s-era C++ template library that let you write lazy GPU kernels without knowing CUDA, now frozen in amber inside Apache MXNet.

Velocity (7d): +0.2 ★/day · Trend: steady

What it does

MShadow is a header-only C++ tensor library that uses expression templates to turn tensor expressions into CPU loops or CUDA kernels at build time. You write A = B + C * 2.0f and the compiler generates one fused kernel for the whole expression, with no temporary allocations and no hand-written CUDA. It also shipped a parameter-server interface (mshadow-ps) for multi-GPU and distributed training.
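To make the fused-kernel idea concrete, here is a minimal CPU-only sketch of the expression-template technique. It is a toy written for this page, not mshadow's code: the names Exp, Vec, Scalar, Binary and fused_demo are all hypothetical, and the real library adds shapes, device tags and CUDA dispatch on top of the same pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy expression templates (hypothetical names, not mshadow's API).
// CRTP base: every expression type E derives from Exp<E>.
template <typename E>
struct Exp {
  const E& self() const { return *static_cast<const E*>(this); }
};

struct Add { static float apply(float a, float b) { return a + b; } };
struct Mul { static float apply(float a, float b) { return a * b; } };

// Leaf: a scalar constant that evaluates to the same value at every index.
struct Scalar : Exp<Scalar> {
  float v;
  explicit Scalar(float value) : v(value) {}
  float eval(std::size_t) const { return v; }
};

// Interior node: records its operands; nothing is computed until eval().
template <typename Op, typename L, typename R>
struct Binary : Exp<Binary<Op, L, R>> {
  L lhs;
  R rhs;
  Binary(const L& l, const R& r) : lhs(l), rhs(r) {}
  float eval(std::size_t i) const { return Op::apply(lhs.eval(i), rhs.eval(i)); }
};

// Leaf: a non-owning view over caller-provided floats.
struct Vec : Exp<Vec> {
  float* ptr;
  std::size_t len;
  Vec(float* p, std::size_t n) : ptr(p), len(n) { assert(p != nullptr); }
  float eval(std::size_t i) const { return ptr[i]; }

  // Assignment is the only place a loop runs: one pass, no temporaries.
  template <typename E>
  Vec& operator=(const Exp<E>& e) {
    const E& expr = e.self();
    for (std::size_t i = 0; i < len; ++i) ptr[i] = expr.eval(i);
    return *this;
  }
};

// Operators only build the expression tree at compile time.
template <typename L, typename R>
Binary<Add, L, R> operator+(const Exp<L>& l, const Exp<R>& r) {
  return Binary<Add, L, R>(l.self(), r.self());
}
template <typename L>
Binary<Mul, L, Scalar> operator*(const Exp<L>& l, float s) {
  return Binary<Mul, L, Scalar>(l.self(), Scalar(s));
}

// Evaluates A = B + C * 2.0f in a single loop and returns A's storage.
std::vector<float> fused_demo() {
  std::vector<float> a(3, 0.0f);
  std::vector<float> b{1.0f, 2.0f, 3.0f};
  std::vector<float> c{10.0f, 20.0f, 30.0f};
  Vec A(a.data(), 3), B(b.data(), 3), C(c.data(), 3);
  A = B + C * 2.0f;
  return a;
}
```

The right-hand side has the type Binary<Add, Vec, Binary<Mul, Vec, Scalar>>, so the compiler can inline the whole tree into the loop body inside operator=. A GPU backend would launch a kernel at that point instead of running a for loop.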

The interesting bit

The “whitebox” design: you hand it a raw float* wrapped in a Tensor struct, and the machinery takes over. No hidden memory pools, no opaque handles. Next to the frameworks that followed, PyTorch with its eager execution and TensorFlow with its graph bloat, this was almost aggressively transparent.
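A rough illustration of what "whitebox" means in practice, assuming nothing about mshadow's real headers: the struct below (View2D, with its at and slice members, is a hypothetical stand-in) only describes memory the caller already owns. It never allocates or frees, and a slice is just another view into the same buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a "whitebox" tensor: a plain struct that
// describes caller-owned memory and never allocates or frees it.
struct View2D {
  float* dptr;         // caller-owned buffer
  std::size_t rows;
  std::size_t cols;
  std::size_t stride;  // elements between the starts of consecutive rows

  float& at(std::size_t r, std::size_t c) const {
    assert(r < rows && c < cols);
    return dptr[r * stride + c];
  }

  // A row range is another view into the same memory; nothing is copied.
  View2D slice(std::size_t begin, std::size_t end) const {
    assert(begin <= end && end <= rows);
    return View2D{dptr + begin * stride, end - begin, cols, stride};
  }
};

// Writes through a slice and reads the result back from the raw array.
float whitebox_demo() {
  float data[6] = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
  View2D m{data, 3, 2, 2};      // 3 rows x 2 cols over the stack array
  View2D tail = m.slice(1, 3);  // rows 1 and 2 of m
  tail.at(0, 1) = 42.0f;        // lands in data[3]
  return data[3];
}
```

Because every field is public and the buffer belongs to the caller, the same struct can sit on top of a stack array, a std::vector, or memory handed over by another library.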

Key highlights

  • Lazy expression templates compile to per-expression kernels; zero temporaries
  • Single source runs on CPU and GPU without #ifdef soup
  • Extensible: custom ops plug in without CUDA knowledge
  • mshadow-ps interface unified multi-GPU and distributed training
  • Donated to Apache MXNet; repo is deprecated and read-only

Caveats

  • Deprecated since ~2017; all development moved to MXNet, which was itself retired to the Apache Attic in 2023
  • mshadow-2.x broke backward compatibility with 1.x, and legacy code needs pinned releases
  • README links to Travis CI (RIP) and documentation paths that may be stale

Verdict

Worth studying if you’re building a tensor library or curious about expression-template metaprogramming in C++. Not worth adopting for new work: actively maintained alternatives (xtensor, Kokkos, or the PyTorch C++ API) have more oxygen. Historians of the DMLC ecosystem will find the missing link between CXXNet and MXNet.
