Maratyszcza/NNPACK

The engine under PyTorch's mobile hood

NNPACK is the low-level CPU library that makes neural networks run fast on phones and laptops without a GPU.

Velocity (7d): +0.5 ★/day · Trend: steady

What it does

NNPACK provides optimized implementations of core neural network layers (convolution, fully-connected, pooling, ReLU, softmax) for multi-core CPUs. It’s written in C99 with no external dependencies and targets x86-64 (AVX2), ARM (NEON), and even WebAssembly. You probably don’t use it directly; frameworks like PyTorch, MXNet, and Caffe2 call it under the hood.

The interesting bit

The library ships several convolution algorithms and matches them to the kernel shape: Fourier transform for unit-stride kernels up to 16×16, Winograd for unit-stride 3×3, direct for unit-stride 1×1, and implicit GEMM, which has no such restrictions, for everything else. It’s the kind of tedious optimization work that makes mobile inference feel less like a compromise.
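The selection rule described above can be sketched in a few lines of Python. This is an illustration only, not NNPACK's dispatch code: the function name, the string labels, and the order of the checks are hypothetical. The unit-stride guard is this sketch's reading of the limits on the fast paths.

```python
def pick_convolution_algorithm(kernel_h, kernel_w, stride_h=1, stride_w=1):
    """Return a label for the algorithm family the rule of thumb would pick.

    Hypothetical helper for illustration; it does not call into NNPACK.
    """
    if min(kernel_h, kernel_w, stride_h, stride_w) < 1:
        raise ValueError("kernel and stride dimensions must be positive")

    # Assumption: the specialised paths only apply to unit-stride convolutions.
    unit_stride = stride_h == 1 and stride_w == 1
    kernel = (kernel_h, kernel_w)

    if unit_stride and kernel == (1, 1):
        return "direct"         # 1x1 kernels
    if unit_stride and kernel == (3, 3):
        return "winograd"       # 3x3 kernels
    if unit_stride and kernel_h <= 16 and kernel_w <= 16:
        return "fourier"        # other kernels up to 16x16
    return "implicit_gemm"      # general fallback


for shape in [(1, 1), (3, 3), (5, 5), (17, 17)]:
    print(shape, "->", pick_convolution_algorithm(*shape))
```

The checks run from most specific to most general because a 3×3 kernel also fits under the 16×16 Fourier limit, so the Winograd case has to be tested first.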

Key highlights

  • Powers PyTorch mobile inference and Facebook’s production workloads
  • Supports both training (forward/backward) and inference-optimized paths
  • FP16 weight support for fully-connected layers
  • Cross-compiles for Android, iOS, and Emscripten
  • Extensive unit test coverage; builds via CMake or vcpkg

Caveats

  • No official Windows support; a community port exists
  • Android x86_64 cross-compiled builds use SSE2 instead of AVX2
  • armeabi builds are up to 2× slower with clang; gcc recommended
  • mips/mips64 explicitly not supported

Verdict

Worth studying if you write performance-critical CPU kernels or ship models to mobile. Not for researchers who want a friendly API: this is strictly a foundation layer.
