An FPGA CNN accelerator that admits it's past its prime
A rare honest README: this OpenCL-based CNN accelerator won't beat state-of-the-art, but it'll teach you how hardware acceleration actually works.

What it does
PipeCNN implements convolutional neural network inference as OpenCL kernels that are compiled to FPGA bitstreams through High-Level Synthesis (HLS). You write C-like OpenCL kernels; Intel’s OpenCL SDK or Xilinx Vitis turns them into RTL, then into a running FPGA design. It ships with pre-quantized VGG-16 and ResNet-50 models and a ModelZoo of weights and test vectors.
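To give a feel for what "C-like" means here, below is a minimal sketch of the kind of loop nest an HLS tool can turn into a pipelined datapath: one output pixel of a stride-1, unpadded convolution over 8-bit quantized data. This is plain C, not code from the PipeCNN repository. The function name `conv_pixel_q8`, the memory layouts, and the 32-bit accumulator are assumptions made for illustration; the real kernels are OpenCL and are structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from PipeCNN): computes one output pixel of a
 * K x K convolution over C input channels, stride 1, no padding.
 * Inputs and weights are 8-bit; products are accumulated in 32 bits so the
 * sum cannot overflow for realistic layer sizes. */
int32_t conv_pixel_q8(const int8_t *in,  /* input,   layout [C][H][W] */
                      const int8_t *wt,  /* weights, layout [C][K][K] */
                      size_t C, size_t H, size_t W, size_t K,
                      size_t oy, size_t ox)
{
    /* The K x K window anchored at (oy, ox) must lie inside the input. */
    assert(oy + K <= H && ox + K <= W);

    int32_t acc = 0;
    for (size_t c = 0; c < C; ++c) {
        for (size_t ky = 0; ky < K; ++ky) {
            for (size_t kx = 0; kx < K; ++kx) {
                int8_t a = in[(c * H + (oy + ky)) * W + (ox + kx)];
                int8_t w = wt[(c * K + ky) * K + kx];
                acc += (int32_t)a * (int32_t)w; /* multiply-accumulate */
            }
        }
    }
    return acc;
}
```

On an FPGA the interesting work is in how these loops are unrolled and pipelined, which is exactly the part a hackable HLS codebase lets you experiment with.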
The interesting bit
The authors openly state the project is “no longer comparable to the state-of-the-art designs” after four years of deep learning accelerator (DLA) evolution. Rather than chase benchmarks, they’ve pivoted to educational value: a complete, working HLS pipeline you can actually modify. That’s unusual honesty in accelerator research, where papers typically pretend last year’s work is still competitive.
Key highlights
- Supports both Intel OpenCL SDK Pro 20.1 and Xilinx Vitis 2020.1
- Tested on Arria-10, Zynq, and Alveo U50 boards; may work on DE10-nano and Ultra96-v2 (unverified)
- Pipelined CNN kernels with tunable parameters (VEC_SIZE, LANE_NUM, CONV_GP_SIZE_X) for per-board optimization
- Includes ImageNet classification demo with OpenCV integration
- Spawned follow-up research including a sparse-convolution variant (DAC 2019)
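The tunable parameters in the list above are where per-board optimization happens, so a rough cost model helps show why they matter. The sketch below assumes that VEC_SIZE is the number of input-channel elements consumed per cycle and LANE_NUM is the number of output channels computed in parallel, so an ideal design performs VEC_SIZE × LANE_NUM multiply-accumulates per cycle. That reading, the function `conv_cycles_estimate`, and the padding behavior are assumptions for illustration, not something the README states; the model ignores memory bandwidth, pipeline stalls, and CONV_GP_SIZE_X entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of m (m > 0). */
static uint64_t round_up(uint64_t n, uint64_t m)
{
    assert(m > 0);
    return (n + m - 1) / m * m;
}

/* Hypothetical, idealized cycle estimate for one convolution layer.
 * Assumes vec_size * lane_num MACs complete every cycle and that channel
 * counts are padded up to multiples of vec_size / lane_num, so a poorly
 * matched parameter wastes work on padding. */
uint64_t conv_cycles_estimate(uint64_t in_ch, uint64_t out_ch,
                              uint64_t k, uint64_t out_h, uint64_t out_w,
                              uint64_t vec_size, uint64_t lane_num)
{
    uint64_t padded_in  = round_up(in_ch, vec_size);
    uint64_t padded_out = round_up(out_ch, lane_num);
    uint64_t macs = padded_in * padded_out * k * k * out_h * out_w;
    return macs / (vec_size * lane_num); /* exact: both factors divide macs */
}
```

Under these assumptions, a first layer with 3 input channels and a vector width of 4 spends a quarter of its multipliers on padding, which is the kind of mismatch that makes these values worth tuning per board and per network.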
Caveats
- Performance table in the README is blank (all “–”), so you get no hard numbers for comparison
- Requires specific, somewhat dated tool versions; newer releases may need porting work
- The authors themselves note the design is behind current state-of-the-art
Verdict
Grab this if you’re learning FPGA-based deep learning acceleration or need a hackable HLS baseline to test new ideas. Skip it if you need production throughput today: modern DLA IP or even a GPU will run circles around it.