A 600 KB neural network that actually tracks objects
CFNet makes the case that you don't need a hundred layers to follow a target through video, just a correlation filter trained end-to-end.

What it does
CFNet is a visual object tracker for video. You mark an object in the first frame; it keeps finding that object as it moves, changes scale, and gets partially occluded. The twist is the architecture: a bare-bones two-layer network with a correlation filter layer baked in, so the filter is trained together with the network rather than bolted on after the fact.
The interesting bit
The authors took a classic tracking trick, correlation filters (essentially template matching done in the frequency domain), and made it differentiable so the feature extractor and the filter learn together. The payoff is a model of roughly 600 KB that still runs at “fast speed” (their words, not a benchmark) on a GPU.
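To make "template matching in the frequency domain" concrete, here is a minimal 1-D sketch of a closed-form correlation filter. It is not CFNet's code: the function names (`dft`, `train_filter`, `respond`), the toy signal, and the regularizer `lam` are all made up for illustration. It also uses a single spike as the desired response, where real trackers typically use a Gaussian-shaped one over 2-D feature maps. It sticks to the standard library so it runs anywhere.

```python
import cmath

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for a toy example."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for f in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * f * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def train_filter(template, desired, lam=1e-3):
    """Ridge regression solved per frequency: Y * conj(X) / (|X|^2 + lam)."""
    X = dft(template)
    Y = dft(desired)
    return [y * x.conjugate() / (abs(x) ** 2 + lam) for x, y in zip(X, Y)]

def respond(filter_hat, search):
    """Multiply in the frequency domain, then return the real-valued response."""
    Z = dft(search)
    product = [h * z for h, z in zip(filter_hat, Z)]
    return [v.real for v in dft(product, inverse=True)]

# Toy "appearance" of the target (hypothetical values).
template = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
# Desired response on the training signal: a single peak at offset 0.
desired = [1.0] + [0.0] * 7
filter_hat = train_filter(template, desired)

# The "next frame": the same signal circularly shifted right by 3 samples.
shift = 3
search = template[-shift:] + template[:-shift]

response = respond(filter_hat, search)
estimated_shift = max(range(len(response)), key=response.__getitem__)
print(estimated_shift)  # 3: the response peak lands on the shift
```

The whole "training" step is one element-wise division per frequency, which is why correlation filters are cheap. Because that closed-form solve is built from differentiable operations, it can in principle sit inside a network as a layer, which is the idea the paper builds on.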
Key highlights
- End-to-end training of correlation filters, not the usual hand-tuned post-processing
- Pretrained networks available; you can skip training entirely if you just want to track
- MatConvNet-based, so the deep learning happens inside MATLAB
- CVPR 2017 paper behind it; this is research code, not a product
Caveats
- Locked to a very specific stack: MATLAB 2015, MatConvNet beta24, CUDA 8.0, cuDNN 5.1. The README explicitly warns that MATLAB 2017 breaks things.
- Setup requires manual path editing, plus dataset curation and a ~7 GB metadata download if you train the model yourself.
- No Python, no PyTorch, no modern framework migration in sight.
Verdict
Worth a look if you’re researching lightweight trackers or writing a literature review on Siamese/correlation-filter methods. Skip it if you need something production-ready or if your GPU drivers are newer than 2016.