Object tracking before transformers made it easy
A 2016 MATLAB implementation that showed siamese networks could track arbitrary objects at video speed without online model updates.

What it does
SiamFC tracks a single arbitrary object through video frames by comparing the initial target patch with candidate regions via a fully-convolutional siamese network. No online fine-tuning, no heavy per-frame adaptation — just a learned similarity function applied exhaustively across the search image. The authors report 50–100 FPS, which in 2016 was genuinely fast for deep tracking.
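The loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the repo's MATLAB code: raw pixel dot products stand in for the learned similarity, and the names `crop`, `score_map`, `track`, and the `radius` parameter are hypothetical. It does keep the key property: the template is cut once from the first frame and never updated.

```python
def crop(frame, top, left, h, w):
    """Return the h x w patch of `frame` whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in frame[top:top + h]]


def score_map(template, search):
    """Slide `template` over `search`; each score is a plain dot product.

    Assumption: raw pixels replace the deep features the real tracker compares.
    """
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    return [
        [
            sum(template[i][j] * search[y + i][x + j]
                for i in range(th) for j in range(tw))
            for x in range(sw - tw + 1)
        ]
        for y in range(sh - th + 1)
    ]


def track(frames, top, left, h, w, radius=2):
    """Follow the h x w target located at (top, left) in frames[0].

    The template is fixed after the first frame (no online update).
    Returns the estimated top-left corner for every frame.
    """
    template = crop(frames[0], top, left, h, w)
    positions = [(top, left)]
    for frame in frames[1:]:
        # Search window: the previous box grown by `radius`, clipped to the frame.
        y0 = max(0, top - radius)
        x0 = max(0, left - radius)
        y1 = min(len(frame), top + h + radius)
        x1 = min(len(frame[0]), left + w + radius)
        search = crop(frame, y0, x0, y1 - y0, x1 - x0)
        scores = score_map(template, search)
        # The peak of the score map gives the new position inside the window.
        best_y, best_x = max(
            ((y, x) for y in range(len(scores)) for x in range(len(scores[0]))),
            key=lambda p: scores[p[0]][p[1]],
        )
        top, left = y0 + best_y, x0 + best_x
        positions.append((top, left))
    return positions
```

Calling `track(frames, top, left, h, w)` with a list of 2-D lists returns one position per frame. The per-frame cost is a single exhaustive scoring pass, which is why skipping online adaptation keeps the tracker fast.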
The interesting bit
The architecture is almost insultingly simple: two shared-weight branches (target exemplar and search image), a cross-correlation layer, and a response map that peaks where the object lives. The “fully convolutional” part means the same network handles variable search sizes without restructuring — a neat trick that lets the search region scale with expected object motion.
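A minimal 1-D sketch of that structure, assuming a toy strided filter in place of the real convolutional embedding. The names `phi`, `cross_correlate`, `response_map`, and the `weights` values are hypothetical and chosen only for illustration; none of this is taken from the repo.

```python
def phi(signal, weights, stride=2):
    """Toy shared embedding: an unpadded 1-D convolution with a stride."""
    k = len(weights)
    return [
        sum(w * s for w, s in zip(weights, signal[i:i + k]))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def cross_correlate(z, x):
    """Slide the exemplar embedding z over the search embedding x."""
    return [
        sum(a * b for a, b in zip(z, x[i:i + len(z)]))
        for i in range(len(x) - len(z) + 1)
    ]


def response_map(exemplar, search, weights, stride=2):
    """Both branches call the same phi with the same weights."""
    return cross_correlate(phi(exemplar, weights, stride),
                           phi(search, weights, stride))


weights = [1, 1, 1]               # hypothetical stand-in for learned filters
exemplar = [0, 0, 5, 5, 5, 0, 0]  # target as seen in the first frame

# Same functions, two search lengths: only the response length changes.
short = response_map(exemplar, [0] * 4 + exemplar + [0] * 2, weights)
long_ = response_map(exemplar, [0] * 4 + exemplar + [0] * 8, weights)

# Peak index times the stride maps back to an offset in the search signal.
offset = short.index(max(short)) * 2
```

Nothing in `phi` or `cross_correlate` depends on the search length, so a larger search region just yields a longer response map. That is the sense in which the network handles variable search sizes without restructuring.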
Key highlights
- Reproduces the ECCV 2016 workshop paper end-to-end, training and inference both included
- Pretrained networks available; you can run tracking without touching the curation pipeline
- Training requires curating ILSVRC15 video data yourself, or downloading the authors' pre-built imdb_video.mat metadata file (6.7GB)
- Built on MatConvNet, MATLAB 2015b-era tooling
- The authors themselves now point to CFNet (CVPR 2017) as the better starting point — cleaner code, slightly better results
Caveats
- MATLAB + MatConvNet stack; not PyTorch, not even close
- The authors explicitly recommend their own successor repo for new work
- Setup involves renaming .example files and hand-editing paths, a ritual from a simpler time
Verdict
Worth reading if you’re tracing the lineage of modern siamese trackers (SiamRPN, SiamFC++, etc.) or need to reproduce the 2016 baseline. Skip if you actually need to ship a tracker today — the field has moved on, and the authors agree.