From HOGs to ResNet: a CBIR kitchen-sink benchmark
A modular retrieval system that pits classic hand-crafted features against modern CNNs on a 500-image dataset, with the numbers to show who wins.

What it does
This repo implements a classic content-based image retrieval (CBIR) pipeline: extract features from a query image, compare them against a database, and return the top-K matches. It ships with seven feature extractors (color histograms, Gabor filters, DAISY, edge histograms, HOG, VGG19, and ResNet152), plus a fusion module for combining features and a random-projection module for dimensionality reduction. Evaluation uses mean MAP (MMAP), that is, mean average precision computed per class and then averaged across classes, on a 25-class, 500-image toy dataset.
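To make the pipeline concrete, here is a minimal sketch in Python with NumPy of the extract, compare, rank loop and of one plausible reading of MMAP: average precision per query, averaged within each class, then averaged across classes. The function names (`top_k`, `average_precision`, `mmap`), the cosine distance, and the leave-one-out querying scheme are assumptions made for illustration; the repo's own distance metric and evaluation code may differ.

```python
import numpy as np

def top_k(query_vec, db_vecs, k=10):
    """Return indices of the k database rows closest to the query (cosine distance)."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    db = db_vecs / (np.linalg.norm(db_vecs, axis=1, keepdims=True) + 1e-12)
    dists = 1.0 - db @ q                      # smaller means more similar
    return np.argsort(dists, kind="stable")[:k]

def average_precision(retrieved_labels, query_label):
    """AP over a ranked list: mean of precision@rank at each rank that is a hit.

    Assumption: AP is normalized by the number of hits in the list, which is
    one common convention; other definitions divide by all relevant items.
    """
    hits = 0
    precisions = []
    for rank, label in enumerate(retrieved_labels, start=1):
        if label == query_label:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mmap(features, labels, k=10):
    """Leave-one-out evaluation: AP per query, MAP per class, mean over classes."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    per_class = {}
    for i in range(len(features)):
        rest = np.delete(np.arange(len(features)), i)   # exclude the query itself
        ranked = rest[top_k(features[i], features[rest], k)]
        ap = average_precision(labels[ranked], labels[i])
        per_class.setdefault(labels[i], []).append(ap)
    return float(np.mean([np.mean(aps) for aps in per_class.values()]))
```

Under these assumptions, passing in the feature matrix from any extractor (one row per image) together with the class labels yields a single score that is comparable across methods, with `k` playing the role of retrieval depth.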
The interesting bit
The README doesn’t just list methods; it runs the horse race. ResNet152 clocks 0.944 MMAP at depth 10, i.e. scored over the top 10 retrieved results, while classic edge histograms limp in at 0.301. That’s a tidy demonstration of why deep features ate computer vision, served with enough boilerplate that you can swap in your own extractor and see where it lands.
Key highlights
- Modular feature design: each extractor lives in its own file, so plugging in a new one is straightforward
- Fusion.py lets you combine weak hand-crafted features when you don’t have GPU access
- Random projection for dimensionality reduction: simple, fast, and explicitly framed as a response to the curse of dimensionality
- Evaluation script implements proper per-class MAP averaging, not just raw accuracy
- Includes visual retrieval results for four query categories (dress, orange, NBA jersey, snack) so you can eyeball failure modes
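The fusion and random-projection ideas in the list above can be sketched in a few lines of Python with NumPy. This is a hypothetical illustration, not the repo's code: `fuse` and `random_projection` are made-up names, fusion is assumed to be normalize-then-concatenate, and the projection is assumed to be a Gaussian random matrix, which is only one of several standard choices.

```python
import numpy as np

def fuse(feature_sets):
    """L2-normalize each feature matrix row-wise, then concatenate the columns.

    Assumption: normalizing first keeps one extractor's scale from
    dominating the combined distance.
    """
    normed = []
    for f in feature_sets:
        f = np.asarray(f, dtype=float)
        norms = np.linalg.norm(f, axis=1, keepdims=True)
        normed.append(f / np.maximum(norms, 1e-12))
    return np.hstack(normed)

def random_projection(features, out_dim, seed=0):
    """Project rows to out_dim dimensions with a Gaussian random matrix.

    Entries are drawn from N(0, 1/out_dim), so squared norms are preserved
    in expectation.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    in_dim = features.shape[1]
    proj = rng.normal(0.0, 1.0 / np.sqrt(out_dim), size=(in_dim, out_dim))
    return features @ proj

# Usage with placeholder matrices standing in for two hand-crafted extractors.
color = np.ones((3, 4))    # e.g. 3 images, 4-dim color histogram
hog = np.ones((3, 6))      # e.g. 3 images, 6-dim HOG descriptor
fused = fuse([color, hog])                 # shape (3, 10)
reduced = random_projection(fused, 5)      # shape (3, 5)
```

The appeal of random projection is that it needs no training pass over the data: the matrix is drawn once from a fixed seed and reused for both database and query vectors.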
Caveats
- The 500-image dataset is tiny by modern standards; these numbers won’t transfer cleanly to larger corpora
- No mention of inference speed, indexing structure, or approximate nearest neighbors—scalability is unclear
- Deep feature extraction appears to use off-the-shelf pretrained networks without fine-tuning
Verdict
Good for students or researchers who need a clean, comparable baseline across feature eras. Skip it if you need production-scale retrieval; the architecture is educational glue, not a search engine.