kuanghuei/SCAN

Teaching images and captions to pay attention to each other

Code for an ECCV 2018 paper that makes image-text matching bidirectional by having each modality attend to the other, rather than compressing each into a single vector and hoping for the best.

★ 579 stars | Python | Computer Vision | Image · Video · Audio

What it does

SCAN (Stacked Cross Attention Network) learns to match images with text captions by computing fine-grained alignment between image regions and words, rather than collapsing both into a single shared embedding. It supports two directions: text-to-image attention (find the relevant image regions for each word) and image-to-text attention (find the relevant words for each image region). The code reproduces the ECCV 2018 paper from Microsoft Research, built as a fork of the VSE++ framework.
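
The two attention directions can be illustrated with a small, dependency-free sketch. This is not the repository's code: it assumes cosine similarity between region and word vectors and a softmax over those similarities, and the helper names (`cosine`, `softmax`, `attend`) and the `temperature` value are hypothetical choices made for illustration.

```python
import math

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, context, temperature=9.0):
    """For each query vector, return (attended_vector, relevance).

    The attended vector is a softmax-weighted sum of the context vectors;
    relevance is the cosine similarity between the query and that sum.
    `temperature` is an illustrative value, not a documented default.
    """
    dim = len(context[0])
    results = []
    for q in queries:
        weights = softmax([temperature * cosine(q, c) for c in context])
        attended = [sum(w * c[d] for w, c in zip(weights, context))
                    for d in range(dim)]
        results.append((attended, cosine(q, attended)))
    return results

# Toy features: 3 image regions and 2 words in a shared 3-d space.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
words = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]

text_to_image = attend(words, regions)   # one relevance score per word
image_to_text = attend(regions, words)   # one relevance score per region
print([round(r, 3) for _, r in text_to_image])
```

Swapping the argument order is all that distinguishes the two directions in this sketch: words querying regions gives one score per word, and regions querying words gives one score per region.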

The interesting bit

The clever part is that SCAN doesn’t stop at a single cross-attention pass: it stacks the attention steps, then pools the resulting similarity scores with an aggregation function (LogSumExp or simple averaging). The README includes exact command-line flags for reproducing each variant, which is the kind of detail that saves hours of head-scratching.
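
To make the two aggregation options concrete, here is a minimal sketch of LogSumExp and average pooling over a list of relevance scores. It assumes the pooled values are per-word or per-region scores like those described above; the function names and the `sharpness` value are hypothetical and not taken from the repository.

```python
import math

def pool_lse(relevances, sharpness=6.0):
    """LogSumExp pooling: (1 / sharpness) * log(sum(exp(sharpness * r))).

    Subtracting the maximum before exponentiating keeps the computation
    numerically stable. `sharpness` is an illustrative value.
    """
    peak = max(relevances)
    total = sum(math.exp(sharpness * (r - peak)) for r in relevances)
    return peak + math.log(total) / sharpness

def pool_avg(relevances):
    """Average pooling: every score contributes equally."""
    return sum(relevances) / len(relevances)

scores = [0.2, 0.4, 0.9]          # toy per-word relevance scores
print(pool_lse(scores), pool_avg(scores))
```

LogSumExp behaves like a soft maximum: as `sharpness` grows, the result approaches the largest score, so the best-matching words or regions dominate. Averaging treats every score equally.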

Key highlights

Pre-computed bottom-up attention features for Flickr30K and MS-COCO available via Kaggle dataset
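Four model variants with documented hyperparameters: t-i LSE, t-i AVG, i-t LSE, i-t AVG (t-i is text-to-image attention, i-t is image-to-text attention; LSE is LogSumExp pooling, AVG is average pooling)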
Built on PyTorch 0.3 (yes, that old) with Python 2.7
Includes evaluation script with 5-fold cross-validation support for MS-COCO
Apache 2.0 licensed

Caveats

Dependencies are frozen in 2018: PyTorch 0.3 and Python 2.7 will require environment archaeology to run today

Verdict

Worth a look if you’re researching cross-modal retrieval or need a baseline for image-text matching with explicit attention mechanisms. Skip it if you need something that runs out of the box on modern PyTorch—you’ll be porting code before you get results.

Frequently asked

What is kuanghuei/SCAN?: Code for an ECCV 2018 paper that makes image-text matching bidirectional by having each modality attend to the other, rather than compressing each into a single vector and hoping for the best.
Is SCAN open source?: Yes — kuanghuei/SCAN is open source, released under the Apache-2.0 license.
What language is SCAN written in?: kuanghuei/SCAN is primarily written in Python.
How popular is SCAN?: kuanghuei/SCAN has 579 stars on GitHub.
Where can I find SCAN?: kuanghuei/SCAN is on GitHub at https://github.com/kuanghuei/SCAN.