
peteanderson80/bottom-up-attention

A bottom-up attention model based on Faster R-CNN with ResNet-101 that extracts salient image region features for visual question answering and image captioning.

1.5k stars · Jupyter Notebook · Computer Vision

This repository provides code for training a bottom-up attention model: a multi-GPU Faster R-CNN with a ResNet-101 backbone, trained on Visual Genome object and attribute annotations. The pretrained model extracts feature vectors for salient image regions, and these can replace traditional CNN features in attention-based image captioning and VQA systems. The approach achieved state-of-the-art performance on MSCOCO captioning (CIDEr 117.9, BLEU_4 36.9) and won the 2017 VQA Challenge with 70.3% overall accuracy.
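To illustrate how region features might be consumed by an attention-based captioning or VQA model, here is a minimal sketch of soft attention over a set of region feature vectors. It is not the repository's code: the function name `soft_attention`, the dot-product scoring, and the toy 2-dimensional features are all assumptions made for illustration, and a real system would typically use a learned scoring function over much larger feature vectors.

```python
import math


def soft_attention(region_features, query):
    """Weight each region by its relevance to `query` and pool the features.

    region_features: list of k feature vectors (one per salient region).
    query: a vector of the same length (e.g. a question or decoder state).
    Returns (weights, attended), where weights sum to 1.

    Assumption: dot-product scoring stands in for a learned attention
    function; it is a simplification chosen for this sketch.
    """
    if not region_features:
        raise ValueError("need at least one region feature vector")

    # Relevance score for each region: dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]

    # Numerically stable softmax over the region scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attended feature: weighted sum of the region vectors.
    dim = len(region_features[0])
    attended = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, attended


# Toy example: three hypothetical regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, attended = soft_attention(regions, query)
```

In this toy run the first and third regions align with the query and receive equal, larger weights than the second, so the pooled vector leans toward the first feature dimension.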
