abhshkdz/neural-vqa
A Torch implementation of a neural visual question answering model combining CNN image features with LSTM language processing.

Velocity (7d): +0.1 ★/day · Trend: steady
This repository implements the VIS+LSTM visual question answering model from the paper "Exploring Models and Data for Image Question Answering" by Ren, Kiros & Zemel. A VGG-19 CNN extracts image features and an LSTM processes the question; the two are combined to predict an answer about the image. The model is trained on MSCOCO images paired with question-answer pairs from the VQA dataset.
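The data flow described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's Torch code: it assumes the image feature is a 4096-dimensional vector (as a VGG-19 fully connected layer would produce), that the projected image is fed to the LSTM as the first token ahead of the question words, and that answers are picked from a fixed set with a softmax. All names (`vis_lstm_forward`, `init_params`, the parameter keys) and all sizes are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates are stacked as [input, forget, output, candidate]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])
    f = sigmoid(z[H:2 * H])
    o = sigmoid(z[2 * H:3 * H])
    g = np.tanh(z[3 * H:4 * H])
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def init_params(feat_dim=4096, vocab=50, emb=16, hidden=32, answers=10, seed=0):
    """Random, untrained parameters; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    s = 0.01
    return {
        "W_img": rng.normal(0, s, (emb, feat_dim)),   # image -> embedding space
        "b_img": np.zeros(emb),
        "E": rng.normal(0, s, (vocab, emb)),          # word embedding table
        "W": rng.normal(0, s, (4 * hidden, emb)),     # LSTM input weights
        "U": rng.normal(0, s, (4 * hidden, hidden)),  # LSTM recurrent weights
        "b": np.zeros(4 * hidden),
        "W_out": rng.normal(0, s, (answers, hidden)), # answer classifier
        "b_out": np.zeros(answers),
    }

def vis_lstm_forward(image_feat, question_ids, params):
    """Return a probability distribution over candidate answers."""
    # Project the CNN feature into the same space as the word embeddings.
    img_emb = params["W_img"] @ image_feat + params["b_img"]
    word_embs = params["E"][question_ids]             # shape (T, emb)
    # Assumption: the image embedding is treated as the first token.
    seq = np.vstack([img_emb[None, :], word_embs])
    hidden = params["U"].shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in seq:
        h, c = lstm_step(x, h, c, params["W"], params["U"], params["b"])
    # Softmax over the answer set, computed from the final hidden state.
    logits = params["W_out"] @ h + params["b_out"]
    logits = logits - logits.max()
    exp = np.exp(logits)
    return exp / exp.sum()
```

Calling `vis_lstm_forward(feature_vector, [3, 7, 1, 4], init_params())` returns one probability per candidate answer; in a trained model the highest-scoring entry would be taken as the predicted answer.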