karpathy/neuraltalk
Python+numpy implementation of multimodal recurrent neural networks that generate textual descriptions of images.

Velocity · 7d
+1.3
★ / day
Trend
→steady
star history
This project implements neural network architectures for image captioning, combining CNNs for visual feature extraction with RNNs or LSTMs for sequential text generation. The models learn to predict sentence descriptions conditioned on input images and previous word context. It supports standard benchmark datasets including Flickr8K, Flickr30K, and MSCOCO.