← all repositories

aimagelab/meshed-memory-transformer

A transformer architecture with memory augmentation for generating textual descriptions from images, published at CVPR 2020.

545 stars Python Computer VisionML Frameworks
meshed-memory-transformer
Velocity · 7d
+0.2
★ / day
Trend
steady
star history

The Meshed-Memory Transformer (M2) is a deep learning model that generates captions for images using a modified transformer architecture. It introduces memory layers to enhance the model’s capacity for learning visual-semantic relationships. The model operates on pre-extracted detection features from a vision backbone and produces natural language descriptions through attention-based decoding. It was trained and evaluated on the COCO dataset.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.