lucidrains/flamingo-pytorch
PyTorch implementation of DeepMind's Flamingo visual-language model with PerceiverResampler and GatedCrossAttentionBlock components.

This repository provides a PyTorch implementation of Flamingo, DeepMind's state-of-the-art few-shot visual question answering model. It includes three components: the perceiver resampler, which shrinks each media sequence to a fixed number of latents; specialized masked cross-attention blocks, which let a language model attend to visual inputs; and tanh gating at the ends of the cross-attention and feedforward blocks. Together, these components can be used to build multimodal language models that process interleaved text and images for few-shot learning.
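
To make the tanh gating concrete, here is a minimal sketch of a gated cross-attention block written in plain PyTorch. It is not the repository's `GatedCrossAttentionBlock`: the class name `GatedCrossAttentionSketch`, its constructor arguments, and the use of `nn.MultiheadAttention` are assumptions made for illustration, and the media masking (which restricts which images each text token may attend to) is left out for brevity. The zero initialization of the gates follows the Flamingo design, so a freshly created block passes the text through unchanged.

```python
import torch
from torch import nn


class GatedCrossAttentionSketch(nn.Module):
    """Illustrative tanh-gated cross-attention block (hypothetical, not the repo's code)."""

    def __init__(self, dim: int, heads: int = 8, ff_mult: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * ff_mult),
            nn.GELU(),
            nn.Linear(dim * ff_mult, dim),
        )
        # Gates start at zero, so tanh(gate) == 0 and the block is an identity at init.
        self.attn_gate = nn.Parameter(torch.zeros(1))
        self.ff_gate = nn.Parameter(torch.zeros(1))

    def forward(self, text: torch.Tensor, media: torch.Tensor) -> torch.Tensor:
        # text:  (batch, text_len, dim)  queries from the language model
        # media: (batch, media_len, dim) keys/values, e.g. resampled visual latents
        attended, _ = self.attn(self.norm(text), media, media, need_weights=False)
        text = text + torch.tanh(self.attn_gate) * attended      # gated attention residual
        text = text + torch.tanh(self.ff_gate) * self.ff(text)   # gated feedforward residual
        return text


torch.manual_seed(0)
block = GatedCrossAttentionSketch(dim=64, heads=4)
text = torch.randn(2, 10, 64)    # two sequences of 10 text tokens
media = torch.randn(2, 16, 64)   # 16 visual latents per sequence
out = block(text, media)         # same shape as text: (2, 10, 64)
```

Because both gates are initialized to zero, inserting such a block into a pretrained language model leaves its outputs unchanged at the start of training; the visual pathway only contributes as the gates move away from zero.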