kohjingyu/gill
A multimodal LLM that processes interleaved image-and-text inputs to generate text, retrieve images, and synthesize images.

Velocity (7d): +0.4 ★/day · Trend: steady
GILL (Generating Images with Large Language Models) is a NeurIPS 2023 research project that extends an LLM with vision capabilities. The model accepts arbitrarily interleaved image-and-text inputs and can produce three kinds of output: text responses, images retrieved from a large collection, and newly generated images. Learned projection layers connect the language model to the image retrieval and image generation components.
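The role of a projection layer in the retrieval path can be illustrated with a toy sketch. This is not GILL's code or API: the function names (`project`, `cosine`, `retrieve`), the dimensions, the fixed weights, and the file names are all hypothetical, and in a real system the projection weights would be learned rather than hand-written. The sketch only shows the general idea of mapping a language-model hidden state into an image embedding space and picking the nearest stored image by cosine similarity.

```python
import math

def project(hidden, weight, bias):
    """Linear projection: out[j] = sum_i hidden[i] * weight[i][j] + bias[j]."""
    return [
        sum(h * row[j] for h, row in zip(hidden, weight)) + bias[j]
        for j in range(len(bias))
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, image_embeddings):
    """Return the name of the stored image whose embedding is closest to query."""
    return max(image_embeddings, key=lambda name: cosine(query, image_embeddings[name]))

# Hypothetical 4-dimensional hidden state projected into a 2-dimensional image space.
# In practice these weights would be trained; here they are fixed for illustration.
WEIGHT = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
BIAS = [0.0, 0.0]

# Hypothetical precomputed image embeddings for a tiny candidate collection.
IMAGES = {"cat.jpg": [1.0, 0.1], "dog.jpg": [0.0, 1.0]}

hidden_state = [1.0, 0.0, 1.0, 0.0]
query = project(hidden_state, WEIGHT, BIAS)
best_match = retrieve(query, IMAGES)
print(best_match)
```

The same projected vector could, in principle, condition an image generator instead of indexing a collection; the sketch covers only the retrieval case.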