
kohjingyu/fromage

A multimodal language model that grounds frozen text LLMs to images for retrieval and generation.

Velocity (7d): +0.4 ★ / day · Trend: steady

FROMAGe is a research implementation of a language model with visual grounding. It keeps a pretrained LLM frozen and connects it to image embeddings through linear projection layers and a special [RET] token, which enables image retrieval and multimodal dialogue. Precomputed visual embeddings of Conceptual Captions images support efficient retrieval-augmented generation. Model checkpoints are small (around 11MB) and are included in the repository.
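To make the retrieval idea concrete, here is a minimal sketch of how a [RET] hidden state might be matched against precomputed image embeddings. It is not the repository's code: the function names (`linear`, `retrieve`), the weight matrices `w_text` and `w_image`, the use of cosine similarity, and the toy two-dimensional vectors are all assumptions chosen for illustration.

```python
import math

def linear(x, weight):
    """Apply a bias-free linear projection; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def normalize(x):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(p * q for p, q in zip(normalize(a), normalize(b)))

def retrieve(ret_hidden, image_embs, w_text, w_image, k=1):
    """Return indices of the k images most similar to the [RET] hidden state.

    Both sides are projected into a shared space first (hypothetical weights):
    the LLM hidden state via w_text, each visual embedding via w_image.
    """
    query = linear(ret_hidden, w_text)
    scored = [(cosine(query, linear(emb, w_image)), idx)
              for idx, emb in enumerate(image_embs)]
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]

# Toy data: identity projections and three "precomputed" image embeddings.
identity = [[1.0, 0.0], [0.0, 1.0]]
image_embs = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ret_hidden = [1.0, 0.0]  # stand-in for the frozen LLM's hidden state at [RET]

top = retrieve(ret_hidden, image_embs, identity, identity, k=2)
print(top)
```

Because the image embeddings are computed ahead of time, only the query side needs a forward pass through the language model at inference; the ranking itself reduces to projections and similarity scores as above.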
