baaivision/Emu
A series of multimodal foundation models that generate images and text through next-token prediction, developed by BAAI.

Emu is a collection of generative multimodal models: Emu1 (generative pretraining in multimodality), Emu2 (generative multimodal models as in-context learners), and Emu3 (built on next-token prediction). The models are multimodal generalists that can both understand and generate text and images. The project provides inference code, models, and demos for research and community use.