
lyuchenyang/Macaw-LLM

Multi-modal LLM combining vision, audio, and text processing for unified language modeling.


Macaw-LLM is a research project on multi-modal language modeling that integrates images, video, audio, and text into a single system. It builds on pre-trained components: CLIP for visual understanding, Whisper for audio processing, and LLaMA as the base language model. This lets the model process and reason across modalities within one language modeling framework.
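
One common way to bring several modalities into a language model is to project each encoder's features to the language model's embedding width and concatenate them with the text embeddings into one sequence. The sketch below illustrates only that general idea in plain Python. It is not taken from the Macaw-LLM code base: the dimensions, the random matrices standing in for learned projections, and the names `make_projection`, `project`, and `build_multimodal_sequence` are all hypothetical, and the real CLIP, Whisper, and LLaMA models are replaced by placeholder vectors.

```python
import random

# Hypothetical feature widths, chosen only for illustration.
IMAGE_DIM, AUDIO_DIM, MODEL_DIM = 6, 4, 8


def make_projection(in_dim, out_dim, rng):
    """Return a random in_dim x out_dim matrix standing in for a learned projection."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]


def project(features, weights):
    """Map each feature vector (length in_dim) to a vector of length out_dim."""
    out_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(vec)) for j in range(out_dim)]
        for vec in features
    ]


def build_multimodal_sequence(image_feats, audio_feats, text_embeds, w_image, w_audio):
    """Project non-text features to the model width and place them before the text embeddings."""
    return project(image_feats, w_image) + project(audio_feats, w_audio) + text_embeds


rng = random.Random(0)
w_image = make_projection(IMAGE_DIM, MODEL_DIM, rng)
w_audio = make_projection(AUDIO_DIM, MODEL_DIM, rng)

# Placeholder inputs (assumed shapes): 3 image patches, 2 audio frames, 5 text tokens.
image_feats = [[rng.random() for _ in range(IMAGE_DIM)] for _ in range(3)]
audio_feats = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(2)]
text_embeds = [[rng.random() for _ in range(MODEL_DIM)] for _ in range(5)]

sequence = build_multimodal_sequence(image_feats, audio_feats, text_embeds, w_image, w_audio)
print(len(sequence), len(sequence[0]))  # 10 positions, each of width 8
```

After projection every position in `sequence` has the same width, so a language model could treat image, audio, and text positions uniformly. How Macaw-LLM itself aligns the modalities may differ from this simple concatenation.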
