
invictus717/MetaTransformer

A single transformer architecture that processes 12 modalities including text, images, audio, video, point clouds, and graphs without modality-specific modifications.

Velocity (7d): +1.6 ★ / day
Trend: steady

Meta-Transformer is a unified multimodal learning framework that uses a single frozen transformer encoder to handle diverse data modalities. It maps inputs from each modality into a shared token space and processes the resulting tokens with a standard transformer backbone that needs no modality-specific modifications. It was published at ICCV 2023 and has been widely cited, which points to broad research interest in unified multimodal architectures.
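The idea in the paragraph above can be illustrated with a toy sketch. This is not the repository's code: the names `Tokenizer`, `FrozenEncoder`, and the width `D` are hypothetical, the weights are random rather than pretrained, and one single-head self-attention layer stands in for the full backbone. It only shows the shape of the design, where small per-modality front ends project raw data into tokens of the same width and one fixed encoder processes them all.

```python
import math
import random

D = 8  # shared token width (hypothetical value chosen for illustration)


def make_matrix(rows, cols, rng):
    """Random rows x cols matrix as nested lists."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class Tokenizer:
    """Hypothetical per-modality front end: split raw values into patches, project to width D."""

    def __init__(self, patch_size, rng):
        self.patch_size = patch_size
        self.proj = make_matrix(patch_size, D, rng)

    def __call__(self, values):
        p = self.patch_size
        # Keep only full patches; a trailing partial patch is dropped.
        patches = [values[i:i + p] for i in range(0, len(values) - p + 1, p)]
        return matmul(patches, self.proj)


class FrozenEncoder:
    """One single-head self-attention layer whose weights are never updated."""

    def __init__(self, rng):
        self.wq = make_matrix(D, D, rng)
        self.wk = make_matrix(D, D, rng)
        self.wv = make_matrix(D, D, rng)

    def __call__(self, tokens):
        q = matmul(tokens, self.wq)
        k = matmul(tokens, self.wk)
        v = matmul(tokens, self.wv)
        k_t = [list(col) for col in zip(*k)]  # transpose k to D x n
        scores = matmul(q, k_t)
        scale = math.sqrt(D)
        attn = [softmax([s / scale for s in row]) for row in scores]
        mixed = matmul(attn, v)
        # Residual connection: add the attended values back onto the input tokens.
        return [[t + m for t, m in zip(t_row, m_row)] for t_row, m_row in zip(tokens, mixed)]


rng = random.Random(0)
tokenizers = {"text": Tokenizer(2, rng), "audio": Tokenizer(4, rng)}
encoder = FrozenEncoder(rng)  # the same instance serves every modality

text_feats = encoder(tokenizers["text"]([float(i) for i in range(6)]))
audio_feats = encoder(tokenizers["audio"]([0.1 * i for i in range(16)]))
print(len(text_feats), len(audio_feats), len(text_feats[0]))
```

Both inputs leave their tokenizer as sequences of `D`-wide tokens, so the encoder does not need to know which modality it is looking at. In a setup like this, only the tokenizers and any task-specific heads would be trained while the encoder stays fixed.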
