apple/ml-4m

A training framework for any-to-any multimodal foundation models supporting dozens of vision modalities and tasks.

Velocity (7d): +2.3 ★ / day
Trend: steady

4M is a research framework for training foundation models that handle arbitrary input-output modality combinations using masked token modeling and unified tokenization. The released 4M-7 and 4M-21 models perform diverse vision tasks including generation, detection, segmentation, and transfer to unseen tasks and modalities. Code, pretrained weights, and training infrastructure are open-sourced for reproducibility.
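
As a rough illustration of the masked token modeling idea described above, the sketch below pools discrete tokens from several modalities into one sequence and randomly splits a small budget of them into visible inputs and prediction targets. This is a minimal toy under stated assumptions, not the repository's code: the function name `sample_masked_example`, the modality names, the token ids, and the budget sizes are all hypothetical and chosen only for illustration.

```python
import random

def sample_masked_example(tokens_by_modality, num_inputs, num_targets, seed=None):
    """Hypothetical helper: split pooled multimodal tokens into disjoint
    visible inputs and prediction targets."""
    rng = random.Random(seed)
    # Unified pool: every token is tagged with its modality and position.
    pool = [
        (modality, pos, tok)
        for modality, toks in tokens_by_modality.items()
        for pos, tok in enumerate(toks)
    ]
    if num_inputs + num_targets > len(pool):
        raise ValueError("budgets exceed the number of available tokens")
    # Sampling without replacement keeps inputs and targets disjoint.
    chosen = rng.sample(pool, num_inputs + num_targets)
    return chosen[:num_inputs], chosen[num_inputs:]

# Toy token ids (assumed); real ones would come from per-modality tokenizers.
example = {
    "rgb": [11, 12, 13, 14],
    "depth": [21, 22, 23, 24],
    "caption": [31, 32, 33],
}
inputs, targets = sample_masked_example(example, num_inputs=4, num_targets=3, seed=0)
```

Because any modality can land on either side of the split, a model trained on such examples would see many different input-output modality combinations, which is the any-to-any behavior the summary refers to.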
