
facebookresearch/perception_models

Meta AI releases Perception Encoder and Perception Language Model for multimodal image, video, and audio understanding.

2.3k stars · Jupyter Notebook · Image · Video · Audio · Language Models
Velocity (7d): +5.5 ★/day · Trend: steady

This repository hosts Perception Encoder (PE) for encoding images, video, and audio into embeddings, and Perception Language Model (PLM) for multimodal decoding and generation. The models achieve state-of-the-art results on perception benchmarks and are integrated into popular libraries including Hugging Face transformers and timm. Released checkpoints include multiple model sizes and variants tuned for different use cases.
