facebookresearch/perception_models
Meta AI releases Perception Encoder and Perception Language Model for multimodal image, video, and audio understanding.

Velocity (7d): +5.5 ★/day · Trend: steady
This repository hosts Perception Encoder (PE), which encodes images, video, and audio into embeddings, and Perception Language Model (PLM), which handles multimodal decoding and generation. The models achieve state-of-the-art results on perception benchmarks and are integrated into popular libraries, including Hugging Face transformers and timm. Released checkpoints cover multiple model sizes and variants tuned for different use cases.