google-research/scenic
A JAX-based library for training and evaluating attention-based vision models at scale, supporting classification, segmentation, detection, and multimodal tasks.

Scenic provides shared lightweight libraries and project templates for training large-scale vision models with JAX and Flax. It implements state-of-the-art models, including Vision Transformers (ViT), video transformers such as ViViT, and multimodal architectures. The library includes optimized training loops, losses, metrics, input pipelines for vision datasets, and baseline implementations of common computer vision tasks.
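
To make the "training loop, loss, and metric" pieces concrete, here is a minimal sketch in plain JAX. It does not use Scenic's own API. The names `init_params`, `loss_fn`, `accuracy`, `train_step`, and `LEARNING_RATE` are hypothetical, and a linear classifier on random features stands in for a real vision model and dataset.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # assumed value, chosen only for this toy example


def init_params(key, num_features, num_classes):
    """Hypothetical linear classifier standing in for a vision model."""
    return {
        "w": 0.01 * jax.random.normal(key, (num_features, num_classes)),
        "b": jnp.zeros((num_classes,)),
    }


def forward(params, inputs):
    """Computes class logits for a batch of flattened inputs."""
    return inputs @ params["w"] + params["b"]


def loss_fn(params, inputs, labels):
    """Mean softmax cross-entropy over the batch."""
    logits = forward(params, inputs)
    log_probs = jax.nn.log_softmax(logits, axis=-1)
    one_hot = jax.nn.one_hot(labels, logits.shape[-1])
    return -jnp.mean(jnp.sum(one_hot * log_probs, axis=-1))


def accuracy(params, inputs, labels):
    """Fraction of examples whose argmax prediction matches the label."""
    predictions = jnp.argmax(forward(params, inputs), axis=-1)
    return jnp.mean(predictions == labels)


@jax.jit
def train_step(params, inputs, labels):
    """One step of plain gradient descent; returns new params and the loss."""
    loss, grads = jax.value_and_grad(loss_fn)(params, inputs, labels)
    new_params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads
    )
    return new_params, loss


def run_demo(num_steps=50):
    """Trains on synthetic data and returns (initial_loss, final_loss, acc)."""
    data_key, param_key = jax.random.split(jax.random.PRNGKey(0))
    inputs = jax.random.normal(data_key, (32, 16))
    labels = (inputs[:, 0] > 0).astype(jnp.int32)  # synthetic binary labels
    params = init_params(param_key, num_features=16, num_classes=2)
    initial_loss = loss_fn(params, inputs, labels)
    for _ in range(num_steps):
        params, _ = train_step(params, inputs, labels)
    final_loss = loss_fn(params, inputs, labels)
    return float(initial_loss), float(final_loss), float(accuracy(params, inputs, labels))
```

Calling `run_demo()` trains for a few steps and returns the loss before and after training together with the final accuracy. A real project would replace the linear model with a Flax module, the synthetic batch with a dataset pipeline, and the hand-written update with an optimizer library.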