google-research/vision_transformer

Google's original Vision Transformer code, plus a zoo of more than 50k checkpoints

It exists to hand you the original authors' Vision Transformer and MLP-Mixer weights, plus the JAX/Flax code to fine-tune them on your own data.

★ 12.6k stars · Jupyter Notebook · Computer Vision · ML Frameworks

What it does

This is the official release of the Vision Transformer (ViT) and MLP-Mixer architectures from Google Research. It provides pre-trained model checkpoints, including more than 50,000 from the “How to train your ViT?” study, along with JAX/Flax code to fine-tune them on new datasets. The models were pre-trained on ImageNet and ImageNet-21k, and range from tiny 37 MiB variants up to large architectures.

The interesting bit

Instead of treating images as pixel grids for convolution, ViT slices them into fixed-size patches, embeds each patch, and feeds the resulting sequence to a standard Transformer encoder. The repository also includes MLP-Mixer, an all-MLP alternative that skips attention entirely. Weights for both are published as .npz files in public GCS buckets.
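The patch step is easier to see in code. What follows is a minimal NumPy sketch, not the repository's implementation: the function names image_to_patches and embed_patches are hypothetical, the projection matrix is random rather than learned, and the 224-pixel input, 16-pixel patches, and hidden size of 192 are illustrative assumptions. It also leaves out the class token and position embeddings that ViT adds before the encoder.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened square patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, P, cols, P, C) -> (rows, cols, P, P, C) -> (rows * cols, P * P * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


def embed_patches(patches: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project each flattened patch to the model's hidden size."""
    return patches @ projection


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3), dtype=np.float32)  # stand-in for a real image

patches = image_to_patches(image, patch_size=16)

# Random stand-in for the learned patch-embedding weights (hidden size 192 is illustrative).
projection = rng.standard_normal((16 * 16 * 3, 192)).astype(np.float32)
tokens = embed_patches(patches, projection)

print(patches.shape)  # (196, 768): 14 x 14 patches, each 16 * 16 * 3 values
print(tokens.shape)   # (196, 192): one token per patch
```

The same reshape and transpose calls exist in jax.numpy, so the sketch carries over to JAX with little change.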

Key highlights

Reference implementations for six published papers, including ViT, MLP-Mixer, LiT, and Sharpness-Aware Training variants.
Over 50,000 pre-trained checkpoints trained with systematically varied augmentation and regularization (AugReg), filterable via a dedicated Colab.
Checkpoints loadable in both JAX/Flax and the PyTorch timm ecosystem.
Fine-tuning supports GPUs and TPUs, automatically distributing across all available accelerators.
Published accuracy and throughput numbers for recommended models (e.g., AugReg L/16 reaches 85.59% on ImageNet at 384 resolution).
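Because the weights ship as .npz archives, a downloaded checkpoint can be inspected with plain NumPy before any model code is involved. The sketch below is a hedged illustration only: summarize_checkpoint is a hypothetical helper, and the file it reads is a toy archive written on the spot with made-up parameter names, not one of the released checkpoints.

```python
import os
import tempfile

import numpy as np


def summarize_checkpoint(path: str) -> dict:
    """Return {array name: shape} for every array stored in an .npz file."""
    with np.load(path) as archive:
        return {name: archive[name].shape for name in archive.files}


# Write a toy archive so the example runs offline.
# The parameter names and shapes are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_checkpoint.npz")
    np.savez(
        path,
        **{
            "embedding/kernel": np.zeros((16, 16, 3, 192), dtype=np.float32),
            "head/bias": np.zeros(10, dtype=np.float32),
        },
    )
    summary = summarize_checkpoint(path)

for name, shape in summary.items():
    print(name, shape)
```

Pointing summarize_checkpoint at a real downloaded file would list its actual parameter names and shapes, which is a quick way to confirm which model variant a checkpoint holds.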

Caveats

The repository is geared toward fine-tuning, not full pre-training; original training scripts live in the separate big_vision codebase.
Colab fine-tuning is limited to a single Tesla T4 GPU or to TPUs slowed by network access, so the authors recommend a dedicated machine for serious workloads.
Custom datasets require manual edits to vit_jax/input_pipeline.py beyond standard TensorFlow Datasets integration.

Verdict

Researchers who need the original authors’ ViT baselines in JAX should start here. If you want PyTorch-native training, loading the checkpoints through timm is a better fit; if you need to pre-train from scratch, use the big_vision repo, which is also JAX-based.

Frequently asked

What is google-research/vision_transformer? It exists to hand you the original authors' Vision Transformer and MLP-Mixer weights, plus the JAX/Flax code to fine-tune them on your own data.
Is vision_transformer open source? Yes, google-research/vision_transformer is open source, released under the Apache-2.0 license.
What language is vision_transformer written in? GitHub classifies google-research/vision_transformer as primarily Jupyter Notebook; the fine-tuning code itself is Python built on JAX/Flax.
How popular is vision_transformer? google-research/vision_transformer has 12.6k stars on GitHub.
Where can I find vision_transformer? google-research/vision_transformer is on GitHub at https://github.com/google-research/vision_transformer.