Is MetaTransformer open source?

Yes — invictus717/MetaTransformer is open source, released under the Apache-2.0 license.

What language is MetaTransformer written in?

invictus717/MetaTransformer is primarily written in Python.

How popular is MetaTransformer?

invictus717/MetaTransformer has 1.6k stars on GitHub.

Where can I find MetaTransformer?

invictus717/MetaTransformer is on GitHub at https://github.com/invictus717/MetaTransformer.

invictus717/MetaTransformer

Meta-Transformer: One Encoder for a Dozen Disparate Data Types

A research codebase that tests whether a single standard transformer encoder can handle text, images, point clouds, and nine other modalities without custom backbones.

★ 1.6k stars · Python · Language Models · Image · Video · Audio

What it does

Meta-Transformer tokenizes inputs from twelve modalities (text, RGB images, point clouds, audio, video, tables, graphs, time series, hyperspectral images, IMU traces, X-rays, and infrared) into flat token sequences. It then passes them through a shared transformer encoder built from standard timm ViT blocks, with task-specific heads on top for jobs such as classification or detection. The released Base and Large checkpoints were pretrained on LAION-2B image data and then repurposed for the other eleven modalities.
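The shared-encoder-plus-heads layout can be sketched in a few lines. This is a toy illustration in plain Python, not the repository's code: the real encoder is a stack of timm ViT blocks, while `self_attention`, `shared_encoder`, `make_linear_head`, the width `DIM`, and the task names below are all hypothetical stand-ins. The input token sequences are random placeholders for the output of a modality tokenizer.

```python
import math
import random

DIM = 8  # shared token width (hypothetical; real models use far wider tokens)


def self_attention(tokens):
    """Toy single-head self-attention with identity projections and a residual."""
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(DIM)
                  for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mixed = [sum(w / total * value[i] for w, value in zip(weights, tokens))
                 for i in range(DIM)]
        out.append([x + y for x, y in zip(query, mixed)])
    return out


def shared_encoder(tokens, depth=2):
    """The same function serves every modality; only its input differs."""
    for _ in range(depth):
        tokens = self_attention(tokens)
    return tokens


def make_linear_head(num_outputs, seed):
    """Task-specific head: mean-pool the tokens, then apply a linear layer."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(num_outputs)]

    def head(tokens):
        pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(DIM)]
        return [sum(w * x for w, x in zip(row, pooled)) for row in weight]

    return head


# Heads vary per task; the encoder does not.
heads = {
    "image_classification": make_linear_head(3, seed=0),
    "audio_classification": make_linear_head(5, seed=1),
}


def run(task, tokens):
    return heads[task](shared_encoder(tokens))


rng = random.Random(42)
image_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
audio_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
image_logits = run("image_classification", image_tokens)
audio_logits = run("audio_classification", audio_tokens)
```

Sequences of different lengths (4 and 6 tokens here) pass through the same encoder unchanged in shape, and each head reduces them to its own number of outputs.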

The interesting bit

The work rests on Data2Seq, a meta-tokenization scheme that turns very different structures, from point clouds to tabular rows, into token sequences a vanilla transformer can consume. The underlying bet is that modality-specific backbones are overkill: once the data is flattened into tokens, every modality can go through the same self-attention layers.
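To make the flattening idea concrete, here is a minimal sketch of turning three differently shaped inputs into token sequences of one shared width. It is an assumption-laden illustration, not the actual Data2Seq implementation: the function names, the per-modality splitting rules, the random projection weights, and `DIM` are all hypothetical.

```python
import random

DIM = 4  # shared embedding width; hypothetical, chosen small for readability


def image_to_tokens(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened square patches."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tokens.append([image[top + r][left + c]
                           for r in range(patch) for c in range(patch)])
    return tokens


def table_row_to_tokens(row):
    """One token per column value (an illustrative choice, not the repo's rule)."""
    return [[float(value)] for value in row]


def point_cloud_to_tokens(points):
    """One token per (x, y, z) point; real pipelines usually sample and group first."""
    return [[float(coord) for coord in point] for point in points]


def make_projection(in_width, seed):
    """Per-modality linear map from the raw token width to the shared width DIM."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-1, 1) for _ in range(in_width)] for _ in range(DIM)]

    def project(tokens):
        return [[sum(w * x for w, x in zip(row, token)) for row in weight]
                for token in tokens]

    return project


image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
row = [5.1, 3.5, 1.4]
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]

raw = {
    "image": image_to_tokens(image, patch=2),
    "table": table_row_to_tokens(row),
    "point_cloud": point_cloud_to_tokens(points),
}

# After projection every modality is a list of DIM-wide tokens.
sequences = {}
for seed, (name, tokens) in enumerate(raw.items()):
    project = make_projection(in_width=len(tokens[0]), seed=seed)
    sequences[name] = project(tokens)

for name, seq in sequences.items():
    print(name, len(seq), "tokens of width", len(seq[0]))
```

The raw tokens have widths 4, 1, and 3, but after the per-modality projection all three sequences share the width `DIM`, which is what lets one encoder consume them.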

Key highlights

Covers 12 modalities in one pipeline, including less common ones like IMU, hyperspectral, and X-ray.
Shares a single encoder (85M or 302M parameters) across all of them; only the heads and tokenizers change.
Provides downloadable pretrained weights for Base and Large variants.
Acknowledges heavy reuse of existing open-source frameworks (MMDetection, MMSegmentation, OpenPoints, Graphormer, etc.).
Has already spawned a follow-up, OneLLM, that bolts the same idea onto large language models and adds fMRI and depth maps.
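The 85M and 302M figures are consistent with the transformer blocks of standard ViT-Base and ViT-Large encoders. The arithmetic below is a sketch under that assumption: the widths, depths, and MLP ratio are the usual ViT-Base and ViT-Large values, not numbers stated on this page, and the count covers encoder blocks only (no tokenizers, heads, or positional embeddings).

```python
def block_params(dim, mlp_ratio=4):
    """Parameter count of one pre-norm transformer block with biases."""
    attn = 3 * dim * dim + 3 * dim   # fused query/key/value projection
    attn += dim * dim + dim          # attention output projection
    hidden = mlp_ratio * dim
    mlp = dim * hidden + hidden      # first MLP layer
    mlp += hidden * dim + dim        # second MLP layer
    norms = 2 * (2 * dim)            # two LayerNorms, each with scale and shift
    return attn + mlp + norms


def encoder_params(dim, depth):
    """Total parameters across `depth` identical blocks."""
    return depth * block_params(dim)


# Assumed shapes: ViT-Base (width 768, 12 blocks), ViT-Large (width 1024, 24 blocks).
base = encoder_params(dim=768, depth=12)
large = encoder_params(dim=1024, depth=24)

print(f"Base:  {base:,} parameters")
print(f"Large: {large:,} parameters")
```

Under these assumptions the totals come to 85,054,464 and 302,309,376, which round to the 85M and 302M quoted above.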

Caveats

The README still advertises upcoming code drops—such as human-centric vision tasks—that have not yet appeared in the repo.
Benchmark numbers are absent from the README text; performance claims appear only in embedded figures, so you will need to read the arXiv paper to judge the empirical gains.
The repository is largely integration glue atop established OpenMMLab and domain-specific libraries rather than a ground-up unified stack.

Verdict

A useful reference if you are experimenting with shared multimodal encoders or need to embed heterogeneous sensor data into one space. If you need polished, end-to-end training pipelines for every modality, look elsewhere: this is still a research snapshot.

Frequently asked

What is invictus717/MetaTransformer?: A research codebase that tests whether a single standard transformer encoder can handle text, images, point clouds, and nine other modalities without custom backbones.