google-deepmind/tapnet

DeepMind’s assembly line for pixel-perfect video tracking

It bundles benchmarks, datasets, and a lineage of ever-faster trackers to solve the deceptively hard problem of following any single point through video.

★ 1.9k stars · Jupyter Notebook · Computer Vision · Domain Apps

What it does

This is DeepMind’s official hub for Tracking Any Point (TAP), a long-term research program aimed at following any arbitrary pixel through video with class-agnostic precision. The repository hosts the TAP-Vid and TAPVid-3D benchmarks, a family of trackers from TAPIR to TAPNext++, a bootstrap training framework called BootsTAP, and a robotics extension named RoboTAP. You get pre-trained checkpoints in both JAX and PyTorch, plus Colab demos and a script for real-time live-camera tracking.
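To make the benchmark side concrete, here is a minimal sketch of a position-accuracy score in the spirit of TAP-Vid's averaged-threshold metric: for each pixel threshold, take the fraction of ground-truth-visible points whose prediction lands strictly closer than that threshold, then average over thresholds. The threshold set (1, 2, 4, 8, 16 pixels) follows the TAP-Vid paper as we understand it, and the assumption that coordinates are already rescaled to 256×256 and the name `avg_position_accuracy` are ours. The repository's official evaluation code also reports occlusion accuracy and Average Jaccard, which are not shown here.

```python
import numpy as np

# Pixel thresholds; assumed to apply to coordinates rescaled to 256x256.
THRESHOLDS = (1, 2, 4, 8, 16)


def avg_position_accuracy(pred_xy, gt_xy, gt_visible, thresholds=THRESHOLDS):
    """Average over thresholds of the fraction of visible points predicted within each threshold.

    pred_xy, gt_xy: arrays of shape (num_points, num_frames, 2) in pixels.
    gt_visible: bool array of shape (num_points, num_frames); occluded
    ground-truth points are excluded from the score.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    visible = np.asarray(gt_visible, dtype=bool)
    if not visible.any():
        return float("nan")  # Nothing visible to score.
    # Euclidean error per (point, frame), kept only where ground truth is visible.
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)[visible]
    fractions = [np.mean(dist < d) for d in thresholds]
    return float(np.mean(fractions))


# One track over four frames, with errors of 0, 1.5, 5 and 20 pixels along x.
gt = np.zeros((1, 4, 2))
pred = np.array([[[0.0, 0.0], [1.5, 0.0], [5.0, 0.0], [20.0, 0.0]]])
print(avg_position_accuracy(pred, gt, np.ones((1, 4), dtype=bool)))  # 0.55
```

Here the per-threshold fractions are 0.25, 0.5, 0.5, 0.75 and 0.75, which average to 0.55.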

The interesting bit

The project treats point tracking as an evolving stack rather than a solved utility. TAPIR uses a two-stage match-then-refine design, while TAPNext reframes tracking as next-token prediction, propagating point state forward frame by frame, much as a language model emits one token at a time. TAPNext++ extends this with 40× longer stable tracking and explicit re-detection after occlusions, and there is even TRAJAN, a trajectory autoencoder that learns motion embeddings to evaluate generative video models without relying on visual appearance.
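The match-then-refine idea can be illustrated with a toy sketch. A global "match" step compares the query point's feature vector against every location of every frame (a cost volume) and takes the best-scoring position; a local "refine" step then nudges that estimate with a soft-argmax over a small window. This is not TAPIR's implementation: TAPIR's refinement is a learned, iterative network, and the feature extractor is omitted here entirely. The function names, the dot-product similarity and the soft-argmax stand-in are assumptions for illustration.

```python
import numpy as np


def match_stage(query_feat, frame_feats):
    """Global match: correlate one query feature with every location of every frame.

    query_feat: (C,) feature sampled at the query point.
    frame_feats: (T, H, W, C) per-frame feature maps.
    Returns integer (y, x) of the best match per frame, shape (T, 2), and the cost volume.
    """
    cost = np.einsum("thwc,c->thw", frame_feats, query_feat)  # dot-product similarity
    num_frames, height, width = cost.shape
    best = cost.reshape(num_frames, -1).argmax(axis=1)
    ys, xs = np.unravel_index(best, (height, width))
    return np.stack([ys, xs], axis=1), cost


def refine_stage(cost, coarse_yx, radius=1, temperature=1.0):
    """Stand-in refinement: soft-argmax of the cost volume in a window around each coarse match."""
    num_frames, height, width = cost.shape
    refined = np.zeros((num_frames, 2))
    for t, (y0, x0) in enumerate(coarse_yx):
        ylo, yhi = max(0, y0 - radius), min(height, y0 + radius + 1)
        xlo, xhi = max(0, x0 - radius), min(width, x0 + radius + 1)
        window = cost[t, ylo:yhi, xlo:xhi] / temperature
        weights = np.exp(window - window.max())  # numerically stable softmax
        weights /= weights.sum()
        yy, xx = np.mgrid[ylo:yhi, xlo:xhi]
        refined[t] = [(weights * yy).sum(), (weights * xx).sum()]
    return refined


# Toy video: plant the query feature at a known location in each frame.
rng = np.random.default_rng(0)
query = rng.normal(size=8)
feats = np.zeros((3, 5, 6, 8))
truth = [(1, 1), (2, 3), (3, 4)]
for t, (y, x) in enumerate(truth):
    feats[t, y, x] = query
coarse, cost = match_stage(query, feats)
print(coarse.tolist())  # [[1, 1], [2, 3], [3, 4]]
```

The split mirrors the design choice described above: the global search is cheap and robust but only pixel-accurate, while the local step can reach sub-pixel positions without rescanning the whole frame.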

Key highlights

Benchmarks span 2D and 3D, with TAPVid-3D alone containing over 1 million computed ground-truth trajectories across more than 4,000 real-world videos.
BootsTAP bootstraps from unlabeled real-world footage by enforcing consistency across spatial transforms, corruptions, and alternate query points, substantially boosting tracker accuracy.
RoboTAP closes the loop to physical robots, using TAPIR tracks to perform real-world manipulation tasks via imitation learning.
The causal online demo runs at roughly 17 fps on 480×480 video on a 2018-era mobile GPU, suggesting the lightweight variants are genuinely real-time capable.
Most major models offer architecture-matched PyTorch re-implementations alongside the original JAX versions, so you are not locked into one framework.
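The transform-consistency idea behind BootsTAP can be sketched as a loss. A teacher tracks points on the original clip, those tracks are mapped through the same spatial transform that was applied to the student's input, and the student is penalized for disagreeing wherever the teacher says the point is visible. This is a simplified illustration, not BootsTAP's actual objective: the affine transform, the Huber penalty, the visibility mask taken from the teacher, and the names `affine_apply` and `consistency_loss` are all assumptions, and corruptions and alternate query points are left out.

```python
import numpy as np


def affine_apply(points_xy, A, b):
    """Map (..., 2) pixel coordinates through x' = A @ x + b."""
    return points_xy @ A.T + b


def huber(err, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    return np.where(err <= delta, 0.5 * err**2, delta * (err - 0.5 * delta))


def consistency_loss(teacher_xy, teacher_visible, student_xy, A, b, delta=1.0):
    """Penalize student tracks (on the transformed clip) that disagree with transformed teacher tracks.

    teacher_xy, student_xy: (num_points, num_frames, 2); teacher_visible: (num_points, num_frames).
    Only frames the teacher marks visible contribute.
    """
    target = affine_apply(teacher_xy, A, b)  # pseudo-labels in the student's coordinate frame
    err = np.linalg.norm(student_xy - target, axis=-1)
    mask = np.asarray(teacher_visible, dtype=float)
    return float((huber(err, delta) * mask).sum() / max(mask.sum(), 1.0))


# Teacher tracks on the original clip; the student saw a 2x-scaled, shifted clip.
teacher = np.array([[[10.0, 20.0], [12.0, 21.0], [14.0, 22.0]]])
visible = np.array([[True, True, False]])
A, b = 2.0 * np.eye(2), np.array([5.0, -3.0])
perfect_student = affine_apply(teacher, A, b)
print(consistency_loss(teacher, visible, perfect_student, A, b))  # 0.0
print(consistency_loss(teacher, visible, perfect_student + [3.0, 0.0], A, b))  # 2.5
```

Because the labels come from the model's own predictions on unlabeled footage, no human annotation is needed; the transform simply guarantees that the correct answer on the student's clip is known once the teacher's answer is.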

Caveats

Training instructions in the repository are explicitly limited to TAP-Net and TAPIR on the synthetic Kubric dataset; reproducing the newer TAPNext or TAPNext++ models from scratch is not documented here.
The local live-camera demo relies on JAX, and the README warns that you must manually align JAX, CUDA, and cuDNN versions, so expect dependency friction if you run outside the provided Colabs.
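Before launching the live demo locally, it can help to confirm which JAX-related packages are actually installed. The helper below only reads installed package metadata with the standard library's `importlib.metadata`. The package names it checks (`jax`, `jaxlib`, and the CUDA 12 plugin and cuDNN wheels) are assumptions about a typical pip-based GPU install, so substitute whatever your environment uses. It does not judge compatibility itself; compare the reported versions against the JAX installation guide.

```python
from importlib import metadata

# Assumed package names for a pip-based JAX + CUDA 12 setup; adjust as needed.
PACKAGES = ("jax", "jaxlib", "jax-cuda12-plugin", "nvidia-cudnn-cu12")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>20}: {version or 'not installed'}")
```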

Verdict

Computer vision researchers and robotics engineers who need long-term, class-agnostic point tracks should dig in. Casual users looking for a polished, drop-in video annotation tool will likely find the paper references and benchmark metrics overwhelming.

Frequently asked

What is google-deepmind/tapnet?: It bundles benchmarks, datasets, and a lineage of ever-faster trackers to solve the deceptively hard problem of following any single point through video.
Is tapnet open source?: Yes — google-deepmind/tapnet is open source, released under the Apache-2.0 license.
What language is tapnet written in?: google-deepmind/tapnet is primarily written in Jupyter Notebook.
How popular is tapnet?: google-deepmind/tapnet has 1.9k stars on GitHub.
Where can I find tapnet?: google-deepmind/tapnet is on GitHub at https://github.com/google-deepmind/tapnet.