google-deepmind/tapnet
DeepMind's computer vision system for tracking arbitrary points across video frames using deep learning models.

Tracking Any Point (TAP) is a computer vision system that identifies and follows arbitrary points through video sequences. The repository contains the TAPIR model, a two-stage algorithm that first matches a query point and then refines the estimate to produce a point trajectory, along with the TAP-Vid and TAPVid-3D benchmark datasets for evaluating tracking performance. It also includes RoboTAP, which applies point tracking to real-world robotic manipulation tasks through imitation learning.
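
The match-then-refine idea described above can be illustrated with a toy NumPy sketch. This is not the repository's TAPIR code: the function names (`match_stage`, `refine_stage`, `make_synthetic_video`), the window `radius`, the `temperature`, and the synthetic feature maps are all assumptions made for illustration. In this sketch, matching takes the best-scoring location per frame from a dot-product cost volume, and refinement is replaced by a simple local soft-argmax rather than a learned model.

```python
import numpy as np


def match_stage(query_feat, frame_feats):
    """Per-frame matching: pick the location most similar to the query feature.

    query_feat:  (C,) feature vector sampled at the query point.
    frame_feats: (T, H, W, C) feature maps, one per frame.
    Returns (T, 2) integer (y, x) estimates and the (T, H, W) cost volume.
    """
    cost = np.einsum("thwc,c->thw", frame_feats, query_feat)
    num_frames, height, width = cost.shape
    flat_idx = cost.reshape(num_frames, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (height, width))
    return np.stack([ys, xs], axis=1), cost


def refine_stage(coarse, cost, radius=2, temperature=0.1):
    """Toy refinement (an assumption, not TAPIR's learned refinement):
    soft-argmax of the cost in a local window around each coarse estimate."""
    num_frames, height, width = cost.shape
    refined = np.zeros((num_frames, 2), dtype=np.float64)
    for t, (y, x) in enumerate(coarse):
        y0, y1 = max(y - radius, 0), min(y + radius + 1, height)
        x0, x1 = max(x - radius, 0), min(x + radius + 1, width)
        window = cost[t, y0:y1, x0:x1]
        # Numerically stable softmax over the window.
        weights = np.exp((window - window.max()) / temperature)
        weights /= weights.sum()
        grid_y, grid_x = np.mgrid[y0:y1, x0:x1]
        refined[t] = [(weights * grid_y).sum(), (weights * grid_x).sum()]
    return refined


def make_synthetic_video(num_frames=5, height=16, width=16, channels=8, seed=0):
    """Hypothetical test data: low-amplitude noise features plus one
    distinctive unit-norm feature that moves along a known path."""
    rng = np.random.default_rng(seed)
    feats = 0.01 * rng.standard_normal((num_frames, height, width, channels))
    target = rng.standard_normal(channels)
    target /= np.linalg.norm(target)
    path = np.array([[3 + t, 4 + 2 * t] for t in range(num_frames)])
    for t, (y, x) in enumerate(path):
        feats[t, y, x] = target
    return feats, target, path


if __name__ == "__main__":
    feats, target, path = make_synthetic_video()
    coarse, cost = match_stage(target, feats)
    refined = refine_stage(coarse, cost)
    print("coarse track:\n", coarse)
    print("refined track:\n", refined)
```

Splitting the work this way keeps the global search cheap (one argmax per frame) and limits the sub-pixel estimate to a small neighborhood, which is the general motivation for coarse-to-fine tracking; how the actual model implements each stage should be checked against the repository itself.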