
google-research/pix2seq

Google Research implementation of Pix2Seq, a TensorFlow 2 framework that converts RGB images into semantic sequences for object detection and other vision tasks.

pix2seq star velocity (7d): +0.6 ★/day · Trend: steady

Pix2Seq is a multi-task computer vision framework that reformulates tasks such as object detection as sequence generation problems. The codebase supports both autoregressive decoding and diffusion-based generative models (including FitTransformer, Bit Diffusion, and RIN). Built in TensorFlow 2 with efficient TPU/GPU support, it provides pretrained checkpoints for object detection (for example, on Objects365) and Colab notebooks for inference.
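
To make "object detection as sequence generation" concrete, here is a minimal sketch of how bounding boxes might be flattened into a token sequence. It assumes the scheme described for Pix2Seq, where each object becomes five discrete tokens (ymin, xmin, ymax, xmax, class) and coordinates are quantized into a fixed number of bins. The function names, the bin count of 1000, and the class-token offset are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch only: names and defaults are hypothetical,
# not taken from the google-research/pix2seq codebase.

def quantize(coord, size, num_bins=1000):
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    bin_index = int(round(coord / size * (num_bins - 1)))
    return min(num_bins - 1, max(0, bin_index))  # clamp out-of-range coordinates


def boxes_to_sequence(boxes, labels, height, width, num_bins=1000):
    """Flatten boxes (ymin, xmin, ymax, xmax) and class ids into one token list.

    Class tokens are offset by num_bins so they share a single vocabulary
    with the coordinate tokens without colliding (an assumed layout).
    """
    sequence = []
    for (ymin, xmin, ymax, xmax), label in zip(boxes, labels):
        sequence.extend([
            quantize(ymin, height, num_bins),
            quantize(xmin, width, num_bins),
            quantize(ymax, height, num_bins),
            quantize(xmax, width, num_bins),
            num_bins + label,
        ])
    return sequence


if __name__ == "__main__":
    # One box covering the full 100x200 image, with class id 3.
    print(boxes_to_sequence([(0, 0, 100, 200)], [3], height=100, width=200))
```

A sequence model can then be trained to emit these tokens conditioned on the image, so detection reduces to decoding a list of integers and mapping them back to boxes.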
