RobotLocomotion/pytorch-dense-correspondence

Teaching robots to see objects as dense descriptor maps they can grab

A PyTorch implementation that learns dense visual object descriptors through self-supervision, with no labels required, and then uses them for robotic manipulation.

What it does

This repo trains neural networks to output dense visual descriptors for objects: every pixel gets a descriptor vector, so the output acts as a learned coordinate system draped over any surface, rigid or squishy. Given multiple views of the same object, the network learns which pixels correspond to the same physical point, all without human labels. These descriptors then guide robotic grasping: pick the nose of a stuffed caterpillar, or the tongue of a shoe, even when the object is bent or viewed from a new angle.
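
To make the grasping step concrete, here is a minimal sketch of how a descriptor lookup could work: take a reference descriptor (say, the one at the shoe tongue in one image) and find the pixel in a new image whose descriptor is nearest. The function name `best_match`, the tiny nested-list "descriptor image", and all values are made up for illustration; this is not the repo's API, which works on network outputs in PyTorch.

```python
import math

def best_match(ref_descriptor, descriptor_image):
    """Return ((row, col), distance) for the pixel whose descriptor
    is closest (Euclidean) to ref_descriptor."""
    best_px, best_dist = None, math.inf
    for r, row in enumerate(descriptor_image):
        for c, descriptor in enumerate(row):
            dist = math.dist(ref_descriptor, descriptor)
            if dist < best_dist:
                best_px, best_dist = (r, c), dist
    return best_px, best_dist

# Hypothetical 2x3 descriptor image with a 3-D descriptor per pixel.
target = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.9, 0.1, 0.0), (1.0, 1.0, 1.0)],
]
# Descriptor of the point we want to grasp, taken from another view.
ref = (0.9, 0.1, 0.05)

pixel, dist = best_match(ref, target)
print(pixel, dist)
```

In a real pipeline the matched pixel would presumably be combined with depth to get a 3D grasp target, and a threshold on the distance is one plausible way to reject views where the point is not visible.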

The interesting bit

The training signal is pure self-supervision from 3D vision: with depth data and known camera poses, a pixel in one view can be lifted into 3D and reprojected into another view, so projective geometry lets the network check its own homework. The authors claim ~20 minutes of training per novel object. You can tune the same architecture for either class-general descriptors (“any shoe tongue”) or instance-specific ones (“this exact shoe tongue”).
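
The geometric check can be sketched in a few lines, assuming a pinhole camera, a depth value at the pixel, and camera-to-world poses given as a rotation matrix plus a translation. All names here (`unproject`, `project`, `correspond`) and the intrinsics are hypothetical and not taken from the repo; occlusion and out-of-frame checks are left out.

```python
def unproject(u, v, depth, K):
    """Pixel (u, v) with depth -> 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def project(p, K):
    """3D point in the camera frame -> pixel coordinates."""
    fx, fy, cx, cy = K
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

def cam_to_world(p, R, t):
    """Apply a camera-to-world pose: R @ p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def world_to_cam(p, R, t):
    """Invert a camera-to-world pose: R^T @ (p - t)."""
    d = [p[j] - t[j] for j in range(3)]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))

def correspond(u, v, depth, K, pose_a, pose_b):
    """Find where pixel (u, v) of camera A lands in camera B."""
    p_world = cam_to_world(unproject(u, v, depth, K), *pose_a)
    return project(world_to_cam(p_world, *pose_b), K)

# Assumed intrinsics (fx, fy, cx, cy) and two poses: camera B sits
# 0.1 m to the right of camera A, with the same orientation.
K = (500.0, 500.0, 320.0, 240.0)
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_a = (I3, (0.0, 0.0, 0.0))
pose_b = (I3, (0.1, 0.0, 0.0))

u_b, v_b = correspond(320.0, 240.0, 1.0, K, pose_a, pose_b)
print(u_b, v_b)
```

Pixel pairs produced this way can serve as positive matches during training, while pixels that do not correspond serve as negatives; how the repo samples and weights them is not covered here.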

Key highlights

  • Reference implementation for the CoRL 2018 paper Dense Object Nets
  • Supports both rigid and deformable objects; demonstrated on caterpillars, shoes, hats
  • Includes pre-trained models, Docker setup, and a step-by-step getting-started tutorial
  • Updated to PyTorch 1.1 / CUDA 10 (original paper code frozen at earlier release)
  • Novel application: transfer specific grasps across objects within a class using class-general descriptors

Caveats

  • Requires Docker; setup is nontrivial and the docs push you toward a containerized workflow
  • The “~20 minutes” training claim is stated in the abstract but no hardware or dataset size is specified in the README
  • Jupyter notebook hygiene is manually enforced (“restart and clear outputs” before commit)

Verdict

Worth a look if you’re building visual servoing or manipulation pipelines and want descriptors without labeling labor. Skip if you need a drop-in perception module — this is research code with research setup friction.
