andyzeng/visual-pushing-grasping

Teaching robots to push things around so they can grab them later

A PyTorch implementation of deep RL that learns when to shove and when to grasp, directly from RGB-D camera feeds.


What it does

This is the reference implementation for a 2018 IROS paper on visual pushing and grasping. It trains two fully convolutional networks, one for pushes and one for grasps, jointly via Q-learning from RGB-D images. The system runs in V-REP/CoppeliaSim simulation or on a real UR5 arm and learns by trial and error, with successful grasps as the main reward signal.
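The Q-learning update behind this fits in a few lines. This is a minimal sketch, not the repository's code: it assumes each network's next-state predictions are nested Python lists indexed `[rotation][row][col]`, and the discount `gamma=0.5` and the Huber loss are our reading of the paper's setup, so treat both as assumptions. The target bootstraps from the best value anywhere in either network's next-state map, and the loss is applied only to the Q-value of the action that was actually executed.

```python
def td_target(reward, next_push_q, next_grasp_q, gamma=0.5):
    """One-step Q-learning target: r + gamma * max Q(s', a').

    next_push_q / next_grasp_q: nested lists indexed [rotation][row][col]
    holding each network's predictions for the next state (assumed layout).
    gamma=0.5 is an assumed default, not read from the repository.
    """
    best_next = max(
        q
        for qmap in (next_push_q, next_grasp_q)
        for rotation in qmap
        for row in rotation
        for q in row
    )
    return reward + gamma * best_next


def huber_loss(predicted, target, delta=1.0):
    """Huber (smooth L1) loss between the executed action's Q-value and its target."""
    err = abs(predicted - target)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)


# Example: a successful grasp (reward 1.0) followed by a state whose best
# predicted action anywhere in either map is worth 0.6.
next_push = [[[0.1, 0.2], [0.3, 0.4]]]
next_grasp = [[[0.6, 0.0], [0.1, 0.2]]]
target = td_target(1.0, next_push, next_grasp)  # 1.0 + 0.5 * 0.6
loss = huber_loss(0.9, target)                  # only the executed pixel is penalized
print(round(target, 3), round(loss, 3))         # prints: 1.3 0.08
```

Lowering `gamma` makes the target more myopic: the network then values pushes mostly for their immediate reward rather than for the grasps they enable later.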

The interesting bit

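The networks discover synergies from scratch: pushes that clear clutter to make room for future grasps, and grasps that take advantage of space opened up by earlier pushes. There are no hand-coded rules and no motion planning; the robot executes the action at the highest-scoring pixel and orientation in the predicted Q-value maps, and the learned policy generalizes to novel objects after a few hours of training.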
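Choosing an action from the dense maps amounts to an argmax over primitive, rotation, and pixel. The sketch below is a greedy illustration with exploration left out, not the repository's selection code; it assumes the same `[rotation][row][col]` list layout and that rotation index `i` corresponds to an angle of `i * 360 / num_rotations` degrees.

```python
def select_action(push_q, grasp_q):
    """Return the single highest-valued (primitive, rotation, pixel) across both maps.

    push_q / grasp_q: nested lists indexed [rotation][row][col] (assumed layout).
    Ties go to whichever entry is visited first (push before grasp).
    """
    num_rotations = len(push_q)
    if len(grasp_q) != num_rotations:
        raise ValueError("push and grasp maps must cover the same rotations")

    best = None  # (q_value, primitive, rotation_index, row, col)
    for primitive, qmap in (("push", push_q), ("grasp", grasp_q)):
        for r, rotation in enumerate(qmap):
            for y, row in enumerate(rotation):
                for x, q in enumerate(row):
                    if best is None or q > best[0]:
                        best = (q, primitive, r, y, x)

    q, primitive, r, y, x = best
    return {
        "primitive": primitive,
        "angle_deg": r * 360.0 / num_rotations,  # assumed index-to-angle mapping
        "pixel": (y, x),
        "q": q,
    }


# Two rotations (0 and 180 degrees) of a 2x2 map, for brevity.
push = [[[0.1, 0.2], [0.3, 0.4]], [[0.0, 0.0], [0.0, 0.9]]]
grasp = [[[0.5, 0.1], [0.1, 0.1]], [[0.2, 0.2], [0.2, 0.2]]]
print(select_action(push, grasp))
# {'primitive': 'push', 'angle_deg': 180.0, 'pixel': (1, 1), 'q': 0.9}
```

With 16 rotations, each index step would be 22.5 degrees. A real training loop would mix in some exploration instead of always taking the argmax.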

Key highlights

  • Self-supervised: the main reward is grasp success; pushes earn only a small immediate reward for detectably changing the scene, so their value for setting up grasps is learned indirectly
  • Two FCNs output dense pixel-wise Q-value maps for push and grasp candidates, evaluated over 16 rotations of the input image
  • Supports both simulation (V-REP/CoppeliaSim) and real-world UR5 deployment
  • Training from scratch works with PyTorch 1.0+; pre-trained models require legacy PyTorch 0.3
  • Includes ablation baselines: reactive policies, grasp-only, myopic discounting, no push rewards
  • GPU strongly recommended—CPU iterations take minutes versus seconds
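The reward scheme, including the "no push rewards" ablation, can be sketched as follows. Everything here is an assumption for illustration: the 1.0 grasp reward and 0.5 push reward reflect our reading of the paper, while the change-detection helper, its `depth_tol` and `min_changed` thresholds, and the depth-map input format are hypothetical rather than the repository's values.

```python
def scene_changed(prev_depth, curr_depth, depth_tol=0.01, min_changed=2):
    """Hypothetical change detector: count pixels whose depth moved by more than depth_tol.

    prev_depth / curr_depth: equally sized 2D lists of depth values (assumed format);
    depth_tol and min_changed are illustrative thresholds, not tuned values.
    """
    changed = sum(
        1
        for prev_row, curr_row in zip(prev_depth, curr_depth)
        for p, c in zip(prev_row, curr_row)
        if abs(c - p) > depth_tol
    )
    return changed >= min_changed


def compute_reward(primitive, grasp_success=False, changed=False, push_rewards=True):
    """Immediate reward for one executed primitive (assumed values: 1.0 grasp, 0.5 push).

    push_rewards=False mimics the 'no push rewards' ablation: pushes then earn
    nothing immediately and only matter through discounted future grasp rewards.
    """
    if primitive == "grasp":
        return 1.0 if grasp_success else 0.0
    if primitive == "push":
        return 0.5 if (push_rewards and changed) else 0.0
    raise ValueError(f"unknown primitive: {primitive!r}")


before = [[0.0, 0.0], [0.0, 0.0]]
after = [[0.05, 0.04], [0.0, 0.0]]  # a push displaced objects under two pixels
moved = scene_changed(before, after)
print(compute_reward("push", changed=moved))                      # 0.5
print(compute_reward("push", changed=moved, push_rewards=False))  # 0.0
print(compute_reward("grasp", grasp_success=True))                # 1.0
```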

Caveats

  • Pre-trained models are stuck on PyTorch 0.3; training fresh is the modern path
  • V-REP setup involves clicking through “Dynamics content” popups three times
  • README notes Ubuntu 16.04 testing; newer OS compatibility is unspecified

Verdict

Worth a look if you’re doing clutter manipulation research or need a reproducible RL baseline for pick-and-place. Skip it if you want drop-in industrial deployment—the simulation glue and legacy model baggage are real.
