yrlu/irl-imitation

Teaching robots by watching: three ways to guess the reward

A clean reference implementation of classic inverse reinforcement learning algorithms, frozen in 2017 dependencies.

676 stars · Python · ML Frameworks · Agents
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo implements three foundational IRL algorithms that infer an agent’s reward function from observed behavior. You get linear IRL (the linear-programming formulation of Ng & Russell 2000), maximum entropy IRL (Ziebart et al. 2008), and a deep neural variant (Wulfmeier et al. 2015), all running on toy 1D/2D gridworlds with value iteration as the MDP solver. Run demo.py to see the recovered reward maps.
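To make the "value iteration as the MDP solver" part concrete, here is a minimal sketch on a 1D gridworld. It is not the repository's code: the function name `value_iteration`, the deterministic left/stay/right dynamics, and the reward vector are all assumptions chosen for illustration.

```python
def value_iteration(rewards, gamma=0.9, tol=1e-6):
    """Value iteration on a deterministic 1D gridworld (illustrative only).

    rewards[s] is the reward collected on entering state s. Actions are
    left (-1), stay (0) and right (+1); moves past either end are clamped.
    """
    n = len(rewards)
    values = [0.0] * n
    actions = (-1, 0, 1)

    def step(s, a):
        # Deterministic transition, clamped to the grid boundaries.
        return min(max(s + a, 0), n - 1)

    while True:
        # Bellman optimality backup for every state.
        new_values = [
            max(rewards[step(s, a)] + gamma * values[step(s, a)] for a in actions)
            for s in range(n)
        ]
        delta = max(abs(nv - v) for nv, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [
        max(actions, key=lambda a: rewards[step(s, a)] + gamma * values[step(s, a)])
        for s in range(n)
    ]
    return values, policy


# Hypothetical 5-cell world with a single rewarding cell at the right end.
values, policy = value_iteration([0.0, 0.0, 0.0, 0.0, 1.0])
```

IRL runs this in reverse: given trajectories from a policy like `policy`, it searches for a reward vector under which that behavior looks optimal.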

The interesting bit

The deep MaxEnt implementation doesn’t follow the paper exactly: the author tweaked it with ELU activations, gradient clipping, and L2 regularization, which is the kind of honest footnote you rarely see. The MaxEnt implementation also credits Matthew Alger’s prior work openly rather than pretending to have reinvented the wheel.
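For readers unfamiliar with the three tweaks, here is a rough sketch of each in plain Python. None of this is taken from the repository, which uses TensorFlow; the function names, the `max_norm` and `weight_decay` parameters, and the choice to clip by L2 norm (one common variant, which may not be what the repo does) are assumptions for illustration only.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def l2_regularized_grad(grad, weights, weight_decay):
    """Add the gradient of (weight_decay / 2) * ||w||^2 to a loss gradient."""
    return [g + weight_decay * w for g, w in zip(grad, weights)]


# Hypothetical two-parameter example.
weights = [0.5, -2.0]
raw_grad = [30.0, 40.0]  # L2 norm is 50
grad = l2_regularized_grad(raw_grad, weights, weight_decay=0.1)
grad = clip_by_norm(grad, max_norm=5.0)  # rescaled so the norm is 5
```

ELU keeps small negative activations alive, clipping bounds the size of any single update, and the L2 term pulls weights toward zero; all three are standard ways to stabilize training of a small network.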

Key highlights

  • Three algorithms spanning 15 years of IRL research in one codebase
  • Visual reward-map outputs make the abstract concrete
  • Includes DOI and proper BibTeX for academic use
  • demo.py actually runs out of the box (a low bar, but many repos fail it)

Caveats

  • Python 2.7 and TensorFlow 0.12.1 — this is archaeological software at this point
  • Only gridworlds; no continuous control or Atari hooks
  • Deep MaxEnt is explicitly “not exactly the model proposed in the paper”

Verdict

Grab this if you need to understand or teach the classics, or want a baseline to beat. Skip it if you need something production-ready; the dependency stack alone will eat your afternoon.
