Teaching a robot to find cheese without falling off cliffs
A hands-on Python comparison of Q-learning and Sarsa for grid-world path planning, complete with commented code and experimental charts.

What it does
This repo implements classic tabular reinforcement learning (Q-learning and Sarsa) for a mobile robot navigating grid environments with obstacles. The agent learns a Q-table (a state-action value matrix) through trial and error, eventually finding shortest paths from start to goal. Three environments of increasing complexity are provided, from simple mazes to “super complex” obstacle fields.
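For readers new to the two algorithms, the difference comes down to one line in the update rule. The sketch below is not the repo's code: the function names, the nested-list Q-table, and the values of ALPHA and GAMMA are all assumptions chosen for illustration.

```python
ALPHA = 0.1  # learning rate (assumed value, not taken from the repo)
GAMMA = 0.9  # discount factor (assumed value, not taken from the repo)


def q_learning_update(q, s, a, r, s_next, done):
    """Off-policy: bootstrap from the best-valued action in the next state."""
    target = r if done else r + GAMMA * max(q[s_next])
    q[s][a] += ALPHA * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, done):
    """On-policy: bootstrap from the action the agent will actually take next."""
    target = r if done else r + GAMMA * q[s_next][a_next]
    q[s][a] += ALPHA * (target - q[s][a])


def fresh_table():
    # Toy Q-table: 2 states x 2 actions; state 1 has one clearly better action.
    return [[0.0, 0.0], [1.0, 5.0]]


q_ql = fresh_table()
q_learning_update(q_ql, s=0, a=0, r=0.0, s_next=1, done=False)

q_sarsa = fresh_table()
# Suppose exploration picked the weaker action (index 0) in the next state.
sarsa_update(q_sarsa, s=0, a=0, r=0.0, s_next=1, a_next=0, done=False)

print(q_ql[0][0], q_sarsa[0][0])
```

From the same starting table, Q-learning moves the entry toward the best next value (about 0.45 here), while Sarsa moves it toward the value of the action actually chosen (about 0.09). This is why Sarsa tends to learn more cautious routes while it is still exploring.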
The interesting bit
The project is structured as a teaching tool: each experiment splits cleanly into env.py (world building), agent_brain.py (algorithm logic), and run_agent.py (execution). The README walks through actual Q-table values and specific action sequences, like “down-right-down-down-down-right…”, so you can trace exactly how the robot’s policy crystallized. A direct Q-learning vs. Sarsa comparison chart is included, grounded in the same environments.
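Reading an action sequence off a learned Q-table amounts to following the highest-valued action from the start state until the goal is reached. The sketch below illustrates this on a hand-written 2x2 table. The greedy_path helper, the (row, column) state encoding, and the Q-values are hypothetical and do not come from the repo.

```python
ACTIONS = ["up", "down", "left", "right"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def greedy_path(q_table, start, goal, max_steps=50):
    """Follow the highest-valued action from start until goal or max_steps."""
    state, path = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        values = q_table[state]
        best = max(range(len(ACTIONS)), key=lambda i: values[i])
        action = ACTIONS[best]
        path.append(action)
        d_row, d_col = MOVES[action]
        state = (state[0] + d_row, state[1] + d_col)
    return path


# Hypothetical Q-values for a 2x2 grid: start (0, 0), goal (1, 1).
# Each list holds one value per action, in ACTIONS order.
q_table = {
    (0, 0): [0.0, 0.81, 0.0, 0.73],
    (1, 0): [0.0, 0.0, 0.0, 0.9],
    (0, 1): [0.0, 0.9, 0.0, 0.0],
}

path = greedy_path(q_table, start=(0, 0), goal=(1, 1))
print("-".join(path))  # down-right
```

The max_steps cap guards against a partially trained table whose greedy policy loops forever.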
Key highlights
- Pure Python implementations with heavy commenting; no deep learning frameworks required
- Three environments scaling from basic to dense obstacles
- Explicit Q-table inspection: see learned values and derived action sequences
- Side-by-side experimental results comparing Q-learning and Sarsa convergence
- Published academic backing (ICIEAM 2019) with DOI and Zenodo archive
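A side-by-side comparison of this kind can be approximated by training both algorithms in the same environment and recording the number of steps in each episode. The following is a minimal sketch under stated assumptions, not the repo's experiment: the 4x4 layout, the obstacle positions, the reward values, the hyperparameters, and the train function are all invented for illustration. Episodes here also end when the agent hits an obstacle, so steps per episode is only a rough proxy for convergence.

```python
import random

SIZE = 4
START, GOAL = (0, 0), (3, 3)
OBSTACLES = {(1, 1), (2, 2)}                 # hypothetical layout
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1        # assumed hyperparameters


def step(state, action):
    """Return (next_state, reward, done); moves are clamped to the grid."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    nxt = (row, col)
    if nxt in OBSTACLES:
        return nxt, -1.0, True
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, 0.0, False


def choose(q, state, rng):
    """Epsilon-greedy selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(4)
    best = max(q[state])
    return rng.choice([a for a in range(4) if q[state][a] == best])


def train(algorithm, episodes=200, max_steps=100, seed=0):
    """Train with 'q_learning' or 'sarsa'; return (q_table, steps_per_episode)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    steps_per_episode = []
    for _ in range(episodes):
        state = START
        action = choose(q, state, rng)
        for t in range(1, max_steps + 1):
            nxt, reward, done = step(state, action)
            next_action = choose(q, nxt, rng)
            if done:
                target = reward
            elif algorithm == "sarsa":
                target = reward + GAMMA * q[nxt][next_action]
            else:  # "q_learning": the target ignores next_action
                target = reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state, action = nxt, next_action
            if done:
                break
        steps_per_episode.append(t)
    return q, steps_per_episode


_, ql_steps = train("q_learning")
_, sarsa_steps = train("sarsa")
# Average episode length over the last 20 episodes for each algorithm.
print(sum(ql_steps[-20:]) / 20, sum(sarsa_steps[-20:]) / 20)
```

Plotting ql_steps and sarsa_steps against the episode index would give a chart comparable in spirit to the one the README includes, though the numbers will differ from the published results.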
Caveats
- Code appears to be monolithic scripts rather than a reusable package; you’ll likely copy and adapt
- The “super complex” environment is still a discrete grid world—don’t expect continuous robotics
- Some README figure captions have copy-paste errors (Environment-2’s Q-table is labeled “environment-1”)
Verdict
Good for students or researchers who want to see Q-learning and Sarsa mechanics laid bare in working Python. Skip if you need production RL infrastructure or continuous-state robotics; this is textbook tabular RL, not a framework.