sichkar-valentyn/Reinforcement_Learning_in_Python

Teaching a robot to find cheese without falling off cliffs

A hands-on Python comparison of Q-learning and Sarsa for grid-world path planning, complete with commented code and experimental charts.

513 stars · Python · ML Frameworks · Domain Apps
Velocity (7d): +0.2 ★/day · Trend: steady

What it does

This repo implements classic tabular reinforcement learning (Q-learning and Sarsa) for a mobile robot navigating grid environments with obstacles. The agent learns a Q-table (state-action value matrix) through trial and error, eventually finding shortest paths from start to goal. Three environments of increasing complexity are provided, from simple mazes to “super complex” obstacle fields.
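To make the difference between the two algorithms concrete, here is a minimal sketch of the textbook tabular update rules. It is not taken from the repo: the function names, the dictionary-based Q-table, and the `alpha` and `gamma` defaults are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a grid world (not the repo's own naming).
ACTIONS = ["up", "down", "left", "right"]


def make_q_table():
    """Q-table mapping state -> {action: value}, with unseen states starting at 0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_learning_update(q, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in next_state."""
    target = reward if done else reward + gamma * max(q[next_state].values())
    q[state][action] += alpha * (target - q[state][action])


def sarsa_update(q, state, action, reward, next_state, next_action, done,
                 alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen next."""
    target = reward if done else reward + gamma * q[next_state][next_action]
    q[state][action] += alpha * (target - q[state][action])
```

The only difference is the bootstrap term: Q-learning uses the maximum value in the next state, while Sarsa uses the value of the action the agent will actually take there, so exploratory moves feed into Sarsa's estimates.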

The interesting bit

The project is structured as a teaching tool: each experiment splits cleanly into env.py (world building), agent_brain.py (algorithm logic), and run_agent.py (execution). The README walks through actual Q-table values and specific action sequences, such as “down-right-down-down-down-right…”, so you can trace exactly how the robot’s policy crystallized. A direct Q-learning vs. Sarsa comparison chart is included, grounded in the same environments.

Key highlights

  • Pure Python implementations with heavy commenting; no deep learning frameworks required
  • Three environments scaling from basic to dense obstacles
  • Explicit Q-table inspection: see learned values and derived action sequences
  • Side-by-side experimental results comparing Q-learning and Sarsa convergence
  • Published academic backing (ICIEAM 2019) with DOI and Zenodo archive
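The Q-table inspection mentioned above can be sketched as follows: start at the initial cell, repeatedly pick the highest-valued action, and record the sequence. This is an illustrative reconstruction, not the repo's code; `greedy_path`, `MOVES`, the `(row, col)` state encoding, and the values in `example_q` are all made up for the example.

```python
# Hypothetical mapping from action name to (row, col) offset.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def greedy_path(q_table, start, goal, max_steps=100):
    """Follow the highest-valued action from start until goal or a dead end."""
    state, path = start, []
    for _ in range(max_steps):
        if state == goal or state not in q_table:
            break
        # Pick the action with the largest learned value in this state.
        action = max(q_table[state], key=q_table[state].get)
        path.append(action)
        d_row, d_col = MOVES[action]
        state = (state[0] + d_row, state[1] + d_col)
    return path


# Invented values for a tiny grid; the goal sits at (2, 1).
example_q = {
    (0, 0): {"up": 0.0, "down": 0.5, "left": 0.0, "right": 0.2},
    (1, 0): {"up": 0.1, "down": 0.0, "left": 0.0, "right": 0.8},
    (1, 1): {"up": 0.0, "down": 1.0, "left": 0.3, "right": 0.0},
}

print("-".join(greedy_path(example_q, start=(0, 0), goal=(2, 1))))
# prints "down-right-down"
```

The `max_steps` cap guards against a partially trained table whose greedy actions loop between cells.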

Caveats

  • Code appears to be monolithic scripts rather than a reusable package; you’ll likely copy and adapt
  • The “super complex” environment is still a discrete grid world—don’t expect continuous robotics
  • Some README figure captions have copy-paste errors (Environment-2’s Q-table is labeled “environment-1”)

Verdict

Good for students or researchers who want to see Q-learning and Sarsa mechanics laid bare in working Python. Skip if you need production RL infrastructure or continuous-state robotics; this is textbook tabular RL, not a framework.
