google-research/batch-ppo

PPO at scale: when Python's GIL is the enemy

A 2017 Google Research project that batched OpenAI Gym environments into TensorFlow graphs to dodge Python's parallelism bottleneck.

977 stars · Python · Agents, ML Frameworks
Velocity (7d): +0.3 ★/day · Trend: steady

What it does

Batch-ppo is infrastructure for running reinforcement learning experiments across many environments in parallel. It runs OpenAI Gym envs in external processes, batches their step/reset calls, and embeds the whole loop inside a TensorFlow graph, so PPO (or any algorithm you add later) needs fewer session round-trips. The repo ships with a working PPO implementation and ready-made configs for tasks like Pendulum.
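To make "batches their step/reset calls" concrete, here is a minimal, dependency-free sketch of a batched wrapper in the spirit of the repo's BatchEnv. CounterEnv and SimpleBatchEnv are hypothetical names, not the repo's API. The sketch steps environments sequentially in one process and returns plain Python lists; the repo can instead dispatch these calls to external processes.

```python
from typing import Any, List, Optional, Sequence, Tuple


class CounterEnv:
    """Hypothetical toy env standing in for a Gym env.

    The observation is the step count; the episode ends after `horizon` steps.
    """

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        reward = float(action)  # toy reward: echo the action back
        done = self.t >= self.horizon
        return self.t, reward, done, {}


class SimpleBatchEnv:
    """Hypothetical batched wrapper: one call steps every env with its own action."""

    def __init__(self, envs: Sequence[Any]) -> None:
        self.envs = list(envs)

    def __len__(self) -> int:
        return len(self.envs)

    def reset(self, indices: Optional[Sequence[int]] = None) -> List[Any]:
        # Reset every env, or only the listed ones (e.g. envs whose episode ended).
        if indices is None:
            indices = range(len(self.envs))
        return [self.envs[i].reset() for i in indices]

    def step(self, actions: Sequence[Any]) -> Tuple[List[Any], List[float], List[bool], List[dict]]:
        if len(actions) != len(self.envs):
            raise ValueError("expected one action per environment")
        transitions = [env.step(a) for env, a in zip(self.envs, actions)]
        # Transpose per-env (obs, reward, done, info) tuples into batched columns.
        observs, rewards, dones, infos = (list(column) for column in zip(*transitions))
        return observs, rewards, dones, infos
```

For example, with horizons 1, 2 and 3, reset() returns [0, 0, 0], and step([1, 0, 2]) returns observations [1, 1, 1], rewards [1.0, 0.0, 2.0], and done flags [True, False, False]. Passing indices to reset() lets the caller restart only the environments that finished.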

The interesting bit

The real craft is in InGraphBatchEnv and simulate(). Instead of shuttling data between Python and TensorFlow on every step, InGraphBatchEnv keeps the current batch of observations, actions, rewards, and done flags in variables inside the graph, and simulate() fuses the environment step with the algorithm's step, so each iteration of the training loop is a single operation. It’s a very 2017 solution to a very 2017 problem, from before TF’s eager execution made some of this dance unnecessary.
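The fused-step idea can be sketched without TensorFlow at all. The plain-Python classes below are a loose analogue, not the repo's simulate(): FusedSimulator, ToyEnv, and ConstantAgent (with its act/experience hooks) are all hypothetical. The persistent self.observs attribute plays the role of the in-graph variables, and one step() call bundles what simulate() fuses into one op: choose actions, step every environment, hand the transition to the algorithm, and reset whichever environments finished.

```python
from typing import List, Tuple


class ToyEnv:
    """Hypothetical stand-in for a Gym env: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


class ConstantAgent:
    """Hypothetical algorithm exposing the two hooks the fused step calls."""

    def __init__(self) -> None:
        self.transitions: List[tuple] = []

    def act(self, observs: List[int]) -> List[int]:
        return [1 for _ in observs]  # always pick action 1

    def experience(self, observs, actions, rewards, dones) -> None:
        # A real algorithm would buffer these and trigger an update.
        self.transitions.append((list(observs), list(actions), list(rewards), list(dones)))


class FusedSimulator:
    """One step() = act, step all envs, feed the algorithm, reset finished envs."""

    def __init__(self, envs: List[ToyEnv], agent: ConstantAgent) -> None:
        self.envs = envs
        self.agent = agent
        # State that persists between calls, like the in-graph variables.
        self.observs = [env.reset() for env in envs]
        self.episode_returns = [0.0] * len(envs)
        self.finished_returns: List[float] = []

    def step(self) -> List[bool]:
        actions = self.agent.act(self.observs)
        results = [env.step(a) for env, a in zip(self.envs, actions)]
        next_observs = [r[0] for r in results]
        rewards = [r[1] for r in results]
        dones = [r[2] for r in results]
        self.agent.experience(self.observs, actions, rewards, dones)
        for i, done in enumerate(dones):
            self.episode_returns[i] += rewards[i]
            if done:
                # Log the finished episode and start a fresh one in place.
                self.finished_returns.append(self.episode_returns[i])
                self.episode_returns[i] = 0.0
                next_observs[i] = self.envs[i].reset()
        self.observs = next_observs
        return dones
```

With FusedSimulator([ToyEnv(2), ToyEnv(3)], ConstantAgent()), three step() calls finish one episode in each environment, leaving finished_returns at [2.0, 3.0]. The training loop only ever calls step(); in the repo, collapsing that body into one graph op is what cuts the session round-trips.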

Key highlights

  • Runs each Gym environment in its own process via the ExternalProcess wrapper, sidestepping the GIL
  • BatchEnv extends the Gym interface to batches: step() takes one action per env and returns batched observations, rewards, done flags, and info objects
  • simulate() fuses env stepping + algorithm update into one graph operation
  • Ships with runnable PPO, configs, and TensorBoard logging out of the box
  • Requires TensorFlow 1.3+ (yes, TF 1.x era)
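The process-per-environment trick can be sketched with the standard library's multiprocessing module. ProcessEnv and ToyEnv are hypothetical and this is not the repo's ExternalProcess class; the sketch assumes a POSIX system so it can use the fork start method, which lets it pass a lambda as the env constructor without pickling it. Splitting a step into step_async() and step_wait() is what buys parallelism: send every action first, then collect the results, so the child processes compute concurrently instead of taking turns under one interpreter's GIL.

```python
import multiprocessing as mp
from typing import Any, Callable, Tuple


class ToyEnv:
    """Hypothetical stand-in for a Gym env: the episode ends after `horizon` steps."""

    def __init__(self, horizon: int) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return self.t

    def step(self, action: int) -> Tuple[int, float, bool, dict]:
        self.t += 1
        return self.t, float(action), self.t >= self.horizon, {}


def _worker(conn, make_env: Callable[[], Any]) -> None:
    # Runs in the child process: build the env there and serve commands.
    env = make_env()
    while True:
        try:
            command, payload = conn.recv()
        except EOFError:  # the other end of the pipe was closed
            break
        if command == "reset":
            conn.send(env.reset())
        elif command == "step":
            conn.send(env.step(payload))
        elif command == "close":
            break
    conn.close()


class ProcessEnv:
    """Hypothetical wrapper: one env per child process, driven over a Pipe."""

    def __init__(self, make_env: Callable[[], Any]) -> None:
        ctx = mp.get_context("fork")  # POSIX only; avoids pickling make_env
        self._conn, child_conn = ctx.Pipe()
        self._process = ctx.Process(target=_worker, args=(child_conn, make_env), daemon=True)
        self._process.start()
        child_conn.close()  # the parent only uses its own end

    def reset(self) -> Any:
        self._conn.send(("reset", None))
        return self._conn.recv()

    def step_async(self, action: Any) -> None:
        self._conn.send(("step", action))  # returns immediately

    def step_wait(self) -> Any:
        return self._conn.recv()  # blocks until the child replies

    def close(self) -> None:
        self._conn.send(("close", None))
        self._process.join()
        self._conn.close()


if __name__ == "__main__":
    envs = [ProcessEnv(lambda h=h: ToyEnv(h)) for h in (1, 2)]
    print([env.reset() for env in envs])
    for env in envs:
        env.step_async(1)  # dispatch every step before waiting on any
    print([env.step_wait() for env in envs])
    for env in envs:
        env.close()
```

A batched wrapper built on top of this would call step_async() on every environment and then step_wait() on every environment, which is the non-blocking pattern that makes one process per env pay off.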

Caveats

  • Dependencies specify Python 2/3 and TensorFlow 1.3+; this is legacy-era code
  • README warns you’ll need comfort with tf.cond, tf.scan, and tf.control_dependencies — not beginner territory
  • No activity or updates mentioned; likely unmaintained as a research snapshot

Verdict

Worth studying if you’re implementing batched RL from scratch or maintaining legacy TF 1.x pipelines. Skip it if you want modern PyTorch/JAX vectorized envs — the field has moved on, but the design patterns remain instructive.
