PPO at scale: when Python's GIL is the enemy
A 2017 Google Research project that batched OpenAI Gym environments into TensorFlow graphs to dodge Python's parallelism bottleneck.

What it does
Batch-ppo is infrastructure for running reinforcement learning experiments with many environments in parallel. It wraps OpenAI Gym envs in external processes, batches their step/reset calls, and embeds the whole thing inside a TensorFlow graph so your PPO (or future algorithm) runs with fewer session round-trips. The repo ships with a working PPO implementation and pre-made configs for tasks like Pendulum.
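To make the process-per-environment idea concrete, here is a minimal standard-library sketch. It is not the repo's code: CounterEnv, ProcessEnv, and batch_step are hypothetical names invented for illustration, and the fork start method assumes a POSIX system. Each env lives in its own process, so stepping it never competes for the parent's GIL, and the batch helper sends every step request before waiting on any reply.

```python
import functools
import multiprocessing


class CounterEnv:
    """Hypothetical toy env with a Gym-like reset/step interface."""

    def __init__(self, limit):
        self._limit = limit
        self._count = 0

    def reset(self):
        self._count = 0
        return self._count

    def step(self, action):
        self._count += action
        done = self._count >= self._limit
        return self._count, float(action), done


def _worker(constructor, conn):
    """Run one env in a child process, serving (name, args) requests until 'close'."""
    env = constructor()
    while True:
        name, args = conn.recv()
        if name == "close":
            break
        conn.send(getattr(env, name)(*args))
    conn.close()


class ProcessEnv:
    """Proxy that forwards method calls to an env living in its own process."""

    def __init__(self, constructor):
        ctx = multiprocessing.get_context("fork")  # assumption: POSIX platform
        self._conn, child_conn = ctx.Pipe()
        self._process = ctx.Process(target=_worker, args=(constructor, child_conn))
        self._process.start()

    def call(self, name, *args):
        """Send a request; return a zero-argument function that waits for the reply."""
        self._conn.send((name, args))
        return self._conn.recv

    def close(self):
        self._conn.send(("close", ()))
        self._process.join()


def batch_step(envs, actions):
    """Dispatch every step first, then collect, so the envs run concurrently."""
    pending = [env.call("step", action) for env, action in zip(envs, actions)]
    observs, rewards, dones = zip(*[receive() for receive in pending])
    return list(observs), list(rewards), list(dones)


if __name__ == "__main__":
    envs = [ProcessEnv(functools.partial(CounterEnv, 3)) for _ in range(2)]
    for env in envs:
        env.call("reset")()
    print(batch_step(envs, [1, 2]))  # ([1, 2], [1.0, 2.0], [False, False])
    for env in envs:
        env.close()
```

Splitting call() into "send now, receive later" is the design choice that matters here: a blocking call per env would serialize the batch again.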
The interesting bit
The real craft is in InGraphBatchEnv and simulate(): instead of shuttling data between Python and TensorFlow every step, the environment state lives as variables inside the graph. Each simulation step, environment transition plus algorithm update, then runs as a single fused operation. It’s a very 2017 solution to a very 2017 problem, from before TF’s eager execution made some of this dance unnecessary.
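A rough sketch of the state-in-the-graph pattern, under loud assumptions: it uses the tf.compat.v1 API from a TensorFlow 2.x install rather than TF 1.3, ToyBatchEnv and build_fused_step are hypothetical names, the "policy" is a one-line toy, and tf.numpy_function stands in for however the repo actually bridges into Python. The point is only the shape of the trick: the observation is a graph variable, and one op acts, steps the env, and writes the result back.

```python
import numpy as np
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style graph API, as shipped inside TF 2.x


class ToyBatchEnv:
    """Hypothetical stand-in for a batch of envs: each state drifts by its action."""

    def __init__(self, batch_size):
        self.state = np.zeros(batch_size, dtype=np.float32)

    def step(self, action):
        self.state = (self.state + action).astype(np.float32)
        # Reward is higher (less negative) the closer a state is to 1.0.
        reward = (-np.abs(1.0 - self.state)).astype(np.float32)
        return self.state, reward


def build_fused_step(env, batch_size):
    """Return (step_op, observ): one op that acts, steps the env, and stores the result."""
    # The latest observation is a non-trainable graph variable, so the policy
    # reads it without a Python round-trip.
    observ = tf1.get_variable(
        "observ", shape=[batch_size], dtype=tf.float32,
        initializer=tf1.zeros_initializer(), trainable=False)
    # Toy "policy": move each observation a quarter of the way toward 1.0.
    action = 0.25 * (1.0 - observ)
    # The Python env is invoked from inside the graph.
    next_observ, reward = tf.numpy_function(
        env.step, [action], [tf.float32, tf.float32])
    next_observ.set_shape([batch_size])
    reward.set_shape([batch_size])
    # The reward is only produced after the new observation has been written back.
    with tf.control_dependencies([observ.assign(next_observ)]):
        step_op = tf.identity(reward)
    return step_op, observ


if __name__ == "__main__":
    with tf.Graph().as_default():
        step_op, observ = build_fused_step(ToyBatchEnv(4), 4)
        with tf1.Session() as sess:
            sess.run(tf1.global_variables_initializer())
            for _ in range(3):
                reward = sess.run(step_op)  # one session call per env step
            # After three steps every env sits at 0.578125.
            print(sess.run(observ), reward)
```

A real algorithm would also fold its update into the same control-dependency chain; this sketch stops at the environment write-back.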
Key highlights
- Batches multiple Gym environments via ExternalProcess wrappers that sidestep the GIL
- BatchEnv exposes vectorized step/reset with proper batched returns
- simulate() fuses env stepping + algorithm update into one graph operation
- Ships with runnable PPO, configs, and TensorBoard logging out of the box
- Requires TensorFlow 1.3+ (yes, TF 1.x era)
Caveats
- Dependencies specify Python 2/3 and TensorFlow 1.3+; this is legacy-era code
- README warns you’ll need comfort with tf.cond, tf.scan, and tf.control_dependencies; this is not beginner territory
- No activity or updates mentioned; likely unmaintained as a research snapshot
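For a feel of what that comfort looks like in practice, here is a small sketch of tf.scan and tf.cond doing typical RL chores. It assumes a TensorFlow 2.x install driven through tf.compat.v1 sessions, and discounted_return and maybe_reset are hypothetical helpers written for this example, not functions quoted from the repo.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x-style sessions, as shipped inside TF 2.x


def discounted_return(rewards, discount):
    """Discounted return G_t = r_t + discount * G_{t+1} for a 1-D reward tensor."""
    # Scanning the reversed rewards accumulates from the episode's end backward.
    reversed_returns = tf.scan(
        lambda acc, reward: reward + discount * acc,
        tf.reverse(rewards, axis=[0]),
        initializer=tf.constant(0.0))
    return tf.reverse(reversed_returns, axis=[0])


def maybe_reset(should_reset, state):
    """tf.cond picks a branch inside the graph, with no Python-side if."""
    return tf.cond(should_reset, lambda: tf.zeros_like(state), lambda: state)


if __name__ == "__main__":
    with tf.Graph().as_default():
        returns = discounted_return(tf.constant([1.0, 1.0, 1.0]), 0.5)
        state = maybe_reset(tf.constant(True), tf.constant([3.0, 4.0]))
        with tf1.Session() as sess:
            # returns evaluates to [1.75, 1.5, 1.0]; state is zeroed.
            print(sess.run([returns, state]))
```

Both helpers only describe computation; nothing runs until the session evaluates them, which is exactly the mental shift the README is warning about.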
Verdict
Worth studying if you’re implementing batched RL from scratch or maintaining legacy TF 1.x pipelines. Skip it if you want modern PyTorch/JAX vectorized envs — the field has moved on, but the design patterns remain instructive.