denisyarats/pytorch_sac

SAC in PyTorch: clean reimplementation, fixed hyperparameters

A readable PyTorch port of Soft Actor-Critic, benchmarked against D4PG on the DM Control Suite with no per-task tuning.

598 stars · Jupyter Notebook · Agents · ML Frameworks

What it does

Implements Soft Actor-Critic (SAC), an off-policy reinforcement learning algorithm for continuous control. Trains agents on DeepMind Control Suite tasks like cheetah_run via a single CLI command, then spits out TensorBoard logs and eval videos to an exp folder.
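For readers new to SAC, the core of the algorithm is the entropy-regularized target that the critic is trained toward. The snippet below is a minimal sketch of that standard target on scalar values, not code taken from this repository. The function name, argument names, and the default `discount` and `alpha` values are assumptions chosen for illustration.

```python
def soft_q_target(reward, not_done, next_q1, next_q2, next_log_prob,
                  discount=0.99, alpha=0.1):
    """Standard SAC critic target for a single transition (illustrative).

    next_q1 / next_q2: the two target critics' estimates for the next
    state and an action sampled from the current policy.
    next_log_prob: log-probability of that sampled action.
    alpha: entropy temperature (hypothetical default, not the repo's value).
    """
    # Take the more pessimistic critic, then add the entropy bonus
    # (-alpha * log pi) to get the soft value of the next state.
    next_v = min(next_q1, next_q2) - alpha * next_log_prob
    # not_done is 0.0 on terminal transitions, which drops the bootstrap term.
    return reward + not_done * discount * next_v


target = soft_q_target(reward=1.0, not_done=1.0,
                       next_q1=10.0, next_q2=12.0,
                       next_log_prob=-2.0, discount=0.99, alpha=0.5)
terminal_target = soft_q_target(reward=1.0, not_done=0.0,
                                next_q1=10.0, next_q2=12.0,
                                next_log_prob=-2.0)
```

A real implementation would apply the same arithmetic to batched tensors and would usually learn `alpha` rather than fix it.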

The interesting bit

The authors compare SAC against D4PG across the full DM Control Suite using one fixed hyperparameter set, with no per-task tuning, which is where RL code often hides its cheating. They plot 95% confidence intervals over 3 seeds, and the README notes that the D4PG numbers are taken from the original paper at 10⁸ steps rather than rerun (a frank disclosure, not a gotcha).
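To get a feel for what an interval over 3 seeds means, here is a rough sketch using only the Python standard library. How the repository actually computes its plotted bands is not stated here, so this assumes the simplest textbook form, mean ± critical value × standard error. The returns are made-up numbers, and `mean_ci` is a hypothetical helper.

```python
import math
import statistics


def mean_ci(returns, critical=1.96):
    """Mean and confidence interval across seeds (illustrative).

    critical=1.96 is the normal-approximation value for 95%.
    With only 3 seeds (2 degrees of freedom) the Student-t value
    is about 4.303, which gives a much wider interval.
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    # Sample standard deviation (n - 1 denominator) over the seeds.
    sem = statistics.stdev(returns) / math.sqrt(n)
    return mean, mean - critical * sem, mean + critical * sem


seed_returns = [700.0, 750.0, 800.0]  # made-up final returns, one per seed
mean, low, high = mean_ci(seed_returns)
_, t_low, t_high = mean_ci(seed_returns, critical=4.303)
```

The gap between the two intervals is why a 3-seed band is best read as a rough indication of spread.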

Key highlights

  • Single-command training: python train.py env=cheetah_run
  • Benchmarks on DM Control Suite with consistent hyperparameters across all tasks
  • Outputs TensorBoard logs, train/eval metrics, and episode videos automatically
  • Assumes a GPU that can run CUDA 9.2; dependencies are installed from a conda environment file
  • Citable implementation with a BibTeX entry provided
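Since training is a single command with an `env=` override, sweeping several tasks is mostly a matter of building that command repeatedly. The sketch below only constructs and prints the commands. The helper name is hypothetical, and `walker_walk` is an illustrative second task name, assumed to follow the same naming pattern as `cheetah_run`.

```python
import shlex
import sys


def build_train_command(env, python=sys.executable):
    """Return the argv list for one training run (illustrative helper)."""
    # Mirrors the documented form: python train.py env=<task>
    return [python, "train.py", f"env={env}"]


tasks = ["cheetah_run", "walker_walk"]  # second name is an assumption
commands = [build_train_command(task) for task in tasks]

for cmd in commands:
    # Print a copy-pasteable shell line; nothing is executed here.
    print(shlex.join(cmd))
```

To launch the runs, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root with the conda environment active.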

Caveats

  • CUDA 9.2 is dated; modern GPUs may need dependency tweaks
  • Only 3 seeds for confidence intervals — fine for a reference impl, not a paper claim

Verdict

Grab this if you need a readable, citable SAC baseline in PyTorch with sensible defaults. Skip if you need distributed training, newer CUDA, or Atari/discrete action spaces — this is DM Control Suite territory only.
