A museum piece from the SAC era
A clean PyTorch reimplementation of Soft Actor-Critic, now archived and gathering dust.

What it does
Implements Soft Actor-Critic (SAC), an off-policy reinforcement learning algorithm that maximizes both reward and policy entropy. The repo covers both the original 2018 stochastic-actor paper and the 2019 follow-up, plus a deterministic variant. It targets standard MuJoCo Gym environments like HalfCheetah and Humanoid.
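For reference, the "reward plus entropy" goal is usually written as the maximum-entropy objective below. This follows the standard formulation in the SAC papers rather than anything quoted from the repo; the symbols (\rho_\pi for the state-action distribution under the policy, \alpha for the entropy temperature) are the conventional ones and are assumed here.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```

Setting \alpha to zero recovers the ordinary expected-return objective; larger values push the policy toward more random behaviour.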
The interesting bit
The author exposed knobs that often stay hidden: automatic entropy tuning, hard vs. soft target updates, and a deterministic policy mode. There’s even a tuned entropy temperature (alpha) per environment, the kind of detail you usually dig out of someone else’s blog post.
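The difference between soft and hard target updates can be sketched in a few lines. This is a dependency-free illustration in plain Python, not the repository's code: the `Params` dictionary, the function names, and the value `tau=0.005` are all assumptions made for the example. In a PyTorch codebase the same arithmetic would typically run over the tensors of the two networks.

```python
from typing import Dict

# Hypothetical stand-in for a network's parameters: name -> scalar weight.
Params = Dict[str, float]


def soft_update(target: Params, source: Params, tau: float) -> None:
    """Polyak averaging: move each target value a fraction tau toward the source."""
    for name, value in source.items():
        target[name] = (1.0 - tau) * target[name] + tau * value


def hard_update(target: Params, source: Params) -> None:
    """Overwrite every target value with the current source value."""
    for name, value in source.items():
        target[name] = value


critic = {"w": 1.0, "b": -2.0}
critic_target = {"w": 0.0, "b": 0.0}

# Soft update: applied every step, the target drifts slowly toward the critic.
soft_update(critic_target, critic, tau=0.005)
print(critic_target)

# Hard update: applied every N steps, the target jumps to match the critic.
hard_update(critic_target, critic)
print(critic_target)
```

A small `tau` keeps the bootstrapping target stable at the cost of lagging behind the critic; a hard update trades that smoothness for periodic exact synchronisation.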
Key highlights
- Reproduces both Haarnoja et al. (2018) and the updated (2019) SAC formulations
- Optional deterministic policy with hard target updates, for ablation-minded researchers
- Per-environment temperature (alpha) defaults baked in
- Single-file implementation; dependencies are just PyTorch and mujoco-py
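As an alternative to a fixed per-environment alpha, automatic entropy tuning adjusts the temperature so that the policy's entropy tracks a target. The sketch below shows one gradient step on the commonly used loss `-log_alpha * mean(log_pi + target_entropy)`, with the gradient written out by hand so it runs without PyTorch. The function name, learning rate, and sample log-probabilities are invented for illustration, and the target of minus the action dimension is the heuristic from the 2019 paper, not a value read from this repo.

```python
import math
from typing import Sequence


def alpha_step(log_alpha: float, log_probs: Sequence[float],
               target_entropy: float, lr: float) -> float:
    """One gradient-descent step on loss = -log_alpha * mean(log_pi + target_entropy)."""
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -mean_gap  # derivative of the loss with respect to log_alpha
    return log_alpha - lr * grad


action_dim = 6                        # illustrative action dimension (assumption)
target_entropy = -float(action_dim)   # heuristic: minus the action dimension
batch_log_probs = [-4.0, -5.0, -3.0]  # made-up log pi(a|s) values for a batch

log_alpha = alpha_step(0.0, batch_log_probs, target_entropy, lr=0.01)
alpha = math.exp(log_alpha)  # optimising log_alpha keeps alpha positive
print(log_alpha, alpha)
```

Here the batch's entropy estimate (4.0) is above the target (-6.0), so the step lowers alpha; when the entropy falls below the target, the same rule raises it.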
Caveats
- Explicitly archived and unmaintained by the author
- Hard-coded to MuJoCo Gym environments; no modern Gymnasium migration
- mujoco-py itself is deprecated, so getting this running is increasingly archaeological
Verdict
Useful if you’re tracing SAC’s evolution or need a minimal reference to compare against your own implementation. Skip it if you want something production-ready or modern — the field has moved on, and so has the tooling.