THUDM/slime

The GLM team open-sourced their RL training loop, not just the model weights

slime is the post-training framework behind GLM-5, GLM-4.7, and a growing ecosystem of agentic RL projects — built to stay thin while handling the full messy loop.

Velocity (7d): +17 ★ / day · Trend: steady

What it does

slime wires Megatron-LM (training) to SGLang (rollout inference) through a shared Data Buffer, then gets out of the way. Custom data generation — multi-agent workflows, tool use, sandboxed code execution, verifier feedback — plugs in without forking the training kernel. The goal is a single loop for RL post-training that is small enough to read but has survived the full production path behind several GLM releases.
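To make the shape of that loop concrete, here is a minimal sketch of a rollout side and a training side that only communicate through a buffer. It is an illustration, not slime's code: `DataBuffer`, `Sample`, `run_loop`, and the `generate`, `score`, and `train_step` callables are all hypothetical names, and the real engines (SGLang and Megatron-LM) are replaced by plain Python functions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Sample:
    prompt: str
    response: str
    reward: float


class DataBuffer:
    """Hypothetical stand-in for the buffer shared by rollout and training."""

    def __init__(self) -> None:
        self._samples: Deque[Sample] = deque()

    def put(self, samples: List[Sample]) -> None:
        self._samples.extend(samples)

    def get(self, batch_size: int) -> List[Sample]:
        # Return up to batch_size samples, oldest first.
        count = min(batch_size, len(self._samples))
        return [self._samples.popleft() for _ in range(count)]


def run_loop(
    prompts: List[str],
    generate: Callable[[str], str],              # stands in for rollout inference
    score: Callable[[str, str], float],          # stands in for a verifier or reward
    train_step: Callable[[List[Sample]], float],  # stands in for the trainer
    batch_size: int,
    num_steps: int,
) -> List[float]:
    buffer = DataBuffer()
    metrics: List[float] = []
    for _ in range(num_steps):
        # Rollout side: generate responses, score them, write to the buffer.
        rollouts = []
        for prompt in prompts:
            response = generate(prompt)
            rollouts.append(Sample(prompt, response, score(prompt, response)))
        buffer.put(rollouts)
        # Training side: read a batch from the buffer and take one step.
        metrics.append(train_step(buffer.get(batch_size)))
    return metrics
```

In this sketch, custom data generation only has to produce `Sample` objects for the buffer, so swapping in a multi-agent or tool-using `generate` would not touch `train_step`.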

The interesting bit

Most frameworks either wrap engines in thick abstraction layers or leave you to glue them yourself. slime does neither: it passes Megatron and SGLang arguments through natively (--sglang-mem-fraction-static works directly, for example). By betting on one inference backend instead of many, it can use SGLang-specific features — PD disaggregation, delta weight sync, session-affinity routing — without flattening them to a lowest-common-denominator API.
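One way such pass-through could work is to route any flag with an engine prefix straight to the engine with the prefix stripped. The sketch below assumes that convention; `split_engine_args` is a hypothetical helper, `--num-steps` is a made-up framework flag, and nothing here is taken from slime's actual argument parser.

```python
from typing import List, Tuple


def split_engine_args(
    argv: List[str], prefix: str = "--sglang-"
) -> Tuple[List[str], List[str]]:
    """Split argv into (framework_args, engine_args).

    Flags starting with `prefix` go to the engine with the prefix replaced
    by "--". Value tokens follow whichever flag preceded them.
    """
    framework_args: List[str] = []
    engine_args: List[str] = []
    target = framework_args
    for token in argv:
        if token.startswith(prefix):
            target = engine_args
            token = "--" + token[len(prefix):]
        elif token.startswith("--"):
            target = framework_args
        target.append(token)
    return framework_args, engine_args


framework, engine = split_engine_args(
    ["--num-steps", "10", "--sglang-mem-fraction-static", "0.8"]
)
print(framework)  # ['--num-steps', '10']
print(engine)     # ['--mem-fraction-static', '0.8']
```

The appeal of this design is that the framework never needs to know which options the engine supports: new engine flags become usable without any wrapper code changing.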

Key highlights

  • Production validation: Used for GLM-5.1, GLM-5, GLM-4.7, GLM-4.6, GLM-4.5; also supports Qwen, DeepSeek V3/R1, and Llama 3.
  • Native engine pass-through: No wrapper reimplementation of Megatron parallelism, optimizers, or SGLang serving options.
  • Agentic-by-design: Supports multi-turn, search, and coding agents with sandboxed tool use, plus fully async rollout to absorb long-tail generation latencies.
  • Correctness infrastructure: Separate rollout-only and train-only debug paths, plus documented reproducibility, fault tolerance, tracing, and CI.
  • Growing ecosystem: Relax (omni-modal agentic RL), P1 (physics olympiad reasoning), RLVE (400+ verifiable environments), TritonForge (GPU kernel generation).
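The long-tail latency point in the highlights above can be illustrated with a small `asyncio` sketch: instead of waiting for every rollout in a group, the consumer takes results in completion order and moves on once it has a full batch. This shows the general idea only; the text does not describe slime's actual mechanism, and `fake_rollout` and `collect_first` are hypothetical names with `asyncio.sleep` standing in for generation time.

```python
import asyncio
from typing import Dict, List


async def fake_rollout(name: str, latency: float) -> str:
    # Stand-in for one generation request; latency is simulated.
    await asyncio.sleep(latency)
    return name


async def collect_first(latencies: Dict[str, float], batch_size: int) -> List[str]:
    tasks = [
        asyncio.create_task(fake_rollout(name, latency))
        for name, latency in latencies.items()
    ]
    finished: List[str] = []
    # as_completed yields results in completion order, not submission order.
    for next_done in asyncio.as_completed(tasks):
        finished.append(await next_done)
        if len(finished) == batch_size:
            break
    # This sketch simply cancels stragglers; a real system would more
    # likely let them finish and feed a later batch.
    for task in tasks:
        task.cancel()
    return finished


if __name__ == "__main__":
    order = asyncio.run(
        collect_first({"long_tail": 0.5, "short": 0.01, "medium": 0.05}, batch_size=2)
    )
    print(order)
```

With these simulated latencies, the batch fills from the two fast requests and the slow one never blocks the consumer.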

Caveats

  • SGLang-only for rollouts; if your serving stack is vLLM or TensorRT-LLM, this is not the framework.
  • The README is confident about “lightweight” but does not list line counts or dependency footprints to verify the claim.

Verdict

Worth a close look if you are running large-scale RL post-training on SGLang and are tired of maintaining your own glue. Skip it if you need multi-backend inference flexibility or are doing small-scale experiments where a simpler trainer suffices.
