A 744B-parameter model that actually gets better the longer it runs
GLM-5 and GLM-5.1 are built for long-horizon agentic tasks where most models plateau and give up.

What it does
GLM-5 is a 744B-parameter (40B active) open-weight model family from Zhipu AI, with GLM-5.1 as its newer agentic-focused sibling. Both target complex systems engineering and long-running autonomous tasks—coding, terminal operations, repo generation, and multi-step reasoning over hundreds of rounds and thousands of tool calls.
The interesting bit
Most models burn out: quick initial gains, then stagnation. GLM-5.1 is designed to do the opposite—it improves with more time, revisiting reasoning and revising strategy through iteration. The README also highlights an unusual benchmark win: running a simulated vending machine business for a full year, where GLM-5 ended with $4,432 and ranked #1 among open-source models.
Key highlights
- 744B parameters with DeepSeek Sparse Attention (DSA) to cut deployment costs while keeping long-context capacity
- GLM-5.1 claims state-of-the-art on SWE-Bench Pro and strong leads on NL2Repo and Terminal-Bench 2.0
- GLM-5 ranks #1 on Vending Bench 2 for long-term operational planning among open-source models
- Custom async RL infrastructure called “slime” for more efficient large-scale post-training
- Available in BF16 and FP8, with local deployment guides for vLLM, SGLang, xLLM, and KTransformers
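To put the parameter counts and the two precision options in perspective, here is a rough back-of-the-envelope sketch of weight storage alone. It assumes 2 bytes per parameter for BF16 and 1 byte for FP8, and it ignores KV cache, activations, and any layers an FP8 checkpoint may keep in higher precision, so real deployments will need more. The function names are illustrative and do not come from the repo.

```python
# Rough weight-memory estimate. Assumption: every parameter is stored at the
# nominal width of the format; KV cache and runtime overhead are not counted.
TOTAL_PARAMS = 744e9   # total parameters, as stated in the summary
ACTIVE_PARAMS = 40e9   # parameters active per token, as stated in the summary

BYTES_PER_PARAM = {"BF16": 2, "FP8": 1}  # nominal storage width per format


def weight_gb(params: float, fmt: str) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


def active_fraction(total: float = TOTAL_PARAMS,
                    active: float = ACTIVE_PARAMS) -> float:
    """Share of the parameters that is active for any single token."""
    return active / total


if __name__ == "__main__":
    for fmt in BYTES_PER_PARAM:
        print(f"{fmt}: ~{weight_gb(TOTAL_PARAMS, fmt):,.0f} GB of weights")
    print(f"Active per token: {active_fraction():.1%}")
```

Under these assumptions the weights come to roughly 1,488 GB in BF16 and 744 GB in FP8, with about 5.4% of parameters active per token. Sparse activation reduces compute per token, but all of the weights still have to be stored somewhere.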
Caveats
- The repo is primarily model weights and deployment recipes, not training code or data
- GLM-5.1’s chat interface is listed as “coming in the coming days” in the README, so the docs are slightly ahead of actual availability
- A vLLM bug note warns of tool-call parsing issues when speculative decoding is enabled; the fix requires the vLLM main branch
Verdict
Worth a look if you’re running autonomous coding agents or long-horizon tasks and want an open-weight alternative to Claude Opus 4.5. Skip if you need small, fast models or training reproducibility—this is inference-ready weights, not a research framework.