zai-org/GLM-5

A 744B-parameter model that actually gets better the longer it runs

GLM-5 and GLM-5.1 are built for long-horizon agentic tasks where most models plateau and give up.

Velocity (7d): +28 ★/day · Trend: steady

What it does

GLM-5 is a 744B-parameter (40B active) open-weight model family from Zhipu AI, with GLM-5.1 as its newer agentic-focused sibling. Both target complex systems engineering and long-running autonomous tasks—coding, terminal operations, repo generation, and multi-step reasoning over hundreds of rounds and thousands of tool calls.
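
The “744B (40B active)” split is what makes a model this large practical to serve. Each token only runs through a small slice of the weights, but all of the weights still have to be loaded. Below is a rough back-of-envelope sketch. It assumes the common rule of thumb of ~2 FLOPs per active parameter per generated token, and it counts weights only (no KV cache, activations, or FP8 scale factors). The helper names are illustrative.

```python
# Back-of-envelope arithmetic for a 744B-total / 40B-active MoE model.
# Assumptions: ~2 FLOPs per active parameter per generated token (forward pass only),
# and weights-only memory (no KV cache, activations, or quantization scale overhead).

TOTAL_PARAMS = 744e9   # total parameters
ACTIVE_PARAMS = 40e9   # parameters active per token

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in each token's forward pass."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Common ~2N approximation: one multiply-add per parameter per token."""
    return 2 * params


def weight_memory_gb(params: float, dtype: str) -> float:
    """Storage for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    ratio = forward_flops_per_token(TOTAL_PARAMS) / forward_flops_per_token(ACTIVE_PARAMS)
    print(f"compute saving vs. an equally sized dense model: ~{ratio:.1f}x")
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(TOTAL_PARAMS, dtype):,.0f} GB of weights")
```

Running it prints an active fraction of 5.4%, a ~18.6x idealized compute ratio, and roughly 1,488 GB (BF16) versus 744 GB (FP8) of weights. The ratio ignores attention, routing, and communication overhead, and real checkpoint sizes will differ somewhat from these weights-only estimates.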

The interesting bit

Most models burn out: quick initial gains, then stagnation. GLM-5.1 is designed to do the opposite—it improves with more time, revisiting reasoning and revising strategy through iteration. The README also highlights an unusual benchmark win: running a simulated vending machine business for a full year, where GLM-5 ended with $4,432 and ranked #1 among open-source models.

Key highlights

  • 744B parameters with DeepSeek Sparse Attention (DSA) to cut deployment costs while keeping long-context capacity
  • GLM-5.1 claims state-of-the-art on SWE-Bench Pro and strong leads on NL2Repo and Terminal-Bench 2.0
  • GLM-5 ranks #1 on Vending Bench 2 for long-term operational planning among open-source models
  • Custom async RL infrastructure called “slime” for more efficient large-scale post-training
  • Available in BF16 and FP8, with local deployment guides for vLLM, SGLang, xLLM, and KTransformers
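
As DeepSeek has publicly described it, DeepSeek Sparse Attention scores earlier tokens with a lightweight indexer and then runs attention only over the top-k highest-scoring ones. The idea is that per-query attention cost scales with k rather than with the full context length. The sketch below illustrates that idea for a single query in plain Python. It is a simplification, not GLM-5's implementation. The indexer score defaults to the raw q·k dot product as a hypothetical stand-in for the learned indexer, and real kernels are batched, multi-head, and causal. The function names are illustrative.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> List[float]:
    """Standard scaled dot-product attention for one query over all keys."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def topk_sparse_attention(
    q: Vector,
    keys: List[Vector],
    values: List[Vector],
    k: int,
    index_scores: Optional[List[float]] = None,
) -> List[float]:
    """Attend only to the k positions with the highest indexer scores.

    index_scores stands in for a learned indexer (hypothetical here);
    when omitted, it falls back to raw q.k similarity.
    """
    if index_scores is None:
        index_scores = [dot(q, key) for key in keys]
    k = min(k, len(keys))
    # Token selection: keep the indices of the k best-scoring keys.
    selected = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)[:k]
    # Regular attention, but only over the selected subset.
    return dense_attention(q, [keys[i] for i in selected], [values[i] for i in selected])
```

With k equal to the number of keys, the selection covers every position and the result matches dense attention up to floating-point summation order. With a small k, the softmax runs over only k entries.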

Caveats

  • The repo is primarily model weights and deployment recipes, not training code or data
  • Per the README, GLM-5.1’s chat interface is coming “in the coming days,” so the docs are slightly ahead of availability
  • A vLLM bug note warns of tool-call parsing issues when speculative decoding is enabled; the fix requires vLLM’s main branch
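
If an agent harness can't move to vLLM main right away, one defensive option is to validate tool calls before executing them. This assumes you query vLLM's OpenAI-compatible server and hold the response as a dict in the standard Chat Completions shape, where `function.arguments` is a JSON string. The sketch does not fix the underlying parsing bug. It only keeps a malformed call from being executed, so the agent can retry instead. `extract_tool_calls` and the `run_shell` tool in the example are hypothetical names.

```python
import json
from typing import Any, Dict, List, Tuple


def extract_tool_calls(
    response: Dict[str, Any],
) -> Tuple[List[Tuple[str, Dict[str, Any]]], List[str]]:
    """Split tool calls from an OpenAI-style chat completion into valid and malformed.

    Returns (valid, errors): valid holds (function name, parsed arguments) pairs;
    errors holds a short description of each call that could not be parsed.
    """
    valid: List[Tuple[str, Dict[str, Any]]] = []
    errors: List[str] = []
    message = response["choices"][0]["message"]
    for call in message.get("tool_calls") or []:
        call_id = call.get("id", "?")
        fn = call.get("function", {})
        name = fn.get("name")
        raw_args = fn.get("arguments", "")
        if not name:
            errors.append(f"{call_id}: missing function name")
            continue
        try:
            # arguments arrives as a JSON-encoded string, not a dict
            args = json.loads(raw_args) if raw_args else {}
        except json.JSONDecodeError as exc:
            errors.append(f"{call_id}: bad JSON arguments ({exc.msg})")
            continue
        if not isinstance(args, dict):
            errors.append(f"{call_id}: arguments are not a JSON object")
            continue
        valid.append((name, args))
    return valid, errors
```

A call whose arguments were cut off mid-string lands in `errors` rather than being passed to the tool, and a plain text reply with no `tool_calls` yields two empty lists.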

Verdict

Worth a look if you’re running autonomous coding agents or long-horizon tasks and want an open-weight alternative to Claude Opus 4.5. Skip it if you need small, fast models or training reproducibility: these are inference-ready weights, not a research framework.
