Is Open-AutoGLM open source?

Yes — zai-org/Open-AutoGLM is open source, released under the Apache-2.0 license.

What language is Open-AutoGLM written in?

zai-org/Open-AutoGLM is primarily written in Python.

How popular is Open-AutoGLM?

zai-org/Open-AutoGLM has 25.8k stars on GitHub; its star growth is currently slowing (about +7.7 stars per day over the last 7 days).

Where can I find Open-AutoGLM?

zai-org/Open-AutoGLM is on GitHub at https://github.com/zai-org/Open-AutoGLM.

zai-org/Open-AutoGLM

Your phone, but it listens to natural language

An open-source framework that lets a vision-language model see your screen and control your Android, HarmonyOS, or iOS device via ADB, HDC, or WebDriverAgent respectively.

★ 25.8k stars · Python · Agents

Velocity (7d): +7.7 ★ / day · Trend: ↘ cooling

What it does

Open-AutoGLM is a phone agent framework from Zhipu AI. You type a command like “open Xiaohongshu and search for food” in Chinese or English; a 9B vision-language model looks at your screen through ADB (or HDC for HarmonyOS, WebDriverAgent for iOS), plans the next tap or swipe, and executes it. The model runs either via third-party APIs (BigModel, ModelScope) or self-hosted through vLLM/SGLang.
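The observe, plan, act cycle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the action dictionary format, `run_agent`, and the `plan` callable are hypothetical names. Only the `adb` subcommands used (`exec-out screencap -p`, `shell input tap`, `shell input swipe`) are standard Android tooling.

```python
import subprocess
from typing import Callable, Optional

# Hypothetical action format, for illustration only,
# e.g. {"type": "tap", "x": 540, "y": 1200} or {"type": "finish"}.
Action = dict


def _adb(serial: Optional[str]) -> list[str]:
    """Base adb invocation, optionally pinned to one device serial."""
    return ["adb"] + (["-s", serial] if serial else [])


def screencap_cmd(serial: Optional[str] = None) -> list[str]:
    """Command that streams a PNG screenshot of the device to stdout."""
    return _adb(serial) + ["exec-out", "screencap", "-p"]


def action_to_cmd(action: Action, serial: Optional[str] = None) -> list[str]:
    """Translate a planned action into an `adb shell input` command."""
    if action["type"] == "tap":
        return _adb(serial) + ["shell", "input", "tap",
                               str(action["x"]), str(action["y"])]
    if action["type"] == "swipe":
        coords = [str(action[k]) for k in ("x1", "y1", "x2", "y2")]
        return _adb(serial) + ["shell", "input", "swipe", *coords]
    raise ValueError(f"unsupported action type: {action['type']}")


def run_agent(task: str,
              plan: Callable[[str, bytes], Action],
              capture: Callable[[], bytes],
              execute: Callable[[Action], None],
              max_steps: int = 20) -> int:
    """Capture the screen, ask the planner for one action, execute it, repeat.

    Stops when the planner returns {"type": "finish"} or after max_steps.
    Returns the number of actions executed.
    """
    executed = 0
    for _ in range(max_steps):
        action = plan(task, capture())
        if action["type"] == "finish":
            break
        execute(action)
        executed += 1
    return executed


def adb_capture() -> bytes:
    """Grab a screenshot from the connected device (requires adb on PATH)."""
    return subprocess.run(screencap_cmd(), check=True,
                          capture_output=True).stdout


def adb_execute(action: Action) -> None:
    """Run the planned action on the connected device."""
    subprocess.run(action_to_cmd(action), check=True)
```

In a real setup, `plan` would send the task text and screenshot to the vision-language model and parse its reply. Passing `capture` and `execute` in as callables keeps the loop testable without a phone attached.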

The interesting bit

The model doesn’t just OCR your screen; it reasons about UI layout visually. The README shows a chain-of-thought example where the agent compares shampoo prices across JD.com and Taobao: it launches both apps, searches each one, and decides where to buy. There’s also a human-handoff mechanism for logins and CAPTCHAs, a pragmatic admission that full autonomy still breaks down at authentication and verification steps.
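One way to picture the human-handoff idea is a dispatcher that pauses automation whenever the planner asks for manual help. The sketch below is an assumption-laden illustration: the `"takeover"` action type, the `message` field, and `step_with_handoff` are hypothetical and not taken from the project's code.

```python
from typing import Callable


def needs_human(action: dict) -> bool:
    """True when the planner asks for manual help (hypothetical action type)."""
    return action.get("type") == "takeover"


def step_with_handoff(action: dict,
                      execute: Callable[[dict], None],
                      ask_human: Callable[[str], None]) -> str:
    """Execute an action, or pause and hand control to the user.

    `ask_human` should block until the user has finished the manual step,
    for example by completing a login or solving a CAPTCHA on the device.
    """
    if needs_human(action):
        ask_human(action.get("message", "Manual step required"))
        return "handed_off"
    execute(action)
    return "executed"


def console_prompt(message: str) -> None:
    """Simple blocking prompt for terminal use."""
    input(f"{message}. Press Enter when done: ")
```

With `console_prompt` as `ask_human`, the agent would simply wait in the terminal until the user confirms the login or CAPTCHA is done, then resume.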

Key highlights

Supports Android 7.0+, HarmonyOS NEXT, and iOS through WebDriverAgent
Two model variants: Chinese-optimized AutoGLM-Phone-9B and a multilingual version
Can self-host with vLLM or SGLang; exact launch parameters provided in docs
Integrates with Midscene.js for cross-platform UI automation workflows
Includes a deployment check script that validates model output quality (short or garbled chains mean your setup is wrong)
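The "short or garbled chains" heuristic from the last highlight can be approximated with a small sanity check on the model's reasoning text. This is a hypothetical sketch, not the project's check script: `looks_healthy` and both thresholds are assumptions chosen for illustration.

```python
def looks_healthy(reasoning: str,
                  min_chars: int = 20,
                  min_readable_ratio: float = 0.9) -> bool:
    """Flag reasoning output that is too short or mostly unreadable.

    Thresholds are illustrative assumptions, not values from the project.
    """
    text = reasoning.strip()
    if len(text) < min_chars:
        return False  # suspiciously short chain of thought
    # Count printable characters and whitespace, but treat U+FFFD
    # (the Unicode replacement character) as a sign of decoding damage.
    readable = sum(
        1 for ch in text
        if (ch.isprintable() or ch.isspace()) and ch != "\ufffd"
    )
    return readable / len(text) >= min_readable_ratio
```

A failing check would suggest revisiting the serving setup (model weights, launch parameters, dependency versions) before blaming the prompt.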

Caveats

Requires developer mode and USB debugging; on Android, a separate ADB Keyboard APK must also be installed and enabled
The README is primarily in Chinese; English documentation exists but is secondary
Self-hosting demands careful dependency management (the docs note Transformers library version conflicts and specific CUDA/cuDNN requirements)
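The Android prerequisites above can be checked before running anything else. The sketch below parses the output of two standard commands, `adb devices` and `adb shell ime list -s`. The helper names are hypothetical, and the IME id `com.android.adbkeyboard/.AdbIME` is an assumption based on the commonly used ADBKeyBoard project; verify it against the APK you actually install.

```python
import subprocess

# Assumed IME id of the ADB Keyboard app; confirm for your APK.
ADB_KEYBOARD_IME = "com.android.adbkeyboard/.AdbIME"


def authorized_devices(adb_devices_output: str) -> list[str]:
    """Return serials of devices in the 'device' (authorized) state.

    Devices listed as 'unauthorized' or 'offline' are skipped; they usually
    mean USB debugging has not been approved on the phone.
    """
    serials = []
    for line in adb_devices_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def adb_keyboard_enabled(ime_list_output: str) -> bool:
    """True if the ADB Keyboard IME appears among the enabled input methods."""
    enabled = {line.strip() for line in ime_list_output.splitlines()}
    return ADB_KEYBOARD_IME in enabled


def check_prerequisites() -> bool:
    """Run both checks against the connected device (requires adb on PATH)."""
    devices = subprocess.run(["adb", "devices"], check=True,
                             capture_output=True, text=True).stdout
    if not authorized_devices(devices):
        return False
    imes = subprocess.run(["adb", "shell", "ime", "list", "-s"], check=True,
                          capture_output=True, text=True).stdout
    return adb_keyboard_enabled(imes)
```

Keeping the parsers separate from the `subprocess` calls makes them easy to exercise on captured output without a device attached.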

Verdict

Worth a look if you’re building mobile automation, accessibility tools, or testing vision-language agents in the wild. Skip it if you need something that works out of the box without sideloading keyboards and toggling developer settings.