Is GLM-OCR open source?

Yes — zai-org/GLM-OCR is open source, released under the Apache-2.0 license.

What language is GLM-OCR written in?

zai-org/GLM-OCR is primarily written in Python.

How popular is GLM-OCR?

zai-org/GLM-OCR has 7k stars on GitHub.

Where can I find GLM-OCR?

zai-org/GLM-OCR is on GitHub at https://github.com/zai-org/GLM-OCR.

← all repositories

zai-org/GLM-OCR

Document OCR crammed into a 0.9B vision-language model

GLM-OCR exists to squeeze enterprise-grade document understanding—tables, formulas, code, seals—into a sub-1B model you can host yourself or hit via API.

★7k stars Python Computer Vision Language Models Inference · Serving

View on GitHub ↗

Not currently ranked — collecting fresh signals.

star history

What it does

GLM-OCR is a vision-language model that performs document OCR and layout understanding. It runs a two-stage pipeline: first detecting regions like tables and formulas with PP-DocLayout-V3, then recognizing text in parallel through a 0.9B-parameter encoder-decoder. The output is structured Markdown and JSON, accessible either through Zhipu’s hosted cloud API or by self-hosting the model via vLLM, SGLang, or Ollama.

The interesting bit

The model claims top rank on OmniDocBench V1.5 (94.62) despite being small enough to run on modest hardware. It uses Multi-Token Prediction loss and full-task reinforcement learning during training, paired with a CogViT visual encoder and a GLM-0.5B text decoder—an unusually lean architecture for document understanding.

Key highlights

Claims #1 overall on OmniDocBench V1.5 and strong results on formula, table, and information-extraction benchmarks.
Weighs only 0.9B parameters and supports speculative decoding via vLLM and SGLang for faster inference.
Offers a modular SDK with swappable components: PageLoader, OCRClient, PPDocLayoutDetector, and ResultFormatter.
Can run entirely through a hosted API or fully offline via local inference engines.
Code is Apache 2.0, while the model weights are MIT licensed (the pipeline also pulls in PP-DocLayoutV3 under Apache 2.0).

Caveats

Large images and PDFs require manual memory and context-length tuning for self-hosted deployments.
The frictionless path relies on Zhipu’s commercial MaaS API; fully local setups need GPU resources and extra configuration.
BF16 is the only published precision option in the model download table.

Verdict

Worth a look if you need production-grade document parsing on a budget or offline. Skip it if you are looking for a generic, lightweight text-only OCR tool without layout awareness.

Frequently asked

What is zai-org/GLM-OCR?: GLM-OCR exists to squeeze enterprise-grade document understanding—tables, formulas, code, seals—into a sub-1B model you can host yourself or hit via API.
Is GLM-OCR open source?: Yes — zai-org/GLM-OCR is open source, released under the Apache-2.0 license.
What language is GLM-OCR written in?: zai-org/GLM-OCR is primarily written in Python.
How popular is GLM-OCR?: zai-org/GLM-OCR has 7k stars on GitHub.
Where can I find GLM-OCR?: zai-org/GLM-OCR is on GitHub at https://github.com/zai-org/GLM-OCR.