Is chitu open source?

Yes, thu-pacman/chitu is open source, released under the Apache-2.0 license.

What language is chitu written in?

thu-pacman/chitu is primarily written in Python.

How popular is chitu?

thu-pacman/chitu has 3.1k stars on GitHub.

Where can I find chitu?

thu-pacman/chitu is on GitHub at https://github.com/thu-pacman/chitu.

thu-pacman/chitu

Production LLM inference that treats all GPUs as first-class

Chitu is a production-grade inference engine built to run DeepSeek, Qwen, and friends on everything from a single CPU to clusters of NVIDIA, Ascend, or Moore Threads silicon.

★ 3.1k stars · Python · Inference · Serving · Language Models

What it does

Chitu is a production-oriented LLM serving framework that scales from pure-CPU deployments to large GPU clusters. It runs models such as DeepSeek-R1, Qwen3, GLM-4.5, and Kimi, and ships custom operators that convert quantized weights on the fly (FP4 to FP8 or BF16). Together with CPU+GPU heterogeneous inference, this lets a 671B-parameter model be served from a single card. The team targets stability under concurrent traffic rather than just benchmark wins.
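To make the 671B figure concrete, here is a back-of-the-envelope sketch of the storage needed for the weights alone at each precision mentioned above. It is plain arithmetic, not anything taken from Chitu: the function name `weight_gigabytes` is hypothetical, and the estimate deliberately ignores the KV cache, activations, quantization scales, and runtime overhead.

```python
# Rough weight-only footprint; ignores KV cache, activations,
# quantization scale factors, and runtime overhead.
BITS_PER_PARAM = {"FP4": 4, "FP8": 8, "BF16": 16}


def weight_gigabytes(num_params: float, fmt: str) -> float:
    """Return the storage for the weights alone, in GB (10^9 bytes)."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in BITS_PER_PARAM:
        print(f"{fmt}: {weight_gigabytes(671e9, fmt):,.1f} GB")
```

Even at 4 bits per parameter the weights come to roughly 335 GB, which is more than the on-board memory of typical single accelerators. That gap is why a single-card deployment depends on keeping part of the model in host memory, which is the CPU+GPU heterogeneous case described above.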

The interesting bit

While most open-source inference stacks optimize for NVIDIA first and treat other hardware as an afterthought, Chitu builds Ascend 910B, Moore Threads, Muxi, and Hygon support into the same release cycle. The project is also candid about its limitations: the team openly admits that its bandwidth is tight, and it has already dropped Ascend A3 images because it lacks the hardware to test them.

Key highlights

Runs on NVIDIA, Huawei Ascend, Moore Threads, Muxi, and Hygon silicon from a single GPU up to cluster scale.
Supports CPU+GPU heterogeneous inference for oversized models like DeepSeek-R1 671B.
Implements custom operators for online quantization conversion (FP4→FP8/BF16, FP8→BF16).
Explicitly positioned for long-term production use with concurrent business traffic, not just experimentation.
Reuses battle-tested primitives from vLLM, SGLang, FlashAttention, and llama.cpp rather than reinventing the wheel.
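As an illustration of what an online FP4-to-wider-format conversion involves, the following is a minimal pure-Python sketch and not Chitu's operator. It assumes the common E2M1 layout for FP4 (1 sign bit, 2 exponent bits, 1 mantissa bit), two codes packed per byte with the low nibble first, and one scale factor per block. The packing order, the scale handling, and the names `decode_fp4` and `dequantize_block` are all assumptions made for the example. A real kernel would do this on the accelerator and write BF16 or FP8 values rather than Python floats.

```python
from __future__ import annotations

# Magnitudes representable by a 4-bit E2M1 code, indexed by the low 3 bits
# (2 exponent bits followed by 1 mantissa bit). Assumed layout, see lead-in.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_fp4(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) into a Python float."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0b0111]


def dequantize_block(packed: bytes, scale: float) -> list[float]:
    """Expand packed FP4 codes into scaled floats, two values per byte."""
    values = []
    for byte in packed:
        values.append(decode_fp4(byte & 0x0F) * scale)  # low nibble first (assumed)
        values.append(decode_fp4(byte >> 4) * scale)    # then the high nibble
    return values


if __name__ == "__main__":
    # 0x21 packs code 1 (0.5) in the low nibble and code 2 (1.0) in the high one.
    print(dequantize_block(bytes([0x21]), scale=2.0))
```

Doing the conversion at load or run time means the weights can stay in their compact 4-bit form in storage while the compute path uses whichever wider format the hardware supports.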

Caveats

The team warns that limited manpower means user issues may not be resolved quickly; commercial support is available via email.
Ascend A3 platform images are no longer maintained as of v0.5.5 because the team lacks the hardware to validate them.
Published performance numbers are explicitly noted to fluctuate across hardware, software versions, and workloads.

Verdict

Worth evaluating if you are deploying LLMs in China or on non-NVIDIA hardware and need a framework that treats those chips as first-class citizens. Pass if you are looking for a mature ecosystem with a large community and guaranteed free support.
