thu-pacman/chitu

A high-performance inference framework for serving large language models across diverse hardware including NVIDIA GPUs and domestic Chinese chips.

Velocity (7d): +6.6 ★/day · Trend: steady

Chitu (赤兔) is a production-oriented LLM inference engine built for enterprise AI deployments, from small-scale experimentation to large-scale production. It provides efficient operators for FP4/FP8 online quantization, supports heterogeneous CPU+GPU mixed inference, and scales from a single GPU to multi-node clusters. Supported models include DeepSeek-R1, Qwen, GLM, and Kimi.
