li-plus/chatglm.cpp

C++ inference engine for ChatGLM-6B, ChatGLM2-6B, ChatGLM3, GLM-4, and CodeGeeX2 models with int4/int8 quantization.

Star velocity (7d): +2.7 ★ / day · Trend: steady

This repository provides a pure C++ implementation of the ChatGLM model series and GLM-4, built on the ggml library and similar in design to llama.cpp. It runs inference on both CPU and GPU, reduces memory use through int4 and int8 quantization, supports P-Tuning v2 and LoRA fine-tuned variants, and offers streaming generation. The project also includes Python bindings, web demos, and API servers for deployment.
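
To make the memory saving from int4 quantization concrete, here is a minimal sketch of block-wise symmetric 4-bit quantization in plain Python. It is an illustration only, not the format ggml or chatglm.cpp actually uses: the block size of 32, the function names, and the nibble layout are all assumptions chosen for readability.

```python
BLOCK_SIZE = 32  # assumption: values per block, picked for illustration


def quantize_block_int4(values):
    """Map one block of floats to signed 4-bit codes in [-8, 7] plus one scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return scale, codes


def pack_int4(codes):
    """Pack pairs of 4-bit codes into single bytes (low nibble first)."""
    assert len(codes) % 2 == 0
    out = bytearray()
    for lo, hi in zip(codes[0::2], codes[1::2]):
        # Shift codes from [-8, 7] to [0, 15] so each fits in one nibble.
        out.append(((lo + 8) & 0x0F) | (((hi + 8) & 0x0F) << 4))
    return bytes(out)


def unpack_int4(packed):
    """Inverse of pack_int4: recover the signed 4-bit codes."""
    codes = []
    for b in packed:
        codes.append((b & 0x0F) - 8)
        codes.append((b >> 4) - 8)
    return codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from codes and the block scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [((i * 37) % 64 - 32) / 32 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block_int4(weights)
    packed = pack_int4(codes)
    restored = dequantize_block(scale, unpack_int4(packed))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"packed bytes: {len(packed)}, max abs error: {worst:.4f}")
```

Under these assumptions, a block of 32 float32 weights (128 bytes) shrinks to 16 packed bytes plus one stored scale, at the cost of a rounding error of at most half the scale per weight.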
