zai-org/ChatGLM-6B

A 6B-parameter Chinese-English chatbot that fits on a budget GPU

ChatGLM-6B squeezes bilingual dialogue into consumer hardware via aggressive quantization, then gives the weights away for free.

41k stars · Python · Language Models · Chat Assistants
Velocity (7d): +35 ★/day · Trend: steady

What it does

ChatGLM-6B is a 6.2-billion-parameter bilingual (Chinese/English) dialogue model built on the GLM architecture. With INT4 quantization it runs locally in as little as 6 GB of VRAM, and it can be fine-tuned in as little as 7 GB. The weights are open for academic use and free for commercial use after a registration survey.

The interesting bit

The project treats hardware constraints as a first-class design problem rather than an afterthought. The README leads with a quantization matrix showing exactly how much GPU memory you need for each mode—FP16, INT8, INT4—plus a separate column for fine-tuning. That transparency is rarer than it should be in LLM land.
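
To put the INT4 figure in context, here is a back-of-the-envelope sketch of the weights-only footprint at each precision. It assumes every one of the 6.2 billion parameters is stored at the stated bit width and uses 1 GB = 10^9 bytes; the names `PARAMS`, `BITS`, and `weight_gb` are illustrative, not taken from the project.

```python
# Rough weights-only memory estimate for a 6.2B-parameter model.
# Assumption: all parameters are stored at the given bit width.
PARAMS = 6.2e9  # parameter count quoted in the summary

# Bits per weight for each precision mode in the README's matrix.
BITS = {"FP16": 16, "INT8": 8, "INT4": 4}


def weight_gb(params: float, bits: int) -> float:
    """Return the size of the weights alone in GB (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


estimates = {mode: weight_gb(PARAMS, bits) for mode, bits in BITS.items()}

for mode, gb in estimates.items():
    print(f"{mode}: ~{gb:.1f} GB of weights")
```

By this estimate the INT4 weights alone come to roughly 3.1 GB, so the quoted 6 GB requirement presumably also covers activations, the attention cache, any layers left unquantized, and runtime overhead. The README's measured numbers are the ones to trust.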

Key highlights

  • INT4 quantization enables inference on 6 GB VRAM and fine-tuning on 7 GB
  • Includes P-Tuning v2 for efficient parameter customization
  • Trained on ~1T tokens with supervised fine-tuning, feedback bootstrapping, and RLHF
  • Weights free for research; commercial use allowed after survey registration
  • Ecosystem of third-party ports: C++ inference (MNN, InferLLM), CPU-only running (JittorLLMs), langchain integration, VS Code plugins
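
For the INT4 inference path in the first highlight, loading looks roughly like the sketch below. It follows the pattern documented in the upstream README, but treat the details as assumptions: the Hugging Face model ID `THUDM/chatglm-6b` may have changed with the organization rename, and `quantize()` and `chat()` come from the checkpoint's own remote code rather than from `transformers` itself. The helper names `load_chatglm_int4` and `ask` are hypothetical.

```python
def load_chatglm_int4(model_id: str = "THUDM/chatglm-6b"):
    """Load ChatGLM-6B with INT4-quantized weights on a CUDA GPU.

    The default model_id is an assumption; check the current model card.
    """
    # Imported lazily so that defining this helper needs no dependencies.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code is needed because the model class ships with the
    # checkpoint rather than with the transformers package.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    # half() casts to FP16; quantize(4) is provided by the remote code.
    model = model.half().quantize(4).cuda().eval()
    return tokenizer, model


def ask(tokenizer, model, prompt: str, history=None):
    """Run one chat turn and return (reply, updated_history)."""
    return model.chat(tokenizer, prompt, history=history or [])
```

Calling `load_chatglm_int4()` downloads several gigabytes of weights and needs a CUDA-capable GPU, so it is best tried on the target machine rather than in a notebook preview. Passing the returned history back into `ask` carries the conversation forward.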

Caveats

  • The model is explicitly described as small, probabilistic, and “easily misled”; the authors disclaim responsibility for misuse
  • The team notes they have built no official apps (web, iOS, Android, Windows)—everything is community-driven
  • README is now largely a billboard for newer GLM-4 models, with ChatGLM-6B itself in maintenance mode

Verdict

Worth a look if you need a Chinese-English chatbot with openly available weights that actually fits on modest hardware. Skip it if you want state-of-the-art performance or an actively developed upstream: the project has moved on to GLM-4.
