foldl/chatllm.cpp
A pure C++ implementation for running large language model inference locally on CPU and GPU.

Velocity (7d): +1.0 ★/day · Trend: steady
ChatLLM.cpp is a C++ inference engine built on the ggml library that enables real-time chat with LLMs ranging from under 1B to over 300B parameters. It supports multimodal inputs and retrieval-augmented generation, and it runs entirely locally on consumer hardware with CPU or GPU acceleration. The project reports accuracy on par with or better than other implementations of the same models.