foldl/chatllm.cpp

A pure C++ implementation for running large language model inference locally on CPU and GPU.

★ 894 stars · C++ · Inference · Serving · Language Models · RAG · Search


Velocity (7d): +1.0 ★ / day · Trend: steady


ChatLLM.cpp is a C++ inference engine built on the ggml library that enables real-time chat with LLMs ranging from under 1B to over 300B parameters. It supports multimodal inputs and retrieval-augmented generation, and it runs entirely locally on consumer hardware, on the CPU or with GPU acceleration. According to the project, its accuracy is competitive with or better than that of other implementations.