foldl/chatllm.cpp

A pure C++ implementation for running large language model inference locally on CPU and GPU.

Velocity (7d): +1.0 ★/day · Trend: steady

ChatLLM.cpp is a C++ inference engine built on the ggml library for real-time chat with LLMs ranging from under 1B to over 300B parameters. It supports multimodal inputs and retrieval-augmented generation, and runs entirely locally on consumer hardware with CPU or GPU acceleration. Its accuracy is competitive with, or better than, that of other implementations.
