RWKV/rwkv.cpp

A C library and Python wrapper for running quantized RWKV language models on CPU using the ggml framework.

★ 1.6k stars · C++ · Inference · Serving · Language Models


Velocity (7d): +1.3 ★ / day · Trend: steady


rwkv.cpp ports the RWKV language model architecture to the ggml machine learning library for efficient CPU inference. It supports INT4, INT5, and INT8 quantization formats alongside FP16 and FP32. The project ships as a C library with a Python wrapper, targets CPU execution first with optional cuBLAS support for GPU acceleration, and can merge LoRA checkpoints into a base model for customization.
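
To give a rough sense of what a low-bit format such as INT4 trades away relative to FP16 or FP32, here is a minimal sketch of symmetric block quantization: a block of weights is stored as one float scale plus small signed integers. This is a generic illustration only. The function names `quantize_block` and `dequantize_block` are hypothetical, and the scheme shown is an assumption for teaching purposes, not the actual block layout used by ggml or rwkv.cpp.

```python
from typing import List, Tuple


def quantize_block(values: List[float], bits: int = 4) -> Tuple[float, List[int]]:
    """Quantize one block of floats to signed integers with a shared scale.

    Hypothetical helper for illustration; not part of rwkv.cpp or ggml.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        # An all-zero block needs no real scale; use 1.0 to avoid dividing by zero.
        return 1.0, [0] * len(values)
    scale = amax / qmax
    # Round to the nearest integer step and clamp into the representable range.
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate floats from the shared scale and the integers."""
    return [scale * q for q in quants]


if __name__ == "__main__":
    weights = [0.62, -1.4, 0.27, 0.0]
    scale, quants = quantize_block(weights, bits=4)
    restored = dequantize_block(scale, quants)
    for w, r in zip(weights, restored):
        print(f"{w:+.3f} -> {r:+.3f}")
```

In this sketch the rounding error per weight is at most half a quantization step (`scale / 2`), so fewer bits means a coarser step and a larger possible error, in exchange for a smaller model file and lower memory traffic during CPU inference.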