NotPunchnox/rkllama
An LLM server and client for running quantized language models on Rockchip RK3588 and RK3576 devices with NPU acceleration.

RKLLama uses a client-server architecture to run LLM inference on the NPU (Neural Processing Unit) of Rockchip SoCs. It wraps the rkllm and rknn runtime libraries to execute optimized models on hardware like Orange Pi 5 and Radxa Rock 4. The server exposes a REST API, and a Python client interacts with the models it serves.
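
The client-server split described above can be sketched as a small Python client that posts a prompt to the server over HTTP. This is a minimal illustration, not RKLLama's documented API: the host, port, `/generate` route, JSON field names, and the `example-model` name are all placeholder assumptions, so check the project's own documentation for the real endpoints before using it.

```python
import json
import urllib.request

# Assumption: host, port, and route are placeholders, not RKLLama's documented API.
BASE_URL = "http://localhost:8080"
GENERATE_ROUTE = "/generate"


def build_request(prompt, model, base_url=BASE_URL):
    """Build a JSON POST request carrying a prompt for a served model."""
    # Assumption: the payload field names are hypothetical.
    payload = {"model": model, "prompt": prompt}
    return urllib.request.Request(
        base_url + GENERATE_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the request; call send(req) once a server is running.
    req = build_request("Hello", model="example-model")
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the sketch usable without a board attached: the payload can be inspected on any machine, and `send` is only needed once a server is reachable.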