NotPunchnox/rkllama

An LLM server and client for running quantized language models on Rockchip RK3588 and RK3576 devices with NPU acceleration.

rkllama
Velocity (7d): +1.0 ★ / day
Trend: steady

RKLLama uses a server-client architecture to run LLM inference on the NPU (Neural Processing Unit) of Rockchip SoCs. It wraps the rkllm and rknn runtime libraries to execute optimized models on hardware such as the Orange Pi 5 and Radxa Rock 4. The server exposes a REST API, and a Python client interacts with the models it serves.
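To make the server-client split concrete, here is a minimal sketch of how a Python client might talk to a REST server of this kind. The base URL, the `/generate` path, and the `model` and `prompt` JSON fields are illustrative assumptions and are not taken from the RKLLama documentation; check the project's own API reference for the real routes and payloads.

```python
import json
import urllib.request

# Assumptions (not from the source): host, port, path, and JSON field names.
BASE_URL = "http://localhost:8080"   # hypothetical server address
GENERATE_PATH = "/generate"          # hypothetical endpoint path


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a text completion."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + GENERATE_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_generate_request("example-model", "Hello")
    print(req.get_method(), req.full_url)
    # Calling send(req) needs a running server, so it is not invoked here.
```

Separating request construction from sending keeps the payload easy to inspect without an NPU board on the network; only `send` touches the server.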