cactus-compute/cactus

An on-device AI inference engine optimized for mobile and wearable devices, supporting LLMs, speech recognition, and vision with ARM SIMD kernels and quantization.

Velocity (7d): +13 ★/day · Trend: steady

Cactus provides fast, low-RAM AI inference on ARM CPUs through zero-copy memory mapping and custom SIMD kernels for Apple, Snapdragon, and Exynos chips. It supports multimodal workloads including chat, vision, speech-to-text, and RAG through an OpenAI-compatible API layer. The engine includes NPU-accelerated prefill, KV-cache quantization, and chunked prefill to minimize latency and power consumption on mobile devices.
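To make the "zero-copy memory mapping" idea concrete, here is a minimal sketch in Python using only the standard library. It is not Cactus's code or API: the helper names `write_weights` and `map_weights` and the flat float32 file layout are assumptions made up for illustration. The point is that a mapped file can be read as typed values without first copying it into a separate buffer, so pages are loaded from disk only when touched.

```python
import mmap
import struct


def write_weights(path, values):
    """Write floats to disk as raw native-endian float32 (hypothetical layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"{len(values)}f", *values))


def map_weights(path):
    """Map the file read-only and expose it as a float32 view without copying.

    Returns (file, mmap, view); the caller releases them in reverse order.
    """
    f = open(path, "rb")
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # memoryview.cast reinterprets the mapped bytes in place; no data is copied.
    view = memoryview(mm).cast("f")
    return f, mm, view
```

A caller would index `view` like a read-only array of floats, then call `view.release()`, `mm.close()`, and `f.close()` when done. The view has to be released before the mapping is closed, because a mapping with live exported buffers cannot be closed.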
