Is cactus open source?

Yes — cactus-compute/cactus is an open-source project tracked on heatdrop.

What language is cactus written in?

cactus-compute/cactus is primarily written in C++.

How popular is cactus?

cactus-compute/cactus has 5.5k stars on GitHub, and its star growth is currently accelerating.

Where can I find cactus?

cactus-compute/cactus is on GitHub at https://github.com/cactus-compute/cactus.

cactus-compute/cactus

An AI engine for devices too small for a GPU

A local AI brain for tiny devices that knows when to call the cloud.

★ 5.5k stars · C++ · Inference · Serving · Language Models

Velocity (7d): +22 ★ / day · Trend: accelerating

What it does

Cactus is a C++ inference stack for ARM CPUs that packs speech, vision, and language models onto phones, watches, and Raspberry Pis. It ships SDK bindings in eight languages and can automatically route requests to cloud models when local hardware is overwhelmed. The README treats RAM as the primary enemy, citing zero-copy memory mapping and custom attention kernels; benchmark tables show a 1.2B-parameter model squeezing into roughly 70 MB on Apple Silicon, while the same model needs closer to 1.5 GB on a Galaxy S25 Ultra.
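The zero-copy memory mapping mentioned above can be illustrated with Python's standard `mmap` module. This is a minimal sketch of the general technique, not Cactus's own loader: the flat float32 file layout and the helper names `write_demo_weights` and `read_weight` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_demo_weights(path, values):
    """Write float32 values to a flat binary file (a stand-in for a weights file)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))


def read_weight(path, index):
    """Read one float32 by index through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # unpack_from reads directly from the mapped pages, so the file
            # is never copied into a separate buffer as a whole; the OS only
            # pages in the regions that are actually touched.
            (value,) = struct.unpack_from("<f", mapped, index * 4)
            return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_demo_weights(path, [0.5, -1.25, 3.0])
        print(read_weight(path, 1))
```

The point of the design is that resident memory tracks the pages a model actually reads rather than the size of the file on disk.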

The interesting bit

Rather than hiding the model behind a single API, Cactus exposes three layers: an OpenAI-compatible engine, a zero-copy computation graph it bills as “PyTorch for mobile,” and hand-tuned ARM SIMD kernels. The cloud-fallback mechanism is the pragmatic twist: the engine returns a cloud_handoff flag in its response, so your application knows exactly when it stopped computing locally and started borrowing someone else’s GPU.
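An application could branch on that flag roughly as follows. Only the `cloud_handoff` name comes from the description above; the JSON shape, the boolean type, the `response` field, and the `describe_origin` helper are hypothetical, so check the project's documentation for the real response schema.

```python
import json


def describe_origin(raw_response: str) -> str:
    """Return "cloud" if the response was handed off, otherwise "local"."""
    payload = json.loads(raw_response)
    # `cloud_handoff` is the flag named in the summary; treating a missing
    # flag as "local" is an assumption of this sketch.
    return "cloud" if payload.get("cloud_handoff", False) else "local"


if __name__ == "__main__":
    # Hypothetical payload: only the `cloud_handoff` key is from the source.
    sample = '{"cloud_handoff": true, "response": "hello"}'
    print(describe_origin(sample))
```

A check like this lets an app log, bill, or warn the user differently depending on where the answer was computed.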

Key highlights

Claims fastest ARM CPU inference, using INT4 quantization, chunked prefill, and KV-cache quantization.
Multimodal out of the box: one SDK handles chat, vision, transcription, embeddings, RAG, and tool calling.
Apple NPU acceleration has already shipped; Qualcomm, MediaTek, and Exynos support is on the roadmap.
Runs on a wide hardware spectrum, from a Mac M4 Pro to a Raspberry Pi 5, though performance and RAM use vary dramatically by device.
Pre-converted model weights are hosted on HuggingFace, though some Gemma checkpoints are gated and require tokens.
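
To make the INT4 claim in the first highlight concrete, here is a minimal sketch of symmetric 4-bit quantization with two values packed per byte. It shows the general idea only; the single per-tensor scale, the nibble order, and all function names are assumptions, and Cactus's actual kernels may group, scale, and pack weights differently.

```python
def quantize_int4(weights):
    """Symmetric quantization of floats to integers in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def pack_int4(quantized):
    """Pack signed 4-bit values two per byte, low nibble first."""
    values = list(quantized)
    if len(values) % 2:
        values.append(0)  # pad to an even count
    packed = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        # Shift [-8, 7] to [0, 15] so each value fits in one nibble.
        packed.append(((hi + 8) << 4) | (lo + 8))
    return bytes(packed)


def unpack_int4(packed, count):
    """Reverse pack_int4, dropping any padding value."""
    values = []
    for byte in packed:
        values.append((byte & 0x0F) - 8)
        values.append((byte >> 4) - 8)
    return values[:count]


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float weights."""
    return [q * scale for q in quantized]
```

Because two weights share one byte, storage for the quantized values is half a byte per parameter, plus whatever the scales cost.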

Caveats

Android NPU support is still on the roadmap; the published CPU-only benchmarks show missing latency entries for Snapdragon and Exynos devices.
Several models listed under transcription, specifically the Silero VAD and Pyannote variants, ship no benchmark numbers at all.
Some Gemma weights are gated behind HuggingFace authentication, adding friction to the out-of-box experience.

Verdict

A solid bet for iOS or macOS developers who need offline-capable transcription or small-model chat with an automatic cloud safety net. Give it a pass if you need Android NPU inference today or a full training pipeline: this is strictly an inference runtime, and the roadmap shows broader mobile silicon support is still arriving.

Frequently asked

What is cactus-compute/cactus?: A local AI brain for tiny devices that knows when to call the cloud.