Is needle open source?

Yes — cactus-compute/needle is open source, released under the MIT license.

What language is needle written in?

cactus-compute/needle is primarily written in Python.

How popular is needle?

cactus-compute/needle has 3.3k stars on GitHub. Growth is currently cooling off, at about +15 stars per day over the last 7 days.

Where can I find needle?

cactus-compute/needle is on GitHub at https://github.com/cactus-compute/needle.


cactus-compute/needle

26M parameters, one job: call your functions

A distilled Gemini 3.1 that fits on watches and glasses, finetunable on a laptop.

★ 3.3k stars · Python · Language Models · Inference · Serving


Velocity (7d): +15 ★/day · Trend: ↘ cooling

What it does

Needle is a 26-million-parameter encoder-decoder transformer distilled from Gemini 3.1. It takes a natural-language query plus a JSON schema of available tools, and emits a structured function call. That’s it. No chat, no creative writing — just “turn off the lights” → {"name":"toggle_lights","arguments":{"state":"off"}}.
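
To make that input/output contract concrete, here is a minimal Python sketch that checks an emitted call against the declared tools. The schema layout (`name`, `parameters`, `required`) and the helper `validate_call` are assumptions for illustration, not Needle's documented format; only the example call itself comes from the description above.

```python
import json

# Hypothetical tool schema; this layout is an assumption, not Needle's documented format.
TOOLS = [
    {
        "name": "toggle_lights",
        "parameters": {"state": {"type": "string", "enum": ["on", "off"]}},
        "required": ["state"],
    }
]


def validate_call(raw_output: str, tools: list[dict]) -> dict:
    """Parse a model output string and check it against the declared tools."""
    call = json.loads(raw_output)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(call, dict):
        raise ValueError("expected a JSON object")
    by_name = {tool["name"]: tool for tool in tools}
    tool = by_name.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    missing = [key for key in tool["required"] if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    for key, value in args.items():
        spec = tool["parameters"].get(key)
        if spec is None:
            raise ValueError(f"unexpected argument: {key!r}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"invalid value for {key!r}: {value!r}")
    return call


# The example output from the description above.
output = '{"name":"toggle_lights","arguments":{"state":"off"}}'
call = validate_call(output, TOOLS)
print(call["name"], call["arguments"])
```

A check like this is a reasonable guard in front of any hardcoded tool dispatch, since a small model can emit a well-formed call to a tool or argument that does not exist.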

The interesting bit

The architecture itself is the experiment. The team calls it a “Simple Attention Network”: 12 encoder layers (no FFN, just self-attention) feed cross-attention into 8 decoder layers, with tied embeddings and a custom ZCRMSNorm. The bet is that most on-device AI doesn’t need a generalist model — it needs a reliable router, and 26M params is enough if you scope the task tightly.
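
A back-of-the-envelope calculation shows why dropping the FFN matters for the parameter budget. The sketch below counts weights for a 12-layer encoder with and without a standard feed-forward block. The width `D_MODEL = 512` and the 4x FFN expansion are hypothetical values chosen for illustration, not figures from the Needle repository, and the count ignores biases, norms, and embeddings.

```python
def attention_params(d_model: int) -> int:
    """Weights in one attention block: Q, K, V and output projections (biases ignored)."""
    return 4 * d_model * d_model


def ffn_params(d_model: int, expansion: int = 4) -> int:
    """Weights in a standard two-layer feed-forward block (biases ignored)."""
    return 2 * d_model * (expansion * d_model)


D_MODEL = 512        # hypothetical width, not taken from the Needle repository
ENCODER_LAYERS = 12  # from the description above

attention_only = ENCODER_LAYERS * attention_params(D_MODEL)
with_ffn = ENCODER_LAYERS * (attention_params(D_MODEL) + ffn_params(D_MODEL))

print(f"attention-only encoder: {attention_only:,} weights")
print(f"same encoder with FFN:  {with_ffn:,} weights")
print(f"fraction kept: {attention_only / with_ffn:.2f}")
```

Under these assumptions the attention-only encoder keeps one third of the per-layer weights regardless of width, because the ratio is 4d² / 12d².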

Key highlights

Runs at 6,000 tok/s prefill and 1,200 tok/s decode on the Cactus runtime (claimed, not independently verified)
Weights and training data fully open on HuggingFace
One-command finetuning via needle playground web UI or CLI; auto-generates synthetic training data with Gemini
Beats FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling (per their benchmarks)
Pretrained on 200B tokens in 27 hours on 16 TPU v6e chips; post-trained on 2B tokens in 45 minutes
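
The claimed throughput figures translate into a rough per-request latency. This is a minimal sketch, assuming the prompt is processed at the prefill rate and the call is generated at the decode rate, with no fixed overhead for tokenization or model loading. The request sizes are hypothetical.

```python
PREFILL_TOK_PER_S = 6_000  # claimed prefill speed on the Cactus runtime
DECODE_TOK_PER_S = 1_200   # claimed decode speed on the Cactus runtime


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency: prompt processed at prefill speed, output generated at decode speed."""
    return prompt_tokens / PREFILL_TOK_PER_S + output_tokens / DECODE_TOK_PER_S


# Hypothetical request: 300 tokens of query plus tool schema, 24 tokens of emitted call.
latency = estimate_latency_s(prompt_tokens=300, output_tokens=24)
print(f"estimated latency: {latency * 1000:.0f} ms")
```

For this hypothetical request the estimate comes to about 70 ms, split as 50 ms of prefill and 20 ms of decode. Large tool schemas would push the prefill share up.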

Caveats

The README itself warns that “small models can be finicky” and that they overfit badly below ~120 examples per tool
Explicitly not a conversational model — larger models “excel in conversational settings” where this one will flounder
Speed claims are tied to the Cactus runtime, not standard PyTorch or llama.cpp
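
Given the overfitting warning, it may be worth checking a finetuning set for thinly covered tools before training. The following is a minimal sketch, assuming each training record carries a `tool` field naming the target function. That record layout and the helper `underrepresented_tools` are hypothetical, not part of Needle's tooling.

```python
from collections import Counter

MIN_EXAMPLES_PER_TOOL = 120  # rough threshold from the README warning quoted above


def underrepresented_tools(
    examples: list[dict], minimum: int = MIN_EXAMPLES_PER_TOOL
) -> dict[str, int]:
    """Return tools whose example count is below the minimum, with their counts."""
    counts = Counter(example["tool"] for example in examples)
    return {tool: n for tool, n in counts.items() if n < minimum}


# Hypothetical dataset: each record names the tool its target call uses.
dataset = (
    [{"query": f"lights request {i}", "tool": "toggle_lights"} for i in range(150)]
    + [{"query": f"timer request {i}", "tool": "set_timer"} for i in range(40)]
)

for tool, n in underrepresented_tools(dataset).items():
    print(f"{tool}: {n} examples, below the ~{MIN_EXAMPLES_PER_TOOL} guideline")
```

Note that a tool with no examples at all never appears in the counts, so comparing the result against the full tool list is a sensible extra step.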

Verdict

Grab this if you’re building a smartwatch, glasses, or phone assistant that needs to route voice commands to hardcoded tools and can’t afford a 1B+ model. Skip it if you need chit-chat, open-ended reasoning, or want to run inference without the Cactus stack.

Frequently asked

What is cactus-compute/needle?: A distilled Gemini 3.1 that fits on watches and glasses, finetunable on a laptop.