Is picolm open source?

Yes — RightNow-AI/picolm is open source, released under the MIT license.

What language is picolm written in?

RightNow-AI/picolm is primarily written in C.

How popular is picolm?

RightNow-AI/picolm has 1.7k stars on GitHub.

Where can I find picolm?

RightNow-AI/picolm is on GitHub at https://github.com/RightNow-AI/picolm.

RightNow-AI/picolm

An 80KB binary that runs a 1B-parameter LLM on a $10 board

PicoLM is a from-scratch C inference engine that squeezes a 1B-parameter LLM into 45MB of RAM to run offline on a $10 board.

★ 1.7k stars · C · Inference · Serving · Language Models

What it does

picolm is a ~2,500-line C11 inference engine that runs TinyLlama 1.1B and other LLaMA-architecture models in GGUF format on hardware most frameworks ignore: Raspberry Pi Zero 2W, Sipeed LicheeRV Nano, Pi 3/4/5, and ordinary x86-64 boxes. The compiled binary weighs roughly 80KB and the runtime working set is about 45MB, because the 638MB model stays on disk and is memory-mapped layer by layer rather than loaded wholesale.
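To make the memory-mapping idea concrete, here is a minimal POSIX sketch of mapping a model file read-only so that pages are only brought into RAM when they are touched. It is an illustration under stated assumptions, not picolm's actual code: the names `mapped_file`, `map_file`, and `unmap_file` are hypothetical, and a Windows build would need a different API.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type (not picolm's API): a read-only view of a file. */
typedef struct {
    const unsigned char *data; /* start of the mapping, NULL if unmapped */
    size_t size;               /* length of the mapping in bytes */
} mapped_file;

/* Map `path` read-only. Returns 0 on success, -1 on failure. */
static int map_file(const char *path, mapped_file *out) {
    assert(path != NULL && out != NULL);
    out->data = NULL;
    out->size = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    /* Nothing is read here: the kernel faults pages in on first access
     * and may evict them again under memory pressure. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Release the mapping and reset the struct. */
static void unmap_file(mapped_file *m) {
    if (m->data != NULL) munmap((void *)m->data, m->size);
    m->data = NULL;
    m->size = 0;
}
```

A caller would read weights by indexing into `data` at offsets taken from the file's header. Because the mapping is read-only and file-backed, the operating system can drop pages that are no longer in use, which is what keeps the resident working set far below the file size.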

The interesting bit

The project treats severe memory constraints as a first-class design target rather than an afterthought. It uses FP16 KV caches, flash attention with online softmax, pre-computed RoPE lookup tables, and fused dequantize-and-dot-product kernels to avoid allocating intermediate buffers. ARM NEON and x86 SSE2 paths are auto-detected. It even guarantees valid JSON output via grammar-constrained sampling so a 1B model can reliably trigger tools in the companion picoclaw agent framework.

Key highlights

~80KB single binary, zero external dependencies beyond libc/libm/libpthread
Memory-mapped GGUF streaming keeps the 638MB model on disk; only one layer pages into RAM at a time
Supports Q2_K through F32 quantization natively, plus --json grammar mode for structured tool calling
KV cache persistence lets repeated prompts skip prefill overhead
Cross-platform: Linux, Windows, macOS; ARM, x86-64, and RISC-V
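As a rough illustration of what a fused dequantize-and-dot-product kernel does with quantized weights, the sketch below computes a dot product directly from quantized blocks without first expanding them into a float buffer. The block layout here (one float scale plus 32 signed bytes) is a simplified, hypothetical format chosen for readability; it is not the GGUF on-disk layout, and `q8_block` and `dot_q8_f32` are illustrative names rather than picolm functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32

/* Hypothetical block format: each weight is scale * q[i]. */
typedef struct {
    float scale;
    int8_t q[BLOCK_SIZE];
} q8_block;

/* Dot product of n quantized weights with n float activations.
 * n must be a multiple of BLOCK_SIZE. Weights are dequantized on the fly,
 * so no intermediate float copy of the weight row is allocated. */
static float dot_q8_f32(const q8_block *w, const float *x, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    float total = 0.0f;
    for (size_t b = 0; b < n / BLOCK_SIZE; b++) {
        float partial = 0.0f;
        for (size_t i = 0; i < BLOCK_SIZE; i++)
            partial += (float)w[b].q[i] * x[b * BLOCK_SIZE + i];
        total += w[b].scale * partial; /* apply the scale once per block */
    }
    return total;
}
```

Applying the scale once per block rather than once per weight saves multiplications, and the inner loop is the kind of code that SIMD paths such as NEON or SSE2 would typically replace.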

Caveats

The performance table in the README is truncated, so exact x86-64 throughput figures are missing from the sources.
It is explicitly built around LLaMA-architecture GGUF models; don’t expect to drop in arbitrary Transformer checkpoints.

Verdict

Embedded developers and privacy-paranoid tinkerers who want a fully offline assistant on a $10 RISC-V or ARM board should take a look. Anyone expecting GPT-4 quality from a 1B model running at one token per second will be disappointed.

Frequently asked

What is RightNow-AI/picolm?: PicoLM is a from-scratch C inference engine that squeezes a 1B-parameter LLM into 45MB of RAM to run offline on a $10 board.