microsoft/Foundry-Local
A lightweight (~20 MB) local AI runtime and SDK stack from Microsoft for running quantized LLMs and Whisper speech-to-text entirely on-device.

Foundry Local provides native SDKs in C#, JavaScript, Python, and Rust for embedding AI capabilities directly into applications. It bundles a curated catalog of quantized, hardware-accelerated models—including chat completion models (Phi, Qwen, DeepSeek, Mistral) and Whisper for audio transcription—and manages model acquisition and inference via ONNX Runtime. The solution requires no network, no API keys, and no per-token costs.