A Swiss Army knife that actually runs on your GPU
One Python project bundles local LLMs, image generation, voice chat, and workflow automation behind a Qt GUI or a headless API.

What it does AI Runner is a desktop app and headless server for running generative AI entirely offline. It handles LLM chat, Stable Diffusion/FLUX image generation, text-to-speech, speech-to-text, and even real-time voice conversations with LLMs. Everything is bundled behind either a PySide6 GUI or an HTTP API on port 8080.
The interesting bit
The project doesn’t just pile features together—it speaks multiple API dialects. Run airunner-headless --ollama-mode and it mimics Ollama on port 11434, letting VS Code’s Continue extension treat it like a local copilot. It also exposes OpenAI-compatible /v1/chat/completions endpoints, so existing tools can plug in without knowing the backend is a quantized GGUF model on your RTX 3060.
Key highlights
- Drag-and-drop LangGraph workflow builder with runtime execution
- Supports SD 1.5, SDXL, FLUX.1, Llama 3.1, Whisper, and OpenVoice out of the box
- CLI model management for HuggingFace and CivitAI (
airunner-hf-download,airunner-civitai-download) - Auto-generates HTTPS certificates; Docker compose for both GUI and headless modes
--no-preloaddefers model loading until first request, saving VRAM when running multiple services
Caveats
- System requirements are steep: minimum RTX 3060, 16 GB RAM, and 22–100+ GB storage
- Non-English language support is patchy—Japanese TTS and LLM work, but STT and GUI are English-only for most languages
- Manual installation on Ubuntu is involved (CUDA toolkit, espeak, Qt6 Wayland plugins, mecab, and compiling llama-cpp-python from source)
Verdict Worth a look if you want a single self-hosted box that replaces ComfyUI + Ollama + a TTS pipeline, and you have the GPU to feed it. Skip it if you’re hoping for lightweight CPU inference or polished cross-platform support beyond Ubuntu and Windows.