← all repositories
AtomicBot-ai/Atomic-Chat

A desktop switchboard for local LLM inference engines

It turns your local machine into an OpenAI-compatible inference endpoint so agents and IDEs can run on offline models without reconfiguration.

Collecting fresh signals — velocity needs a few days of history.
collecting data…
star history

What it does

Atomic Chat is a cross-platform desktop and mobile app that runs open-weight LLMs entirely on your own hardware. It bundles three inference backends—an in-house fork of llama.cpp with TurboQuant KV-cache optimizations, upstream llama.cpp, and Apple Silicon-native MLX-VLM—behind a single OpenAI-compatible server at http://localhost:1337/v1. You can chat in the GUI, mix in cloud providers with your own keys, or point any external agent, CLI, or IDE plugin at the local endpoint and treat your laptop like a private API provider.

The interesting bit

The app acts less like a chat client and more like a local inference router. Clients talking to the unified localhost API don’t know whether the model underneath is running on CUDA, Vulkan, or Apple’s Neural Engine, and you can swap engines without touching the client configuration. That abstraction makes it a plausible drop-in backend for the growing crop of OpenAI-compatible coding agents and MCP tools.

Key highlights

  • Three swappable engines under one roof: a custom llama.cpp fork with TurboQuant quantization, upstream ggml-org/llama.cpp, and MLX-VLM for vision models on M-series chips.
  • Speculative-decoding sprawl: Multi-Token Prediction (MTP), DFlash block-diffusion, and EAGLE-3 on MLX, though several are platform-locked to macOS or Windows.
  • Native desktop builds for macOS, Windows, and Linux, plus iOS and Android apps.
  • Built-in MCP server connections, artifact previews for HTML/CSS/JS, and per-assistant system prompts.
  • Explicit integrations with agents like OpenCode, Goose, and Kilo Code, each documented to point at http://localhost:1337/v1.

Caveats

  • Several speed features are OS-gated: DFlash and EAGLE-3 are Apple Silicon only, while MTP is unavailable on Linux, so performance and feature parity vary by platform.
  • The README advertises large speedups—e.g., “up to 6× faster” with DFlash and “30–70% throughput boost” with MTP—but does not provide reproducible benchmarks or test conditions.
  • The project is a substantial integration and packaging layer (Tauri, multiple engine bindings, mobile ports) rather than a novel inference engine, so its stability tracks the upstream forks it wraps.

Verdict

Worth a look if you want a GUI-managed, fully offline LLM backend that existing OpenAI-compatible tools can consume. Skip it if you need a headless, server-first stack or want to avoid a desktop app sitting between your models and your agents.

Frequently asked

What is AtomicBot-ai/Atomic-Chat?
It turns your local machine into an OpenAI-compatible inference endpoint so agents and IDEs can run on offline models without reconfiguration.
Is Atomic-Chat open source?
Yes — AtomicBot-ai/Atomic-Chat is an open-source project tracked on heatdrop.
What language is Atomic-Chat written in?
AtomicBot-ai/Atomic-Chat is primarily written in TypeScript.
How popular is Atomic-Chat?
AtomicBot-ai/Atomic-Chat has 967 stars on GitHub.
Where can I find Atomic-Chat?
AtomicBot-ai/Atomic-Chat is on GitHub at https://github.com/AtomicBot-ai/Atomic-Chat.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.