An AI agent that shops for cheaper models mid-conversation
OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.

What it does
OpenSquilla is a Python-based AI agent with a local model router called SquillaRouter. For every turn, it picks the cheapest model that can handle the request from a pluggable provider list (OpenRouter, OpenAI, Anthropic, Ollama, DeepSeek, Gemini, Qwen, and 20+ others). Persistent memory, a layered sandbox, built-in web search, and on-device embeddings all feed into a single shared turn loop that behaves the same whether you hit it via CLI, Web UI, or a chat channel like Slack or Telegram.
The interesting bit
The “token-efficient” pitch is essentially dynamic model arbitrage done locally — an ONNX-based router decides where to send each turn without round-tripping to a cloud judge. The README is unusually thorough about Windows portable installs, Git LFS asset pulls, and uv environment isolation, which suggests the authors have actually watched real users struggle with Python packaging.
Key highlights
- SquillaRouter runs on-device (ONNX Runtime + LightGBM) to select models per-turn
- One shared turn loop across Web UI, CLI, and chat channels (Feishu, Telegram, DingTalk, QQ, WeCom, Slack, Discord, Matrix)
- 20+ LLM providers via a pluggable layer with no config schema changes
- On-device embeddings and persistent memory included
- Windows portable zip bundles CPython; no system Python needed
- Apache 2.0 license
Caveats
- The README contains conflicting version numbers: one section claims 0.3.1 is current, another says 0.2.1
- Windows portable builds are unsigned and require administrator launch; SmartScreen will complain
- SquillaRouter needs the Visual C++ runtime on Windows, which the quick terminal install path does not auto-install
- Git LFS required for source installs to pull router model assets
Verdict
Worth a look if you’re running multi-model agent workloads and your API bill matters. Skip it if you want a single-model, single-provider setup — the routing complexity is overhead you don’t need.