opensquilla/opensquilla

An AI agent that shops for cheaper models mid-conversation

OpenSquilla routes each turn to the cheapest capable LLM, keeping persistent memory and tool use identical across CLI, Web UI, and chat channels.

3.5k stars · Python · Agents · Inference · Serving
Star velocity (7 days): +107 ★/day · Trend: steady

What it does

OpenSquilla is a Python-based AI agent with a local model router called SquillaRouter. For every turn, it picks the cheapest model that can handle the request from a pluggable provider list (OpenRouter, OpenAI, Anthropic, Ollama, DeepSeek, Gemini, Qwen, and 20+ others). Persistent memory, a layered sandbox, built-in web search, and on-device embeddings all feed into a single shared turn loop that behaves the same whether you hit it via CLI, Web UI, or a chat channel like Slack or Telegram.
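
The routing rule in the paragraph above ("the cheapest model that can handle the request") comes down to filtering by capability and then minimizing cost. The sketch below shows that rule under loud assumptions. The real SquillaRouter scores turns with a learned ONNX/LightGBM model, and this sketch substitutes a toy length heuristic. `ModelOption`, `route_turn`, `toy_difficulty`, the capability tiers, the model names, and the prices are all hypothetical illustrations, not OpenSquilla's API or real provider pricing.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ModelOption:
    provider: str          # provider label, e.g. "ollama" (illustrative)
    name: str              # hypothetical model identifier
    cost_per_mtok: float   # hypothetical price, USD per million tokens
    tier: int              # hypothetical capability tier; higher = stronger


def route_turn(
    prompt: str,
    candidates: Sequence[ModelOption],
    required_tier: Callable[[str], int],
) -> ModelOption:
    """Return the cheapest candidate whose tier meets the prompt's estimated need."""
    need = required_tier(prompt)
    capable = [m for m in candidates if m.tier >= need]
    if not capable:
        # Nothing clears the bar: fall back to the strongest model available.
        return max(candidates, key=lambda m: m.tier)
    return min(capable, key=lambda m: m.cost_per_mtok)


def toy_difficulty(prompt: str) -> int:
    # Toy stand-in for a learned classifier: long or code-bearing prompts
    # are assumed to need a stronger tier.
    if "```" in prompt or len(prompt) > 500:
        return 3
    if len(prompt) > 100:
        return 2
    return 1


models = [
    ModelOption("ollama", "small-local", 0.0, 1),
    ModelOption("deepseek", "mid-tier", 1.0, 2),
    ModelOption("anthropic", "large", 10.0, 3),
]
print(route_turn("What's 2+2?", models, toy_difficulty).name)  # small-local
```

Keeping the difficulty estimator as a plain callable mirrors the idea of swapping in a local model for a cloud judge: the selection rule stays the same regardless of what produces the score.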

The interesting bit

The “token-efficient” pitch is essentially dynamic model arbitrage done locally — an ONNX-based router decides where to send each turn without round-tripping to a cloud judge. The README is unusually thorough about Windows portable installs, Git LFS asset pulls, and uv environment isolation, which suggests the authors have actually watched real users struggle with Python packaging.

Key highlights

  • SquillaRouter runs on-device (ONNX Runtime + LightGBM) to pick a model for each turn
  • One shared turn loop across Web UI, CLI, and chat channels (Feishu, Telegram, DingTalk, QQ, WeCom, Slack, Discord, Matrix)
  • 20+ LLM providers through a pluggable provider layer; adding one requires no config schema changes
  • On-device embeddings and persistent memory included
  • Windows portable zip bundles CPython; no system Python needed
  • Apache 2.0 license
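
The "one shared turn loop" highlight means every front-end funnels into the same function, so memory and tool behavior cannot drift between channels. Below is a minimal sketch of that shape. `TurnLoop`, `Session`, `cli_adapter`, `chat_adapter`, and the webhook dict layout are hypothetical names for illustration only. In OpenSquilla the loop would also handle routing, tools, and the sandbox, which are reduced here to a single `respond` callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Session:
    # Hypothetical per-user memory: a flat transcript of turns.
    history: List[str] = field(default_factory=list)


class TurnLoop:
    """A single turn function that every front-end calls (hypothetical structure)."""

    def __init__(self, respond: Callable[[List[str]], str]) -> None:
        # `respond` stands in for routing, the model call, and tool use.
        self._respond = respond
        self._sessions: Dict[str, Session] = {}

    def handle(self, user_id: str, text: str) -> str:
        session = self._sessions.setdefault(user_id, Session())
        session.history.append(f"user: {text}")
        reply = self._respond(session.history)
        session.history.append(f"assistant: {reply}")
        return reply


def cli_adapter(loop: TurnLoop, user_id: str, line: str) -> str:
    # A terminal front-end just forwards the typed line.
    return loop.handle(user_id, line)


def chat_adapter(loop: TurnLoop, event: dict) -> str:
    # A chat webhook payload, already parsed; this dict shape is assumed.
    return loop.handle(event["user"], event["text"])


loop = TurnLoop(lambda history: f"history={len(history)}")
print(cli_adapter(loop, "alice", "hello"))                     # history=1
print(chat_adapter(loop, {"user": "alice", "text": "again"}))  # history=3
```

In this sketch both adapters key memory by the same user ID, so the chat-side turn sees the earlier CLI exchange. Adapters stay thin translators, and all behavior lives in `handle`.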

Caveats

  • The README contains conflicting version numbers: one section claims 0.3.1 is current, another says 0.2.1
  • Windows portable builds are unsigned and must be launched as administrator; expect SmartScreen warnings
  • SquillaRouter needs the Visual C++ runtime on Windows, which the quick terminal install path does not auto-install
  • Git LFS is required for source installs, to pull the router model assets

Verdict

Worth a look if you’re running multi-model agent workloads and your API bill matters. Skip it if you want a single-model, single-provider setup — the routing complexity is overhead you don’t need.
