lemony-ai/cascadeflow
A cascading runtime that routes AI agent requests across multiple LLM providers to optimize cost, latency, and quality.

CascadeFlow is a runtime layer for AI agents that automatically routes LLM requests across providers such as OpenAI, Anthropic, Google, and Ollama according to cost, latency, and quality requirements. It implements model cascading: simpler tasks are handled by cheaper models, and complex requests escalate to more capable models. Built-in policy controls and budget management help control LLM spending in production agent workflows.
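
To make the cascading idea concrete, here is a minimal sketch in plain Python. It is not CascadeFlow's actual API: the names `Tier`, `run_cascade`, and `BudgetExceeded`, the flat per-call costs, and the confidence score returned by each stub model are all assumptions made for illustration. The sketch tries the cheapest tier first, escalates only when the answer's confidence falls below a threshold, and stops escalating once the next call would exceed the budget.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    """One rung of the cascade (hypothetical structure, not CascadeFlow's API)."""
    name: str
    cost_per_call: float  # assumed flat cost in USD, for illustration only
    call: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in [0, 1])


class BudgetExceeded(Exception):
    """Raised when even the cheapest tier does not fit in the budget."""


def run_cascade(prompt: str, tiers: List[Tier], min_confidence: float,
                budget: float) -> Tuple[str, str, float]:
    """Try tiers from cheapest to most capable; return (tier name, answer, spend)."""
    spent = 0.0
    best = None
    for tier in tiers:
        # Budget policy: never start a call that would overshoot the budget.
        if spent + tier.cost_per_call > budget:
            break
        answer, confidence = tier.call(prompt)
        spent += tier.cost_per_call
        best = (tier.name, answer)
        # Quality policy: accept the first answer that is confident enough.
        if confidence >= min_confidence:
            break
    if best is None:
        raise BudgetExceeded(f"no tier fits in a budget of {budget}")
    return best[0], best[1], spent


# Stub models standing in for real provider calls (assumptions, not real SDKs).
def cheap_model(prompt: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short prompts.
    confidence = 0.9 if len(prompt.split()) <= 5 else 0.3
    return f"cheap:{prompt}", confidence


def strong_model(prompt: str) -> Tuple[str, float]:
    return f"strong:{prompt}", 0.95


TIERS = [
    Tier("small", 0.001, cheap_model),
    Tier("large", 0.03, strong_model),
]

if __name__ == "__main__":
    print(run_cascade("What is 2 + 2?", TIERS, min_confidence=0.8, budget=0.05))
    print(run_cascade("Compare three database designs for a multi-region ledger",
                      TIERS, min_confidence=0.8, budget=0.05))
```

In this sketch a short prompt is answered by the `small` tier alone, while a longer one pays for both tiers before the `large` answer is accepted. When the budget runs out partway through, the function returns the best answer obtained so far rather than failing. That is one possible policy choice; a production system might instead reject the request or fall back to a local model.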