RAG deployment for developers who'd rather be shipping
AutoLLM wraps LlamaIndex and LiteLLM into one-liners so you can stop wiring boilerplate and start serving queries.

What it does
AutoLLM is a Python convenience layer that spins up a retrieval-augmented generation (RAG) pipeline and exposes it as a FastAPI endpoint with minimal code. Feed it documents, pick an LLM, and it handles embedding, vector storage, and query serving. It leans heavily on LlamaIndex for the RAG logic and LiteLLM for routing to 100+ model providers.
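To make concrete what "handles embedding, vector storage, and query serving" hides, here is a toy retrieval step in plain Python. It is not AutoLLM or LlamaIndex code. The bag-of-words embed function stands in for a real embedding model, InMemoryVectorStore is a hypothetical stand-in for LanceDB, and the prompt template is an assumption. The final LLM call is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database such as LanceDB."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, Counter]] = []

    def add(self, doc: str) -> None:
        # Embed once at ingestion time and keep the vector next to the text.
        self.rows.append((doc, embed(doc)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Rank every stored document by similarity to the query; return the top k.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(store: InMemoryVectorStore, question: str, k: int = 2) -> str:
    """Stuff the retrieved context into a prompt that would be sent to the LLM."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
for doc in [
    "LanceDB is an embedded vector database.",
    "FastAPI serves HTTP endpoints in Python.",
    "LiteLLM routes calls to many LLM providers.",
]:
    store.add(doc)

print(store.search("Which vector database is embedded?", k=1))
# → ['LanceDB is an embedded vector database.']
```

A real pipeline swaps each piece for a production component (a learned embedding model, a persistent vector index, an LLM call on the built prompt); the wrapper's value is doing that wiring for you.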
The interesting bit
The project treats “1-line” as a genuine design constraint. AutoQueryEngine.from_defaults(documents) builds the full pipeline, and AutoFastAPI.from_query_engine(query_engine) generates the API. The cost calculator is a nice touch: it token-counts against LiteLLM’s price sheet and prints spend per query, which is the kind of operational detail most wrappers skip.
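The cost-calculator idea (count tokens, multiply by a price sheet, report spend per query) fits in a few lines. This is a sketch of the technique, not AutoLLM's implementation: the model name and prices are hypothetical placeholders, and the whitespace token counter is a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens. Placeholders, not real rates;
# a real tracker would read them from a provider price sheet such as LiteLLM's.
PRICES_PER_1K = {
    "example-model": {"input": 0.50, "output": 1.50},
}


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


@dataclass
class CostTracker:
    """Accumulates estimated spend across queries for a single model."""

    model: str
    total_usd: float = 0.0

    def record(self, prompt: str, completion: str) -> float:
        # Input and output tokens are usually priced differently.
        price = PRICES_PER_1K[self.model]
        cost = (
            count_tokens(prompt) * price["input"]
            + count_tokens(completion) * price["output"]
        ) / 1000
        self.total_usd += cost
        print(f"query cost: ${cost:.6f} | running total: ${self.total_usd:.6f}")
        return cost


tracker = CostTracker("example-model")
tracker.record("What does AutoLLM do?", "It wraps LlamaIndex and LiteLLM.")
# prints: query cost: $0.009500 | running total: $0.009500
```

Keeping a running total alongside the per-query figure is what would make budget alerts (one of the roadmap items) a small step further.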
Key highlights
- Supports 100+ LLMs via LiteLLM (OpenAI, Azure, Vertex, Bedrock, Ollama, HuggingFace, etc.)
- Defaults to LanceDB for vector storage (setup-free, serverless)
- Optional enable_cost_calculator=True for per-query spend tracking
- AutoFastAPI generates a working API from an existing query engine
- Migration path from existing LlamaIndex VectorStoreIndex instances
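AutoFastAPI's job, turning a query engine into an HTTP service, can be sketched without FastAPI. The version below swaps in the standard library's WSGI interface so it runs with no dependencies. app_from_query_engine, the POST /query route, the JSON shape, and EchoEngine are all hypothetical; the only thing assumed about the engine is a query(str) method, as LlamaIndex query engines provide.

```python
import io
import json
from typing import Any, Callable, Iterable


def app_from_query_engine(engine: Any) -> Callable:
    """Hypothetical helper: expose any object with a .query(str) method as a WSGI app."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Single route: POST /query with a JSON body like {"question": "..."}.
        if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/query":
            start_response("404 Not Found", [("Content-Type", "application/json")])
            return [b'{"error": "not found"}']
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        # str() turns a rich response object (such as a LlamaIndex Response) into text.
        answer = str(engine.query(payload.get("question", "")))
        body = json.dumps({"answer": answer}).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]

    return app


class EchoEngine:
    """Toy engine for local testing; a real one would run retrieval plus an LLM call."""

    def query(self, question: str) -> str:
        return f"echo: {question}"


def call(app: Callable, method: str, path: str, body: bytes = b"") -> tuple[str, bytes]:
    """Invoke a WSGI app in-process, without starting a server."""
    status: list[str] = []
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, lambda s, headers: status.append(s))
    return status[0], b"".join(chunks)


app = app_from_query_engine(EchoEngine())
print(call(app, "POST", "/query", b'{"question": "What is RAG?"}'))
# → ('200 OK', b'{"answer": "echo: What is RAG?"}')
```

For local serving, wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever() from the standard library is enough. AutoFastAPI targets FastAPI instead, an ASGI framework, which adds things like request validation and generated docs that this sketch skips.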
Caveats
- AGPL 3.0 license, which carries copyleft obligations for networked use
- Several roadmap items (Gradio apps, budget alerts, automated eval) are still unchecked
- The README’s feature comparison table claims “Unified API” and “1-Line” advantages over LangChain and LlamaIndex, but those conveniences are largely wrappers around LlamaIndex itself (plus LiteLLM), so the comparison partly pits AutoLLM against its own foundation
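Verdict
Good fit if you’re already in the LlamaIndex ecosystem and want to cut deployment ceremony. Skip it if you need deep customization or can’t stomach AGPL’s network-use clause (Section 13), which requires offering source code to users who interact with a modified version over a network.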