Maximilian-Winter/llama-cpp-agent
An agent framework enabling function calling, structured output, and RAG for LLMs via guided sampling.

The llama-cpp-agent framework provides an interface for building LLM-powered agents, with support for single and parallel function calling, structured output generation, and retrieval-augmented generation (RAG) with ColBERT reranking. It uses guided sampling, which constrains generation with grammars and JSON schemas, so that function calling works even on models that were not fine-tuned for it. The framework integrates with llama.cpp, llama-cpp-python, TGI, and vLLM servers, and offers conversational, sequential, and mapping agent chains.
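To show why guided sampling lets a model that was never trained for function calling still emit a valid call, here is a toy sketch in Python. It is not llama-cpp-agent's API. The "grammar" is simplified to a finite set of allowed function-call strings (llama.cpp itself uses GBNF grammars), and the vocabulary, scores, and the get_weather/get_time tools are hypothetical. At each step, tokens that cannot extend the output toward some valid call are masked out, and the highest-scoring remaining token is chosen.

```python
import json

# Hypothetical tools the agent may call; each takes a single "city" argument.
TOOLS = {"get_weather": ["Paris", "Berlin"], "get_time": ["Paris", "Berlin"]}


def build_targets(tools):
    """Toy stand-in for a grammar: the finite set of every valid call string."""
    return {
        json.dumps({"function": name, "args": {"city": city}})
        for name, cities in tools.items()
        for city in cities
    }


# Toy vocabulary; "Sure" and "!" model the chatty text an untuned model prefers.
VOCAB = ["Sure", "!", "{", "}", ": ", ", ", '"function"', '"args"', '"city"',
         '"get_weather"', '"get_time"', '"Paris"', '"Berlin"']

# Hypothetical model scores (higher is preferred); unlisted tokens score 0.
SCORES = {"Sure": 5.0, "!": 4.0, '"get_weather"': 1.5, '"get_time"': 0.5,
          '"Berlin"': 2.0, '"Paris"': 1.0}


def score(prefix, token):
    # A real model would condition on the prefix; this toy scorer ignores it.
    return SCORES.get(token, 0.0)


def is_legal(prefix, token, targets):
    # Legal if the extended text is still a prefix of at least one valid output.
    candidate = prefix + token
    return any(t.startswith(candidate) for t in targets)


def guided_decode(targets, max_steps=50):
    out = ""
    for _ in range(max_steps):
        if out in targets:  # a complete, valid call has been produced
            return out
        legal = [tok for tok in VOCAB if is_legal(out, tok, targets)]
        if not legal:
            raise ValueError(f"no legal continuation after {out!r}")
        # Greedy choice restricted to the legal tokens (the grammar "mask").
        out += max(legal, key=lambda tok: score(out, tok))
    raise RuntimeError("max_steps exceeded")


def get_weather(city):
    return f"(pretend forecast for {city})"


def get_time(city):
    return f"(pretend local time in {city})"


if __name__ == "__main__":
    # Without the mask, the toy model would start chatting.
    print(max(VOCAB, key=lambda tok: score("", tok)))  # → Sure
    call_text = guided_decode(build_targets(TOOLS))
    print(call_text)  # → {"function": "get_weather", "args": {"city": "Berlin"}}
    call = json.loads(call_text)
    handler = {"get_weather": get_weather, "get_time": get_time}[call["function"]]
    print(handler(**call["args"]))  # → (pretend forecast for Berlin)
```

The model's preferences still decide which function and argument are chosen; the mask only guarantees that the result parses as JSON and names a real tool, so it can be dispatched to a Python function. Real grammars (GBNF, or ones derived from a JSON schema) describe unbounded languages and are checked incrementally by a parser rather than by prefix-matching a fixed list.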