NVIDIA-AI-Blueprints/rag
A reference implementation of a Retrieval-Augmented Generation pipeline using NVIDIA NIM microservices and NeMo Retriever models.

This blueprint provides a foundational RAG pipeline that combines LLMs with real-time retrieval from enterprise data sources to ground AI responses. It integrates GPU-accelerated components including NeMo Retriever embedding models, vision language models, and guardrailing services. The solution supports agentic RAG via LangGraph for multi-hop queries with plan-and-execute orchestration, parallel task execution, and streaming stage events.