valentinfrlch/ha-llmvision
A Home Assistant integration leveraging multimodal LLMs to analyze surveillance camera feeds and events with AI-powered visual understanding.

This project integrates multimodal large language models into Home Assistant for analyzing images, videos, live camera feeds, and Frigate events. It supports numerous LLM providers including OpenAI, Anthropic, Gemini, Ollama, and any OpenAI-compatible endpoint. The system can answer questions about visual content, remember people and objects across sessions, and maintain a searchable timeline of analyzed camera events with customizable prompts and notifications.