
showlab/VLog

Video-language understanding system that converts videos into queryable text documents for LLM-based conversation via a novel generative retrieval narrator.

588 stars · Python · Agents, Language Models
Velocity (7d): +0.5 ★ / day · Trend: steady

VLog introduces a GPT-2-based video narrator that uses a Narration Vocabulary for efficient video-language understanding. The VLog-Agent branch extends this by converting videos into text documents that capture visual and audio information, then uses LLMs through LangChain to support natural-language chat over the video content. It integrates Whisper for audio processing and uses generative retrieval to generate narrations efficiently.
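The video-to-document idea can be illustrated with a small, dependency-free sketch. This is not VLog's code: the `Segment` type, the `build_document` and `retrieve` helpers, and the sample captions are all hypothetical, and a toy word-overlap ranker stands in for the LLM and LangChain step. It only shows how timestamped visual narrations and audio transcripts might be merged into one text document that can then be queried.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timestamped piece of text extracted from a video (hypothetical type)."""
    start: float  # seconds
    end: float    # seconds
    source: str   # "visual" (narration) or "audio" (transcript)
    text: str


def build_document(segments):
    """Merge visual and audio segments into one time-ordered text document."""
    ordered = sorted(segments, key=lambda s: (s.start, s.source))
    return "\n".join(
        f"[{s.start:.1f}-{s.end:.1f}s] {s.source}: {s.text}" for s in ordered
    )


def retrieve(document, question, top_k=2):
    """Rank document lines by word overlap with the question.

    A toy stand-in for the LLM step; a real system would pass the
    document (or retrieved chunks of it) to a language model instead.
    """
    query = set(question.lower().split())
    scored = [
        (len(query & set(line.lower().split())), index, line)
        for index, line in enumerate(document.splitlines())
    ]
    # Highest overlap first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [line for score, _, line in scored[:top_k] if score > 0]


# Made-up sample data, for illustration only.
segments = [
    Segment(5.0, 10.0, "visual", "a person pours oil into a pan"),
    Segment(0.0, 5.0, "visual", "a person chops onions on a cutting board"),
    Segment(0.0, 5.0, "audio", "first we chop the onions"),
]
doc = build_document(segments)
hits = retrieve(doc, "when is oil poured into the pan", top_k=1)
print(doc)
print(hits)
```

Keeping the timestamps in each line means an answer can point back to the moment in the video it came from, which is one reason a plain-text document is a convenient intermediate format.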
