← all repositories

byjlw/video-analyzer

A video analysis tool combining vision language models and Whisper ASR to generate natural language descriptions from extracted video frames and audio transcripts.

1.4k stars Python Computer VisionLanguage Models
video-analyzer
Velocity · 7d
+2.5
★ / day
Trend
steady
star history

The tool extracts key frames from videos using OpenCV and processes audio through Whisper for transcription. Each frame is analyzed using a vision-enabled LLM to extract visual details, which are then combined with transcript data to produce comprehensive descriptions of video content. It supports both local execution via Ollama and cloud-based OpenAI-compatible APIs.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.