byjlw/video-analyzer
A video analysis tool combining vision language models and Whisper ASR to generate natural language descriptions from extracted video frames and audio transcripts.

Velocity (7d): +2.5 ★/day · Trend: steady
The tool extracts key frames from videos using OpenCV and processes audio through Whisper for transcription. Each frame is analyzed using a vision-enabled LLM to extract visual details, which are then combined with transcript data to produce comprehensive descriptions of video content. It supports both local execution via Ollama and cloud-based OpenAI-compatible APIs.
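The flow described above (pick key frames, describe each one, merge the descriptions with the transcript) can be sketched in plain Python. This is a minimal illustration, not the repository's actual code: `FrameNote`, `select_key_frames`, and `build_summary_prompt` are hypothetical names, and the per-frame change scores are assumed to have been computed elsewhere (for example with OpenCV), as are the frame descriptions and the Whisper transcript.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FrameNote:
    """Hypothetical container for one analyzed key frame."""
    timestamp: float   # seconds from the start of the video
    description: str   # text a vision model returned for this frame


def select_key_frames(diff_scores: Sequence[float], max_frames: int) -> List[int]:
    """Return indices of the frames with the largest change scores, in time order.

    Assumption: diff_scores[i] measures how much frame i differs from the
    previous frame (e.g. a mean absolute pixel difference).
    """
    ranked = sorted(range(len(diff_scores)),
                    key=lambda i: diff_scores[i],
                    reverse=True)
    # Keep the top-scoring frames, then restore chronological order.
    return sorted(ranked[:max_frames])


def build_summary_prompt(notes: Sequence[FrameNote], transcript: str) -> str:
    """Combine per-frame descriptions and the transcript into one text prompt."""
    frame_lines = [f"[{n.timestamp:.1f}s] {n.description}" for n in notes]
    return (
        "Frame descriptions:\n"
        + "\n".join(frame_lines)
        + "\n\nAudio transcript:\n"
        + transcript
        + "\n\nDescribe what happens in the video."
    )


if __name__ == "__main__":
    # Made-up scores and descriptions, for illustration only.
    keep = select_key_frames([0.0, 5.0, 1.0, 9.0, 2.0], max_frames=2)
    notes = [FrameNote(timestamp=float(i), description=f"frame {i}") for i in keep]
    print(build_summary_prompt(notes, "hello world"))
```

The resulting prompt would then be sent to whichever backend is configured, a local Ollama model or an OpenAI-compatible API. Keeping the selection and prompt-building steps free of model calls makes them easy to test on their own.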