Ctrl+F for your dashcam footage
Semantic search over video: type "red truck running a stop sign," get back a trimmed clip.

What it does
SentrySearch chunks your video files, embeds each chunk via Gemini Embedding 2, Qwen3-VL (local), or Alibaba DashScope, and stores vectors in ChromaDB. Query with text or an image; it returns ranked timestamped matches and auto-trims the best one into a standalone clip. There’s also a highlights mode that surfaces statistically anomalous chunks when you don’t know what to search for.
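As a rough illustration of the chunking step, the sketch below computes overlapping time windows for a video of a given length. The function name `chunk_windows` and the 30-second window with 5-second overlap are assumptions made for the example, not values taken from SentrySearch.

```python
def chunk_windows(duration_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) windows in seconds that cover the whole video.

    chunk_s and overlap_s are illustrative defaults, not the tool's settings.
    """
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    duration_s = float(duration_s)
    step = chunk_s - overlap_s  # how far each new window advances
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:  # last window reached the end of the video
            break
        start += step
    return windows


print(chunk_windows(70))
```

Each window would then be embedded and stored alongside its start and end times, which is what lets a search hit be mapped back to a timestamp.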
The interesting bit
The author treats video search as a solved problem by not solving video understanding at all: just throw overlapping chunks at a multimodal embedding API and let cosine similarity do the work. The clever part is the packaging: preprocessing (downscale to 480p, 5 fps), deduplication thresholds, and a three-project pipeline (search → merge multi-cam → auto-redact) that turns raw Tesla Sentry footage into usable evidence.
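The "embed, then compare" idea can be sketched in a few lines. This is a minimal illustration, assuming each chunk is stored as a (start, end, vector) tuple; the toy 3-dimensional vectors stand in for real embeddings, and `rank_chunks` is a hypothetical helper, not part of SentrySearch or ChromaDB.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_chunks(query_vec, chunks, top_k=3):
    """Score every chunk against the query and return the best top_k.

    chunks: list of (start_s, end_s, vector) tuples.
    Returns a list of (score, start_s, end_s), highest score first.
    """
    scored = [(cosine(query_vec, vec), start, end) for start, end, vec in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy data: three chunks with made-up embeddings.
chunks = [
    (0.0, 30.0, [1.0, 0.0, 0.0]),
    (25.0, 55.0, [0.6, 0.8, 0.0]),
    (50.0, 80.0, [0.0, 0.0, 1.0]),
]
best = rank_chunks([0.0, 1.0, 0.0], chunks, top_k=2)
print(best[0])  # the chunk spanning 25-55 s scores highest
```

In practice a vector store performs this comparison at scale; the brute-force loop here only shows what is being computed.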
Key highlights
- Three backends: Gemini API (default, best quality), Alibaba DashScope (qwen-cloud), or fully local Qwen3-VL (auto-detects 2B vs 8B based on your GPU/RAM)
- Image search: drop a reference photo to find visually similar scenes
- highlights anomaly detection with three scoring methods (kNN, centroid, LOF) and query-relative anomaly modes
- Built-in deduplication (--dedupe 0.9) prevents the same event from flooding results across overlapping chunks
- Optional Tesla metadata overlay and companion tools for multi-cam stitching (SentryMerge) and face/license-plate redaction (SentryBlur)
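One plausible reading of the deduplication threshold is a greedy filter: walk the ranked results and drop any hit whose embedding is too similar to one already kept. The sketch below assumes that interpretation, with 0.9 treated as a cosine-similarity cutoff; the function `dedupe` and the sample data are hypothetical, and the tool's actual logic may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dedupe(ranked, threshold=0.9):
    """Keep a result only if it is not a near-duplicate of a better one.

    ranked: list of (score, label, vector), best match first.
    threshold: assumed cosine-similarity cutoff above which two hits
               count as the same event.
    """
    kept = []
    for item in ranked:
        if all(cosine(item[2], other[2]) < threshold for other in kept):
            kept.append(item)
    return kept


# Two overlapping chunks of the same event, plus one unrelated hit.
ranked = [
    (0.91, "00:25-00:55", [1.0, 0.0]),
    (0.90, "00:50-01:20", [0.99, 0.05]),  # near-duplicate of the first
    (0.70, "12:00-12:30", [0.0, 1.0]),
]
unique = dedupe(ranked)
print([label for _, label, _ in unique])
```

Because the list is processed best-first, the highest-scoring copy of each event is the one that survives.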
Caveats
- Requires Python 3.11–3.12; PyTorch wheels don’t support 3.13+
- Local backend is impractical on Intel Macs or CPU-only machines (falls back to float32, “too slow and memory-hungry”)
- Anomaly detection isn’t magic: lens flares, night frames, and sensor glitches rank as “interesting”; use --exclude-baseline and --against to constrain it
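The anomaly caveat follows from how distance-based scoring works: a chunk ranks as interesting simply because its embedding sits far from the others, whatever the cause. Below is a minimal sketch of centroid and kNN scoring, assuming Euclidean distance and toy 2-dimensional vectors. Both function names are made up for the example, and LOF is left out for brevity.

```python
import math


def centroid_scores(vectors):
    """Score each vector by its distance from the mean of all vectors."""
    dim = len(vectors[0])
    centre = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [math.dist(v, centre) for v in vectors]


def knn_scores(vectors, k=2):
    """Score each vector by its mean distance to its k nearest neighbours."""
    if len(vectors) < 2:
        raise ValueError("need at least two vectors")
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Four ordinary chunks and one outlier (say, a lens flare).
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
by_centroid = centroid_scores(vectors)
by_knn = knn_scores(vectors)
print(by_centroid.index(max(by_centroid)), by_knn.index(max(by_knn)))
```

Both methods flag the last vector, and neither can tell whether it is a collision or a glare artifact, which is why narrowing the comparison set matters.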
Verdict
Essential if you’re sitting on terabytes of dashcam or security footage and currently find events by scrubbing timelines. Skip if you don’t have a GPU or API budget and expect local inference to just work; the hardware requirements are real.