A model-agnostic Python toolkit that handles the boring parts of computer vision: annotations, dataset juggling, and tracking.
Computer Vision
A $9 ESP32 board turns radio reflections into room-scale presence detection, vital signs, and pose estimation: no lenses, no wearables, no cloud.
PaddleOCR turns scans and PDFs into structured Markdown or JSON using a tiny vision-language model that punches above its weight class.
OpenCV is the de facto standard for computer vision, and its README is almost aggressively humble about it.
Eagle is less a single model than NVIDIA's internal R&D pipeline for multimodal AI, now open-sourced with three generations of VLMs and a grounding specialist.
A Python toolkit that reverse-engineers alpha-blended logos, strips C2PA manifests, and diffuses away invisible fingerprints like SynthID.
Ultralytics turned the classic object detector into a unified computer-vision Swiss Army knife you can train via CLI or Python.
A Qt-based desktop app for screenshot, batch, and PDF OCR without phoning home to any API.
A pure C++ pipeline that turns monocular video into multi-person BVH files ready for Blender.
A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.
Someone finally collected all those "top projects" Medium posts into one giant table.
Upscayl wraps Real-ESRGAN and Vulkan in an Electron app so you can enlarge images without paying Topaz Gigapixel's rent.
Surya does OCR, layout analysis, reading order, and table recognition in 90+ languages from a single VLM.
LingBot-Map reconstructs scenes from streaming video in one forward pass, handling 10,000+ frames without iterative optimization.
Frigate is a local NVR that runs AI object detection on IP cameras without phoning home to the cloud.
MAA automates the daily grind of Arknights using computer vision, with more bindings than a language conference.
screenpipe records everything you see, say, and hear, keeping it local, searchable, and ready to feed to AI agents.
HP's abandoned text-recognition project became the open-source default for turning images into words.
A foundation model that segments images and videos using open-vocabulary text prompts like "a player in white."
A local AI tool that inpaints burned-in text from videos and images, keeping original resolution intact.


