A model-agnostic Python toolkit that handles the boring parts of computer vision: annotations, dataset juggling, and tracking.
Computer Vision
A $9 ESP32 board turns radio reflections into room-scale presence detection, vital signs, and pose estimation: no lenses, no wearables, no cloud.
PaddleOCR turns scans and PDFs into structured Markdown or JSON using a tiny vision-language model that punches above its weight class.
OpenCV is the de facto standard for computer vision, and its README is almost aggressively humble about it.
Eagle is less a single model than NVIDIA's internal R&D pipeline for multimodal AI, now open-sourced with three generations of VLMs and a grounding specialist.
A Python toolkit that reverse-engineers alpha-blended logos, strips C2PA manifests, and diffuses away invisible fingerprints like SynthID.
Ultralytics turned the classic object detector into a unified computer-vision Swiss Army knife you can train via CLI or Python.
A pure C++ pipeline that turns monocular video into multi-person BVH files ready for Blender.
A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.
A Qt-based desktop app for screenshot, batch, and PDF OCR without phoning home to any API.
Someone finally collected all those "top projects" Medium posts into one giant table.
Surya does OCR, layout analysis, reading order, and table recognition in 90+ languages from a single VLM.
Upscayl wraps Real-ESRGAN and Vulkan in an Electron app so you can enlarge images without paying Topaz Gigapixel's rent.
LingBot-Map reconstructs scenes from streaming video in one forward pass, handling 10,000+ frames without iterative optimization.
Frigate is a local NVR that runs AI object detection on IP cameras without phoning home to the cloud.
MAA automates the daily grind of Arknights using computer vision, with more bindings than a language conference.
screenpipe records everything you see, say, and hear, keeping it local, searchable, and feedable to AI agents.
A foundation model that segments images and videos using open-vocabulary text prompts like "a player in white."
HP's abandoned text-recognition project became the open-source default for turning images into words.
A speed-focused deep learning system for analyzing massive scientific images, from crowds to cancer slides to galaxies.



