A model-agnostic Python toolkit that handles the boring parts of computer vision: annotations, dataset juggling, and tracking.
Computer Vision
heavyweights · gaining speed
PaddleOCR turns scans and PDFs into structured Markdown or JSON using a tiny vision-language model that punches above its weight class.
OpenCV is the de facto standard for computer vision, and its README is almost aggressively humble about it.
Eagle is less a single model than NVIDIA's internal R&D pipeline for multimodal AI, now open-sourced with three generations of VLMs and a grounding specialist.
A $9 ESP32 board turns radio reflections into room-scale presence detection, vital signs, and pose estimation — no lenses, no wearables, no cloud.
A foundation model that turns one image into a full 3D body mesh, optionally guided by keypoints or masks like the original SAM.
A speed-focused deep learning system for analyzing massive scientific images, from crowds to cancer slides to galaxies.
BiRefNet splits images into layers using bilateral references, then offers a whole zoo of task-specific weights for everything from background removal to camouflaged-object detection.
Someone finally collected all those "top projects" Medium posts into one giant table.
A single Python framework that pretrains DINOv2/v3 on unlabeled data, then fine-tunes and distills for detection and segmentation tasks.
Extends 3D Gaussian Splatting to time-varying scenes without sacrificing the real-time rendering speed that made the original technique appealing.
Frigate is a local NVR that runs AI object detection on IP cameras without phoning home to the cloud.
MAA automates the daily grind of Arknights using computer vision, with more bindings than a language conference.
An ONNX-exported, multi-engine OCR toolkit that runs offline on basically anything.
A battle-tested pipeline that reconstructs 3D scenes from unordered image collections using structure-from-motion and multi-view stereo.
A browser-native image toolkit that removes backgrounds and upscales photos using local AI models—no server, no subscription, no data leaving your laptop.
LichtFeld Studio wraps the entire 3D Gaussian Splatting pipeline—training, editing, exporting, automating—into a single C++ desktop app instead of a chain of Python scripts.
A unified training and evaluation framework for remote photoplethysmography—turning ordinary camera video into physiological signals without contact.
Hundreds of pre-quantized computer vision models, converted to every framework you didn't want to learn yourself.
A turn-key OCR engine built for historical manuscripts, non-Latin scripts, and the messy reality of digitization.




