Computer Vision

newcomers · velocity + momentum

+196 ★/day→steady

A $9 ESP32 board turns radio reflections into room-scale presence detection, vital signs, and pose estimation — no lenses, no wearables, no cloud.

★ 71.7k Rust Domain Apps · explained

Robbyant/lingbot-map

+133 ★/day→steady

LingBot-Map reconstructs scenes from streaming video in one forward pass, handling 10,000+ frames without iterative optimization.

★ 7.1k Python Computer Vision · explained

deepseek-ai/DeepSeek-OCR

+99 ★/day→steady

An LLM-centric vision encoder that squeezes documents into surprisingly few tokens, then lets the language model do the actual reading.

★ 23.3k Python Inference · Serving · explained

zai-org/GLM-OCR

+55 ★/day→steady

GLM-OCR squeezes document understanding into a sub-1B-parameter model with a layout-aware pipeline and enough deployment options to please any ops team.

★ 6.9k Python Computer Vision · explained

datalab-to/chandra

+46 ★/day→steady

Chandra OCR 2 turns scanned chaos into structured Markdown, HTML, or JSON without destroying the layout.

★ 11.1k Python Computer Vision · explained

wiltodelta/remove-ai-watermarks

+41 ★/day→steady

A Python toolkit that reverse-engineers alpha-blended logos, strips C2PA manifests, and diffuses away invisible fingerprints like SynthID.

★ 3k Python Computer Vision · explained

facebookresearch/segment-anything

+46 ★/day→steady

SAM lets you isolate any object in an image with a click or bounding box, no custom training required.

★ 54.3k Jupyter Notebook Computer Vision · explained

ultralytics/ultralytics

+43 ★/day→steady

Ultralytics turned the YOLO object detector into a unified computer-vision Swiss Army knife you can train via CLI or Python.

★ 58.1k Python Computer Vision · explained

microsoft/OmniParser

+40 ★/day→steady

OmniParser extracts clickable elements from raw screenshots so vision models can actually *do* things on a desktop without peeking at the DOM.

★ 24.9k Jupyter Notebook Agents · explained

facebookresearch/dinov3

+35 ★/day→steady

DINOv3 is a family of self-supervised vision backbones designed to produce high-quality dense features for everything from semantic segmentation to satellite canopy mapping, often beating task-specialized models out of the box.

★ 10.6k Jupyter Notebook Computer Vision · explained

facebookresearch/sam3

+32 ★/day→steady

A foundation model that segments images and videos using open-vocabulary text prompts like "a player in white."

★ 10.4k Python Computer Vision · explained

PaddlePaddle/PaddleOCR

+37 ★/day→steady

PaddleOCR turns scans and PDFs into structured Markdown or JSON using a tiny vision-language model that punches above its weight class.

★ 81.3k Python Computer Vision · explained

upscayl/upscayl

+33 ★/day→steady

Upscayl wraps Real-ESRGAN and Vulkan in an Electron app so you can enlarge images without paying Topaz Gigapixel's rent.

★ 45.9k TypeScript Computer Vision · explained

roboflow/supervision

+32 ★/day→steady

A model-agnostic Python toolkit that handles the boring parts of computer vision: annotations, dataset juggling, and tracking.

★ 41.5k Python Computer Vision · explained

rednote-hilab/dots.ocr

+28 ★/day→steady

A single small vision-language model that parses documents, charts, and even street signs into structured text or SVG code.

★ 8.9k Python Computer Vision · explained

facebookresearch/vggt

+28 ★/day→steady

VGGT turns one image—or a hundred—into camera poses, depth maps, point clouds, and trackable 3D points without any optimization loop.

★ 13.3k Python Computer Vision · explained

facebookresearch/sam2

+28 ★/day→steady

SAM 2 extends the original Segment Anything to video with streaming memory, turning one-off image masks into persistent object tracking.

★ 19.3k Jupyter Notebook Computer Vision · explained

hiroi-sora/Umi-OCR

+29 ★/day→steady

A Qt-based desktop app for screenshot, batch, and PDF OCR without phoning home to any API.

★ 45k Python Computer Vision · explained

facefusion/facefusion

+28 ★/day→steady

A Python toolkit for face manipulation built around job queues, remixable steps, and headless automation rather than one-off GUI wizardry.

★ 28.7k Python Image · Video · Audio · explained

screenpipe/screenpipe

+27 ★/day→steady

screenpipe records everything you see, say, and hear, keeps it local, and makes it searchable and feedable to AI agents.

★ 19.2k Rust Agents · explained
