Computer Vision

underdogs breaking out

+22% /wk +583 ★/day↗accelerating

It aims to parse entire documents in one shot without the model getting stuck in repetitive loops.

★ 18.7k Python Computer Vision · explained

+15% /wk +330 ★/day↘cooling

It reconstructs 3D scenes from streaming video in real time without per-scene optimization, using a feed-forward transformer that remembers trajectory and corrects drift as it goes.

★ 15.3k Python Computer Vision · explained

wiltodelta/remove-ai-watermarks

+9.4% /wk +58 ★/day↗accelerating

It removes the visible Gemini sparkle, invisible SynthID fingerprints, and C2PA metadata that AI image generators embed in every output.

★ 4.3k Python Computer Vision · explained

SakuraMathcraft/LaTeXSnipper

+8.1% /wk +8.9 ★/day↘cooling

LaTeXSnipper bundles screenshot OCR, handwriting recognition, symbolic computation, and Office plugins into a single offline desktop app so you can actually use the math you capture.

★ 764 Python Computer Vision · explained

aiptimizer/TurboOCR

+5.4% /wk +4.3 ★/day→steady

TurboOCR exists because waiting for a vision-language model to read a receipt is a waste of GPU time.

★ 559 C++ Computer Vision · explained

NVlabs/alpamayo

+2.8% /wk +7.8 ★/day→steady

An open 10B-parameter VLA model that predicts driving trajectories while spelling out the causal reasoning behind every turn and lane change.

★ 1.9k Python Domain Apps · explained

Tencent-Hunyuan/HunyuanOCR

+2.8% /wk +7.5 ★/day→steady

HunyuanOCR-1.5 speeds up lightweight OCR vision-language models by drafting tokens with a block-diffusion model and closing capability gaps with an agent-driven data pipeline.

★ 1.9k Python Computer Vision · explained

NVlabs/alpasim

+2.4% /wk +3.8 ★/day→steady

AlpaSim exists so researchers can validate end-to-end self-driving policies in a closed-loop Python sandbox where renderers, physics, and traffic are swappable microservices.

★ 1.1k Python Agents · explained

roboflow/rf-detr

+1.6% /wk +19 ★/day↗accelerating

RF-DETR is Roboflow’s bet that a DINOv2 transformer backbone can finally beat YOLO on both speed and accuracy in real-world detection and segmentation tasks.

★ 8.7k Python Computer Vision · explained

ankandrew/fast-alpr

+1.4% /wk +1.4 ★/day→steady

Built to let you swap out the plate detector and OCR engine without trashing the rest of the pipeline.

★ 731 Python Computer Vision · explained

datalab-to/chandra

+1.4% /wk +23 ★/day→steady

It converts images and PDFs into structured HTML, Markdown, or JSON while reconstructing tables, forms, and handwriting that most OCR tools reduce to plain text soup.

★ 11.8k Python Computer Vision · explained

Faceplugin-ltd/Open-Source-Face-Recognition-SDK

+1.3% /wk +2.4 ★/day→steady

It gives Python developers on Windows and Linux a fully offline shortcut for detecting faces, extracting landmarks, and scoring similarity.

★ 1.4k Python Computer Vision · explained

NVlabs/Eagle

+1.1% /wk +5.3 ★/day↘cooling

Eagle is less a single model than NVIDIA's internal R&D pipeline for multimodal AI, now open-sourced with three generations of VLMs and a grounding specialist.

★ 3.3k Python Language Models · explained

screenpipe/screenpipe

+1.1% /wk +33 ★/day↘cooling

screenpipe continuously records your screen and audio locally so AI can search, summarize, and act on everything you’ve done without sending data to the cloud.

★ 20.6k Rust Agents · explained

YaoFANGUK/video-subtitle-remover

+0.9% /wk +16 ★/day↘cooling

It locally inpaints over hard-coded subtitles and text watermarks in videos and images so you never have to upload frames to a cloud API.

★ 12.1k Python Computer Vision · explained

CVHub520/X-AnyLabeling

+0.9% /wk +13 ★/day↗accelerating

This tool automates image and video annotation by plugging dozens of SOTA models into a single PyQt6 GUI.

★ 9.9k Python Data Tooling · explained

facebookresearch/vggt

+0.9% /wk +18 ★/day↗accelerating

VGGT replaces the traditional multi-stage 3D reconstruction pipeline with a single feed-forward model that predicts cameras, depth, and geometry from one or many images in seconds.

★ 14k Python Computer Vision · explained

RapidAI/RapidOCR

+0.9% /wk +8.9 ★/day↘cooling

An ONNX-exported, multi-engine OCR toolkit that runs offline on basically anything.

★ 7.3k Python Computer Vision · explained

mayocream/koharu

+0.8% /wk +5.9 ★/day→steady

Koharu automates the full manga-translation pipeline—detection, OCR, inpainting, and text rendering—entirely on your local machine.

★ 5k Rust Domain Apps · explained

Geekgineer/YOLOs-CPP

+0.8% /wk +1.1 ★/day↗accelerating

A single C++17 API wraps detection, segmentation, pose, OBB, and classification across YOLOv5 through YOLO26, with no Python runtime required.

★ 1.1k C++ Inference · Serving · explained
