TheShadow29/awesome-grounding

A field guide to making AI point at things correctly

A manually curated, chronologically sorted bibliography of visual grounding research, because finding the right paper shouldn't require its own grounding model.

What it does

This is an awesome-list that catalogs research papers on visual grounding: teaching machines to locate objects, moments, or regions in images and video using natural language descriptions. The maintainer claims to have personally reviewed every listed paper for relevance, and each entry includes paper links plus code repositories when available.

The interesting bit

The list is organized as a chronological “paper roadmap” rather than an alphabetical dump, which helps trace how the field evolved from early image-sentence alignment work through referring expressions and video moment localization to 3D embodied agents. It also explicitly marks sections as work-in-progress: the grounded description sections for images and for videos are still incomplete.

Key highlights

  • Covers image grounding (RefCOCO, Visual Genome), video grounding (Charades-STA, DiDeMo), and 3D/embodied platforms (Habitat, AI2-THOR)
  • Includes 12 image datasets and 8 video datasets with direct paper, code, and website links
  • Chronological organization from 2014 through recent work
  • Maintainer reviews submissions personally and aims for one-week PR turnaround
  • Links to related compilations for temporal grounding and multimodal ML

Caveats

  • Two major sections (Grounded Description for images and video) are explicitly marked WIP
  • The embodied agents section only lists three platforms with minimal detail
  • No search, filtering, or tagging beyond the manual table of contents

Verdict

Worth bookmarking if you’re entering visual grounding research or need to trace the lineage of a specific subproblem. Skip it if you want interactive paper discovery or automated alerts; this is a static, human-curated index.