slavabarkov/tidy

Your camera roll, searchable by vibes

An Android app that runs CLIP locally to let you type "golden retriever on a beach" and actually find that photo.

573 stars · Kotlin · RAG · Search · Computer Vision
Velocity (7d): +0.5 ★/day · Trend: steady

What it does

TIDY indexes every photo on your Android device using a quantized CLIP model, then lets you search your own library with natural language or by showing it another image. The indexing happens once at first launch; new photos get picked up automatically afterward. Everything stays on-device — no cloud, no accounts, no connectivity required.
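To make the retrieval idea concrete, here is a minimal Python sketch of embedding-based search. It assumes each photo has already been encoded into a vector and that the query (text or another image) has been encoded into the same space. The 3-dimensional vectors, the file names, and the `cosine` and `search` helpers are hypothetical and chosen for illustration; they are not taken from TIDY's code.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, index, top_k=2):
    """Return the top_k photo names ranked by similarity to the query."""
    ranked = sorted(index, key=lambda name: cosine(query_vec, index[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical precomputed image embeddings (real CLIP vectors have hundreds of dimensions).
index = {
    "beach_dog.jpg": [0.9, 0.1, 0.0],
    "city_night.jpg": [0.0, 0.2, 0.9],
    "park_dog.jpg": [0.7, 0.6, 0.1],
}

# Hypothetical embedding of the text query "golden retriever on a beach".
query = [1.0, 0.2, 0.0]
results = search(query, index)
print(results)
```

Because text and images share one embedding space, the same ranking function would serve image-to-image search: encode the example image instead of a text query and pass its vector in.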

The interesting bit

The trick is making a CLIP vision-language model run comfortably on a phone. The author uses ONNX Runtime with quantization to squeeze CLIP (specifically an OpenCLIP variant trained on LAION-2B, a dataset of roughly two billion image-text pairs) into something that can batch-process your camera roll without melting the battery. The result is genuine semantic search: “dog looking guilty” works because matching is done on image content, not on filenames.
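The summary above does not say which quantization scheme the project uses, so the following is only an illustration of the general idea: storing weights as 8-bit integers plus one shared scale factor instead of 32-bit floats. The symmetric int8 scheme, the sample weights, and the helper names are assumptions made for this sketch; real toolchains apply considerably more care.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [v * scale for v in q]

# Hypothetical weights; a real model has millions of them.
weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight shrinks from four bytes to one, at the cost of a small rounding error per value. That trade is what makes on-device inference plausible.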

Key highlights

  • Pure offline operation: no network calls, no data exfiltration
  • Text-to-image and image-to-image retrieval in one app
  • Automatic incremental indexing after initial scan
  • Distributed via GitHub releases and F-Droid
  • Built in Kotlin around ONNX Runtime inference
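The incremental indexing highlight can be pictured as a set difference between the photos currently on the device and the photos that already have a stored embedding. This is a hedged sketch of that bookkeeping, not TIDY's actual logic; the integer photo IDs and both helper functions are hypothetical.

```python
def photos_to_index(current_ids, indexed_ids):
    """Return photo IDs on the device that have no stored embedding yet."""
    return sorted(set(current_ids) - set(indexed_ids))

def stale_entries(current_ids, indexed_ids):
    """Return indexed IDs whose photos are no longer on the device."""
    return sorted(set(indexed_ids) - set(current_ids))

# Hypothetical state: three photos were indexed on first launch.
indexed = {101, 102, 103}
# Since then, photo 102 was deleted and two new photos were taken.
on_device = [101, 103, 104, 105]

new_ids = photos_to_index(on_device, indexed)
gone = stale_entries(on_device, indexed)
print(new_ids, gone)
```

Only the IDs in `new_ids` would need to go through the encoder, which is why later scans can be far cheaper than the first one.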

Caveats

  • Initial indexing time is unspecified but implied to be nontrivial for large libraries
  • Only Android; no iOS or desktop builds mentioned
  • Model specifics (size, speed, memory footprint) aren’t quantified in the README

Verdict

Grab it if you’ve got 10,000 unlabeled photos and the patience for first-run indexing. Skip it if you’re hoping for cloud-backed cross-device sync or fine-grained EXIF filtering: this is search by semantic embedding, not by metadata.
