Your camera roll, searchable by vibes
An Android app that runs CLIP locally to let you type "golden retriever on a beach" and actually find that photo.

What it does
TIDY indexes every photo on your Android device using a quantized CLIP model, then lets you search your own library with natural language or by showing it another image. The indexing happens once at first launch; new photos get picked up automatically afterward. Everything stays on-device — no cloud, no accounts, no connectivity required.
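To make the search step concrete, here is a minimal sketch of how embedding-based retrieval generally works: each photo is stored as a vector, the query is turned into a vector by the same model, and results are ranked by cosine similarity. This is an illustration rather than TIDY's actual code. The names `normalize`, `search`, and `index` are hypothetical, and the three-dimensional vectors are toy stand-ins for real CLIP embeddings.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def search(index, query_vec, top_k=3):
    """Return the top_k (path, score) pairs ranked by cosine similarity."""
    q = normalize(query_vec)
    scored = [(path, sum(a * b for a, b in zip(q, emb))) for path, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings" (assumption: real CLIP vectors are much longer).
index = {
    "beach_dog.jpg": normalize([0.9, 0.1, 0.0]),
    "receipt.jpg": normalize([0.0, 0.2, 0.9]),
    "park_dog.jpg": normalize([0.7, 0.6, 0.1]),
}

# Hypothetical stand-in for the text encoder's output for "dog on a beach".
query = [1.0, 0.0, 0.0]
results = search(index, query, top_k=2)
for path, score in results:
    print(path, round(score, 3))
```

Image-to-image search follows the same pattern: the query vector comes from the image encoder instead of the text encoder, and the ranking step is unchanged.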
The interesting bit
The trick is making a CLIP vision-language model run comfortably on a phone. The author uses ONNX Runtime with quantization to squeeze CLIP (specifically an OpenCLIP variant trained on LAION-2B, a dataset of roughly two billion image-text pairs) into something that can batch-process your camera roll without melting the battery. The "2B" refers to the training data, not the parameter count, which the README does not state. The result is genuine semantic search: a query like “dog looking guilty” matches on image content, not on filenames.
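The README does not say which quantization scheme is used, so the following is only a sketch of the general idea, assuming simple symmetric 8-bit quantization: floats are mapped to small integers with one shared scale factor, trading a little precision for roughly a 4x reduction in storage compared with 32-bit floats. The function names are hypothetical, and this is not ONNX Runtime's API.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

# Toy weights (assumption: real models hold millions of such values).
weights = [0.50, -1.27, 0.03, 0.99]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Each value now fits in one byte instead of four, which is the basic reason a quantized model is smaller on disk and in memory.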
Key highlights
- Pure offline operation: no network calls, no data exfiltration
- Text-to-image and image-to-image retrieval in one app
- Automatic incremental indexing after initial scan
- Distributed via GitHub releases and F-Droid
- Built in Kotlin around ONNX Runtime inference
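The incremental indexing mentioned in the list above can be sketched as a comparison between what has already been indexed and what is currently on the device. The README does not describe how TIDY detects new photos, so this assumes a simple scheme keyed on path and modification time. The function name and both dictionaries are hypothetical.

```python
def plan_incremental_index(indexed, library):
    """Return the photos that still need embedding and the stale index entries.

    indexed: dict of path -> modification time recorded at index time
    library: dict of path -> current modification time on the device
    """
    # New or modified photos need a fresh embedding.
    to_embed = sorted(p for p, mtime in library.items() if indexed.get(p) != mtime)
    # Photos deleted from the device should be dropped from the index.
    to_remove = sorted(p for p in indexed if p not in library)
    return to_embed, to_remove

indexed = {"a.jpg": 100, "b.jpg": 200, "c.jpg": 300}
library = {"a.jpg": 100, "b.jpg": 250, "d.jpg": 400}
to_embed, to_remove = plan_incremental_index(indexed, library)
print(to_embed, to_remove)
```

Only the changed and new files go through the model again, which is why updates after the initial scan can be much cheaper than the first run.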
Caveats
- Initial indexing time is unspecified but implied to be nontrivial for large libraries
- Only Android; no iOS or desktop builds mentioned
- Model specifics (size, speed, memory footprint) aren’t quantified in the README
Verdict
Grab it if you’ve got 10,000 unlabeled photos and the patience for first-run indexing. Skip it if you’re hoping for cloud-backed cross-device sync or fine-grained EXIF filtering: this is search by semantic embedding, not by metadata.