CLIP + 2 million Unsplash photos = search by vibes
A notebook pipeline that lets you find photos with phrases like "the feeling when your program finally works."

What it does
This repo is a Jupyter notebook pipeline that embeds nearly 2 million Unsplash photos using OpenAI’s CLIP model, then lets you search them with natural language queries. You type a sentence; CLIP encodes your text into the same vector space as the images, and the photos are ranked by cosine similarity to the query. There’s also a Colab notebook if you just want to experiment without downloading gigabytes of photos.
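To make the ranking step concrete, here is a minimal sketch of cosine-similarity search over precomputed embeddings. It is not the repo's code: the function names, the photo IDs, and the tiny 3-dimensional vectors are all made up for illustration (real CLIP embeddings have hundreds of dimensions and would normally be handled with NumPy or PyTorch rather than plain lists).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def search(query_vec, photo_vecs, top_k=3):
    """Return up to top_k (photo_id, score) pairs, best match first."""
    scored = [
        (photo_id, cosine_similarity(query_vec, vec))
        for photo_id, vec in photo_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical stand-ins for image embeddings produced by CLIP.
photo_vecs = {
    "photo_a": [0.9, 0.1, 0.0],
    "photo_b": [0.1, 0.9, 0.1],
    "photo_c": [0.7, 0.3, 0.1],
}

# Hypothetical stand-in for the embedding of the text query.
query_vec = [1.0, 0.2, 0.0]

results = search(query_vec, photo_vecs, top_k=2)
for photo_id, score in results:
    print(f"{photo_id}: {score:.3f}")
```

Because both sides are normalized, the score depends only on the direction of the vectors, not their length, which is why a text embedding and an image embedding can be compared directly once they live in the same space.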
The interesting bit
The project demonstrates CLIP’s ability to handle fuzzy, emotional queries: “the feeling when your program finally works” returns relevant results without relying on any literal tags. That’s the latent-space magic. The search isn’t matching keywords; it’s finding photos whose embeddings sit close to the query’s embedding, so results are related by concept rather than by wording.
Key highlights
- Pre-computed CLIP embeddings for ~2M Unsplash photos (full dataset, not just the public Lite version)
- One-click Colab demo for query experimentation
- Alternative notebook that filters Unsplash’s own Search API through CLIP re-ranking
- Notebooks are numbered sequentially: setup → download → process → search
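The re-ranking idea in the list above can be sketched roughly as follows, assuming each candidate returned by a keyword search already has a CLIP image embedding attached. The record layout, the IDs, and the 2-dimensional vectors are hypothetical, and the repo's alternative notebook may structure this step differently.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rerank(query_embedding, candidates):
    """Reorder keyword-search candidates by similarity to the query embedding."""
    return sorted(
        candidates,
        key=lambda c: cosine(query_embedding, c["embedding"]),
        reverse=True,
    )


# Hypothetical candidates, in the order a keyword search returned them.
candidates = [
    {"id": "kw_1", "embedding": [0.2, 0.9]},
    {"id": "kw_2", "embedding": [0.9, 0.2]},
    {"id": "kw_3", "embedding": [0.6, 0.6]},
]

# Hypothetical stand-in for the CLIP embedding of the text query.
query_embedding = [1.0, 0.1]

reranked_ids = [c["id"] for c in rerank(query_embedding, candidates)]
print(reranked_ids)
```

The trade-off is that re-ranking can only reorder what the keyword search already found, which fits the caveat that API-based search is likely to do worse than searching the full local dataset.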
Caveats
- The full Unsplash Dataset requires a (free) application; the Lite version is public but smaller
- API-based search without the local dataset is supported but “will probably deliver worse results”
- This is essentially a well-documented glue pipeline around CLIP and Unsplash data, not a novel model or production service
Verdict
Worth an hour if you’re building semantic search, need a CLIP-on-images reference implementation, or want to demo vector search to a skeptical team. Skip it if you need a hosted API or are already running your own image embedding pipeline.