haltakov/natural-language-image-search

CLIP + 2 million Unsplash photos = search by vibes

A notebook pipeline that lets you find photos with phrases like "the feeling when your program finally works."

1k stars · Jupyter Notebook · RAG · Search · Computer Vision
Velocity (7d): +0.5 ★ / day · Trend: steady

What it does

This repo is a Jupyter notebook pipeline that embeds nearly 2 million Unsplash photos with OpenAI’s CLIP model, then lets you search them with natural-language queries. You type a sentence; CLIP encodes your text into the same vector space as the images, and the photos are ranked by cosine similarity to the query. There’s also a Colab notebook if you just want to play without downloading gigabytes of photos.
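The ranking step can be sketched in a few lines. This is a minimal illustration, not the repo's code: the toy 3-dimensional vectors stand in for real CLIP embeddings, and the names `normalize`, `search`, and `photos` are hypothetical.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def search(query_vec, photo_vecs, top_k=3):
    """Return (photo_id, score) pairs for the top_k photos most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for photo_id, vec in photo_vecs.items():
        p = normalize(vec)
        # Dot product of two unit vectors is their cosine similarity.
        scored.append((photo_id, sum(a * b for a, b in zip(q, p))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for CLIP image embeddings (assumption, not real data).
photos = {
    "beach": [0.9, 0.1, 0.0],
    "forest": [0.1, 0.9, 0.1],
    "city": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend this is the encoded text query
results = search(query, photos, top_k=2)
print(results)
```

With real embeddings the only change is where the vectors come from: the text vector from CLIP's text encoder and the photo vectors from its image encoder. The ranking logic stays the same.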

The interesting bit

The project shows CLIP handling genuinely fuzzy, emotional queries: “the feeling when your program finally works” returns relevant photos without relying on any literal tags. That’s the latent-space magic. It isn’t matching keywords, it’s matching conceptual neighborhoods.

Key highlights

  • Pre-computed CLIP embeddings for ~2M Unsplash photos (full dataset, not just the public Lite version)
  • One-click Colab demo for query experimentation
  • Alternative notebook that takes results from Unsplash’s own Search API and re-ranks them with CLIP
  • Notebooks are numbered sequentially: setup → download → process → search
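
The API-based variant in the list above is a two-stage pattern: a keyword search supplies candidates, then CLIP similarity reorders them. A rough sketch under stated assumptions: the candidate dictionaries and their `id` and `embedding` keys are hypothetical and do not reflect the Unsplash API response format, and the embeddings are assumed to be unit-length already.

```python
def rerank(candidates, query_vec):
    """Sort keyword-search candidates by dot product with the query embedding.

    Assumes every embedding and the query are unit-length, so the dot
    product equals cosine similarity.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return sorted(
        candidates,
        key=lambda c: dot(c["embedding"], query_vec),
        reverse=True,
    )

# Hypothetical candidates, in the order a keyword search might return them.
candidates = [
    {"id": "a", "embedding": [1.0, 0.0]},
    {"id": "b", "embedding": [0.0, 1.0]},
    {"id": "c", "embedding": [0.6, 0.8]},
]
query_vec = [0.6, 0.8]  # stand-in for the encoded text query

ranked_ids = [c["id"] for c in rerank(candidates, query_vec)]
print(ranked_ids)
```

Re-ranking can only reorder what the keyword search already returned, which is one plausible reason the API route may give weaker results than searching the full set of embeddings.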

Caveats

  • The full Unsplash Dataset requires a (free) application; the Lite version is public but smaller
  • API-based search without the local dataset is supported but “will probably deliver worse results”
  • This is essentially a well-documented glue pipeline around CLIP and Unsplash data, not a novel model or production service

Verdict

Worth an hour if you’re building semantic search, need a CLIP-on-images reference implementation, or want to demo vector search to a skeptical team. Skip it if you need a hosted API or are already running your own image embedding pipeline.
