towhee-io/examples

Jupyter notebooks for turning unstructured data into searchable vectors

A collection of runnable examples for image, video, audio, text, and even molecular search using the Towhee embedding pipeline.

522 stars · Jupyter Notebook · RAG · Search · Data Tooling
Velocity (7d): +0.3 ★ / day · Trend: steady

What it does

This repo holds Jupyter notebooks that demonstrate how to use Towhee to generate embedding vectors from messy, unstructured data: images, video, audio, text, and molecular structures. Each example is a self-contained bootcamp, ranging from reverse image search with ResNet or CLIP through deepfake detection and text-to-video retrieval to credit-card approval prediction. The pitch is “x2vec, Towhee is all you need.”

The interesting bit

The breadth is the point. Most embedding tutorials stop at text or images; this stretches to molecular fingerprinting (via RDKit) and cross-modal search (text-to-image, text-to-video). It’s essentially a cookbook for the “everything is a vector” worldview.
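For readers new to the “everything is a vector” idea, here is a minimal sketch of the search step that sits downstream of any embedding model. It does not use Towhee: the three-dimensional vectors, the file names, and the `search` helper are all made up for illustration, and a real pipeline would get its vectors from a model and usually query a vector database rather than a Python dict.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def search(query, index, top_k=2):
    """Brute-force nearest-neighbour search over a {name: vector} dict."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy "embeddings" (hypothetical values, not produced by any model).
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.3, 0.1],
    "invoice.pdf": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

results = search(query, index)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The same ranking logic works whether the vectors came from an image, a sentence, or a video clip, which is why cross-modal search reduces to embedding both sides into a shared space.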

Key highlights

  • Covers 14+ example pipelines across image, video, audio, NLP, medical, and data science domains
  • Includes cross-modal retrieval: text-to-image search with CLIP, text-to-video with CLIP4Clip
  • Uses Towhee’s operator hub—pre-built ML operators for embedding, classification, and translation
  • Molecular search supports Tanimoto similarity, substructure, and superstructure queries
  • Deepfake detection and anime/cartoon image animation sit side-by-side with enterprise-y credit scoring
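The Tanimoto similarity mentioned in the molecular-search highlight is simple to state: the number of fingerprint bits two molecules share, divided by the number of bits set in either. The sketch below uses plain Python sets of bit positions as stand-in fingerprints. The bit values and molecule labels are invented for illustration, and the repo's notebooks are described as using RDKit rather than anything like this. The subset check is only a coarse screen for possible substructure matches, not a real substructure test.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(fp_a & fp_b) / len(union)

def may_contain_substructure(candidate_fp, query_fp):
    """Coarse screen: every bit set in the query must also be set in the candidate.

    Passing this check does not prove a substructure match; it only
    rules out candidates that cannot match.
    """
    return query_fp <= candidate_fp

# Hypothetical fingerprints (bit positions chosen arbitrarily).
molecule_a = {1, 2, 3, 4}
molecule_b = {3, 4, 5, 6}
fragment = {3, 4}

similarity = tanimoto(molecule_a, molecule_b)
print(f"Tanimoto(a, b) = {similarity:.3f}")
print("fragment may be in a:", may_contain_substructure(molecule_a, fragment))
print("a may be in fragment:", may_contain_substructure(fragment, molecule_a))
```

A superstructure query is the same screen with the roles swapped: the candidate's bits must all appear in the query.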

Caveats

  • The “Fine Tune” tutorial has a typo in its description (“tuen”), which suggests the docs received only light copy-editing
  • Some operator links point to external towhee.io pages, so notebook freshness depends on upstream operator maintenance
  • The credit card approval example lists no operators, making it unclear how Towhee-specific it actually is

Verdict

Worth a browse if you’re evaluating Towhee or need a quick, runnable prototype for vector search beyond text. Skip it if you already have a mature embedding pipeline and just want API docs.
