facebookresearch/StarSpace

One embedding space to rule them all

Facebook's C++ toolkit learns shared vector representations for words, users, pages, images, and graphs—then ranks anything against anything else.

StarSpace · Velocity (7d): +1.2 ★/day · Trend: steady

What it does

StarSpace trains neural embeddings so that disparate entities (words, documents, users, pages, even ResNet image features) live in the same vector space. Once they’re co-located, you can rank, classify, or retrieve across types: find documents for a query, recommend pages to a user, or measure sentence similarity. It reads fastText-style labeled data or more complex tab-separated formats, and spits out a binary model plus a TSV of vectors.
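Because every entity lands in one space, retrieval reduces to scoring candidates against a query vector. A minimal sketch of that step, assuming the exported TSV holds one token per line followed by tab-separated floats, and assuming an entity is embedded as the plain sum of its feature vectors (cosine similarity ignores overall scale, so summing versus averaging does not change the ranking here). The helpers `load_tsv_vectors`, `embed`, and `rank` and the toy vectors are hypothetical, not StarSpace’s API.

```python
import io
import math


def load_tsv_vectors(stream):
    """Read 'token<TAB>v1<TAB>v2...' lines into {token: [floats]} (assumed layout)."""
    vectors = {}
    for line in stream:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed(tokens, vectors):
    """Embed a bag of tokens as the sum of their vectors; unknown tokens are skipped."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(vectors.get(tok, ())):
            total[i] += x
    return total


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_tokens, candidates, vectors):
    """Sort candidate names by cosine similarity between their bags and the query."""
    q = embed(query_tokens, vectors)
    return sorted(
        candidates,
        key=lambda name: cosine(q, embed(candidates[name], vectors)),
        reverse=True,
    )


# Toy 2-d vectors standing in for an exported TSV.
tsv = io.StringIO(
    "cat\t1.0\t0.0\n"
    "dog\t0.9\t0.1\n"
    "stock\t0.0\t1.0\n"
    "__label__pets\t1.0\t0.1\n"
    "__label__finance\t0.1\t1.0\n"
)
vecs = load_tsv_vectors(tsv)
print(rank(["cat", "dog"], {"pets": ["__label__pets"], "finance": ["__label__finance"]}, vecs))
# ['pets', 'finance']
```

The same `rank` call works whether the candidates are labels, documents, or pages, which is the practical payoff of a shared space.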

The interesting bit

The name is the architecture: “star” (*) plus “space” means the wildcard * stands for any entity type, and all of them share one vector space. The training modes are the clever glue: six different strategies for constructing left-hand side / right-hand side (LHS/RHS) pairs from the same raw data, depending on whether you’re doing classification, collaborative filtering, graph relations, or unsupervised word2vec-style word embeddings.
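One way to picture the six trainMode settings is as six ways of slicing a single line into an LHS and an RHS. The sketch below is a simplified reading of the README’s mode descriptions, not the library’s code: `make_pair` is a hypothetical helper, mode 4 here takes the first two tokens rather than separate fields, and mode 5 samples one word instead of sweeping every position. It only builds pairs; no training happens.

```python
import random

LABEL_PREFIX = "__label__"  # fastText-style label marker


def make_pair(train_mode, tokens, rng, window=2):
    """Turn one line's tokens into an (lhs, rhs) pair of token lists.

    A simplified, hypothetical reading of the trainMode descriptions.
    """
    if train_mode == 0:
        # supervised: input words on the LHS, one of the line's labels on the RHS
        words = [t for t in tokens if not t.startswith(LABEL_PREFIX)]
        labels = [t for t in tokens if t.startswith(LABEL_PREFIX)]
        return words, [rng.choice(labels)]
    if train_mode == 1:
        # one random item is the RHS; the rest form the LHS
        i = rng.randrange(len(tokens))
        return tokens[:i] + tokens[i + 1:], [tokens[i]]
    if train_mode == 2:
        # one random item is the LHS; the rest form the RHS
        i = rng.randrange(len(tokens))
        return [tokens[i]], tokens[:i] + tokens[i + 1:]
    if train_mode == 3:
        # two distinct random items: one becomes the LHS, the other the RHS
        i, j = rng.sample(range(len(tokens)), 2)
        return [tokens[i]], [tokens[j]]
    if train_mode == 4:
        # simplification: first item is the LHS, second is the RHS (e.g. a graph edge)
        return [tokens[0]], [tokens[1]]
    if train_mode == 5:
        # unsupervised: a random word is the RHS, its window neighbours the LHS
        i = rng.randrange(len(tokens))
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        return tokens[lo:i] + tokens[i + 1:hi], [tokens[i]]
    raise ValueError(f"unknown trainMode {train_mode}")


if __name__ == "__main__":
    rng = random.Random(0)
    likes = ["page_a", "page_b", "page_c"]  # e.g. pages one user liked
    for mode in (1, 2, 3):
        print(mode, make_pair(mode, likes, rng))
```

The point of the design is that the embedding table and loss stay the same across modes; only the pair construction changes, which is why one binary covers classification, recommendation, and graph tasks.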

Key highlights

  • Handles classification, retrieval, recommendation, graph embedding, and image tasks from one binary
  • Six trainMode settings reframe the same data as different LHS/RHS prediction problems
  • Supports weighted features, gzip-compressed inputs, and mini-batch training for speed
  • Python wrapper available; MIT licensed (was previously more restrictive)
  • File format compatible with fastText for easy migration
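Weighted features and gzip inputs mostly change how training lines are read. A sketch of such a reader, assuming a `token:weight` syntax with a colon separator, a default weight of 1.0, weights allowed on labels as well as features, and a `.gz` suffix to detect compression; all of these are illustrative assumptions, and the README documents the options that actually enable them. The function names are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a UTF-8 text file, decompressing on the fly if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def parse_weighted_line(line, label_prefix="__label__", sep=":"):
    """Split a fastText-style line into (features, labels), each a list of (token, weight).

    'fluffy:0.5' yields weight 0.5; a token with no parsable weight gets 1.0.
    """
    features, labels = [], []
    for tok in line.split():
        name, weight = tok, 1.0
        head, found, tail = tok.rpartition(sep)
        if found and head:
            try:
                name, weight = head, float(tail)
            except ValueError:
                pass  # separator belongs to the token itself; keep it whole
        (labels if name.startswith(label_prefix) else features).append((name, weight))
    return features, labels


def read_examples(path):
    """Yield parsed examples from a plain or gzip-compressed training file."""
    with open_maybe_gzip(path) as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield parse_weighted_line(line)
```

Because unweighted tokens default to 1.0, a plain fastText file parses the same way, which fits the migration story.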

Caveats

  • Requires Boost and a C++11 compiler; Windows users need Visual Studio
  • README mentions real-valued weights and ImageSpace but gives sparse detail on the latter
  • Last significant update appears to be 2019-era; not clear how actively maintained

Verdict

Worth a look if you need a single, fast C++ embedding engine for heterogeneous data without deep-learning ceremony. Skip it if you’re already invested in modern transformer pipelines or need extensive active community support.
