airalcorn2/Deep-Semantic-Similarity-Model

Microsoft's old search model, rebuilt in Keras

A clean reference implementation of DSSM/CLSM for learning text similarity, minus the proprietary Bing data you'd need to actually run it.

521 stars · Python · Language Models · ML Frameworks
Velocity (7d): +0.1 ★/day · Trend: steady

What it does

This repo implements the Deep Semantic Similarity Model (DSSM) and its convolutional variant (CLSM), Microsoft Research architectures from 2013 and 2014 for ranking query-document pairs by learned semantic similarity. The model maps a query and a document through deep neural layers into a shared latent space, then scores each pair with cosine similarity. The catch: you bring your own search logs, since the author notes that all the decent search datasets are locked behind corporate walls.
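
As a rough illustration of the scoring step described above, the following sketch ranks a few candidate documents against a query by cosine similarity and turns the scores into probabilities with a softmax. The toy vectors stand in for the network's outputs. The function names, the softmax step, and the `gamma` scaling value are assumptions for illustration, not taken from the repo.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, gamma=10.0):
    """Softmax over scaled cosine scores, one probability per document.

    gamma is a smoothing factor; 10.0 is an illustrative value only.
    """
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy latent vectors standing in for the network's outputs (hypothetical).
query = [1.0, 0.0, 1.0]
docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

probs = rank_documents(query, docs)
best = max(range(len(docs)), key=lambda i: probs[i])
print(best, [round(p, 4) for p in probs])
```

The document whose vector points in the same direction as the query gets the highest probability. In a trained model the vectors would come from the query and document towers rather than being written by hand.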

The interesting bit

The value here is archeological. DSSM was an early bridge between deep learning and information retrieval. It dates from before the transformer era, when people still believed convolutions over letter n-grams might capture meaning. The code preserves that architectural thinking in modern Keras, which is useful if you’re tracing how we got from TF-IDF to BERT.
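
To make the letter n-gram idea concrete, here is a minimal sketch of trigram "word hashing": each word is wrapped in boundary markers and split into overlapping three-letter pieces, and a text becomes a bag of trigram counts. The function names and the `#` boundary marker are illustrative assumptions; the repo's own preprocessing may differ.

```python
from collections import Counter


def letter_ngrams(word, n=3):
    """Split a word into letter n-grams after adding '#' boundary markers."""
    marked = f"#{word.lower()}#"
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def hash_text(text, n=3):
    """Count letter n-grams across all whitespace-separated words."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_ngrams(word, n))
    return counts


trigrams = letter_ngrams("good")
features = hash_text("good goods")
print(trigrams)
print(features)
```

Because related word forms share most of their trigrams, the representation stays compact and tolerates unseen words. That is the appeal of the approach compared with a one-hot vocabulary.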

Key highlights

  • Direct Keras port of the CIKM 2014 Microsoft paper with CLSM extension
  • Includes links to original Microsoft Research slides and reference collection
  • ~500 stars suggest it served as a common starting point for IR researchers
  • No dependencies listed, but the code is Keras-era (likely TF 1.x or early 2.x)
  • Author explicitly warns: no data included, BYO search corpus

Caveats

  • Last meaningful activity is unclear; the Keras APIs it targets are likely deprecated
  • No training example, no sample output, no performance numbers — just the architecture
  • Requires significant data engineering before it does anything useful
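
As a hint of what that data engineering might involve, the sketch below turns (query, clicked document) pairs into training examples by pairing each clicked document with randomly sampled non-clicked ones. Everything here is hypothetical: the function name, the toy log, and the default of four negatives are illustrative choices, not the repo's pipeline.

```python
import random


def build_examples(click_pairs, num_negatives=4, seed=0):
    """Pair each query with its clicked document plus sampled negatives.

    Simplification: any document other than the clicked one counts as a
    negative, even if the same query clicked it in another log entry.
    """
    rng = random.Random(seed)
    all_docs = sorted({doc for _, doc in click_pairs})
    examples = []
    for query, clicked in click_pairs:
        candidates = [d for d in all_docs if d != clicked]
        k = min(num_negatives, len(candidates))
        negatives = rng.sample(candidates, k)
        examples.append(
            {"query": query, "positive": clicked, "negatives": negatives}
        )
    return examples


# Hypothetical click log: (query, title of the clicked document).
click_log = [
    ("cheap flights", "Compare airline fares"),
    ("python sort list", "Sorting HOW TO"),
    ("weather tomorrow", "Ten day forecast"),
    ("keras conv1d", "Convolution layers"),
    ("best pizza dough", "Neapolitan dough recipe"),
    ("learn guitar", "Beginner chord chart"),
]

examples = build_examples(click_log)
print(examples[0])
```

A real pipeline would also need deduplication, tokenization, and some handling of queries with several clicked documents, none of which is attempted here.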

Verdict

Worth a skim if you’re writing a literature review on neural IR or need to reproduce a 2014 baseline. Skip it if you want something that trains on Colab in ten minutes; this is a blueprint, not a product.
