Microsoft's old search model, rebuilt in Keras
A clean reference implementation of DSSM/CLSM for learning text similarity, minus the proprietary Bing data you'd need to actually run it.

What it does
This repo implements the Deep Semantic Similarity Model (DSSM) and its convolutional variant (CLSM), architectures that Microsoft Research published in 2013 and 2014 respectively for ranking query-document pairs by learned semantic similarity. Both map queries and documents through deep neural layers into a shared latent space, then score matches with cosine similarity. The catch: you bring your own search logs, since the author notes that all decent search datasets are locked behind corporate walls.
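To make the scoring step concrete, here is a minimal pure-Python sketch of cosine scoring followed by a softmax over candidate documents. It is not the repo's code: the vectors are made up to stand in for network outputs, the function names are hypothetical, and the `gamma` smoothing value is an arbitrary choice for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, gamma=10.0):
    """Return softmax probabilities over candidate documents.

    gamma scales the cosine scores before the softmax; 10.0 is an
    arbitrary value chosen for this sketch.
    """
    scores = [gamma * cosine(query_vec, d) for d in doc_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 3-dimensional "latent" vectors standing in for network outputs.
query = [1.0, 0.0, 1.0]
docs = [
    [2.0, 0.0, 2.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 0.0],  # partially aligned
]
probs = rank_documents(query, docs)
best = max(range(len(docs)), key=lambda i: probs[i])
print(best, [round(p, 3) for p in probs])
```

Because cosine similarity ignores vector length, the first document scores highest even though it is twice as long as the query vector. In a real model the vectors would come from the trained network rather than being written by hand.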

The interesting bit
The value here is archeological. DSSM was an early bridge between deep learning and information retrieval. It dates from before the transformer era, when convolutions over letter n-grams were still a serious bet for capturing meaning. The code preserves that architectural thinking in modern Keras, which is useful if you’re tracing how we got from TF-IDF to BERT.
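For a feel of the letter n-gram idea, here is a small sketch of the "word hashing" step: each word gets boundary markers and is split into letter trigrams, and a text becomes a bag of those trigrams. The function names are hypothetical and this is not taken from the repo; it only illustrates the input representation these models assume.

```python
from collections import Counter


def letter_ngrams(word, n=3):
    """Split one word into letter n-grams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def hash_text(text, n=3):
    """Bag of letter n-grams for a whitespace-tokenized string."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_ngrams(word, n))
    return counts


print(letter_ngrams("good"))
print(hash_text("good goods"))
```

The appeal of this representation is that the trigram vocabulary stays small and misspellings or unseen words still share most of their trigrams with known ones.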

Key highlights
- Direct Keras port of the architecture from Microsoft's CIKM 2014 paper, which describes the convolutional CLSM variant
- Includes links to original Microsoft Research slides and reference collection
- A star count of ~500 suggests it served as a common starting point for IR researchers
- No dependencies listed; the code is Keras-era, so it likely targets TF 1.x or early 2.x
- Author explicitly warns: no data included, BYO search corpus

Caveats
- Last meaningful activity unclear; the Keras APIs it targets are likely deprecated
- No training example, no sample output, no performance numbers: just the architecture
- Requires significant data engineering before it does anything useful
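As a rough idea of what that data engineering involves, the sketch below turns a click log into training tuples of one query, one clicked document, and a few randomly sampled other documents as negatives. The log format, the function name, and the choice of four negatives are all assumptions made for illustration, not details of the repo.

```python
import random


def build_training_examples(click_log, num_negatives=4, seed=0):
    """Pair each (query, clicked_doc) with randomly sampled other documents.

    click_log: list of (query, clicked_doc) tuples (a hypothetical format).
    Returns a list of (query, positive_doc, negative_docs) tuples.
    """
    rng = random.Random(seed)
    all_docs = sorted({doc for _, doc in click_log})
    examples = []
    for query, positive in click_log:
        candidates = [d for d in all_docs if d != positive]
        k = min(num_negatives, len(candidates))
        negatives = rng.sample(candidates, k)  # sampled without replacement
        examples.append((query, positive, negatives))
    return examples


# Toy click log; real logs would hold far more rows.
log = [
    ("cheap flights", "doc_a"),
    ("python tutorial", "doc_b"),
    ("weather today", "doc_c"),
    ("keras install", "doc_d"),
    ("cosine similarity", "doc_e"),
    ("bing search", "doc_f"),
]
examples = build_training_examples(log)
for query, positive, negatives in examples:
    print(query, "->", positive, "vs", negatives)
```

This naive sampler can pick a document that was clicked for the same query in another log row, so a real pipeline would likely need to filter negatives per query.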

Verdict
Worth a skim if you’re writing a literature review on neural IR or need to reproduce a 2014 baseline. Skip it if you want something that trains on Colab in ten minutes; this is a blueprint, not a product.