NTMC-Community/MatchZoo

A zoo for text-matching models, caged and ready to benchmark

MatchZoo corrals a dozen neural text-matching architectures into one Keras-based toolkit so researchers can stop rewriting boilerplate and start comparing models.

Velocity (7d): +1.2 ★/day · Trend: steady

What it does

MatchZoo is a Python toolkit that bundles implementations of neural text-matching models (DSSM, DRMM, MatchPyramid, K-NRM, and others) behind a unified preprocessing pipeline and task abstraction. It handles ranking and classification tasks such as document retrieval, QA, and paraphrase identification. The pitch is “get started in 60 seconds,” which mostly means the API wraps Keras model.fit_generator calls with pairwise data generators and built-in metrics like NDCG and MAP.
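To make "pairwise data generators" concrete, here is a minimal sketch of the idea in plain Python: group labelled rows by query, then pair each relevant document with sampled non-relevant ones. It is not MatchZoo's code; the function name `make_pairwise_examples`, the row layout, and the `num_neg` parameter are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_pairwise_examples(rows, num_neg=1, seed=0):
    """Turn (query_id, doc_id, label) rows into (query_id, positive, negatives) examples.

    A label > 0 is treated as relevant. Queries with fewer than `num_neg`
    non-relevant documents are skipped. (Hypothetical helper, for illustration.)
    """
    rng = random.Random(seed)
    by_query = defaultdict(lambda: {"pos": [], "neg": []})
    for query_id, doc_id, label in rows:
        by_query[query_id]["pos" if label > 0 else "neg"].append(doc_id)

    examples = []
    for query_id, docs in by_query.items():
        if len(docs["neg"]) < num_neg:
            continue  # cannot form a pair without enough negatives
        for pos in docs["pos"]:
            negatives = rng.sample(docs["neg"], num_neg)
            examples.append((query_id, pos, negatives))
    return examples


rows = [
    ("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0),
    ("q2", "d4", 0), ("q2", "d5", 1),
    ("q3", "d6", 1),  # no negatives, so q3 yields no training example
]
pairs = make_pairwise_examples(rows, num_neg=1)
```

A ranking loss then only has to compare the positive's score against its sampled negatives, which is why this grouping step sits in front of the model rather than inside it.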

The interesting bit

The real value isn’t any single model; it’s the standardization. MatchZoo forces each architecture through the same preprocessor/task/metrics funnel, so swapping DRMM for Conv-KNRM becomes a one-line change. For a field that loves to benchmark, that glue code is the actual contribution.
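The "same funnel" design can be sketched in a few lines. The toy below is not MatchZoo's API: the `MODELS` registry, the `register` decorator, and the two token-overlap scorers are all hypothetical stand-ins for real architectures. What it shows is the shape of the pattern: one shared preprocessing step, one ranking routine, and a model name as the only thing that changes.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping a model name to a scoring function.
MODELS: Dict[str, Callable[[List[str], List[str]], float]] = {}


def register(name: str):
    """Decorator that adds a scorer to the registry under `name`."""
    def wrap(fn):
        MODELS[name] = fn
        return fn
    return wrap


def preprocess(text: str) -> List[str]:
    """Shared preprocessing: every model sees the same lowercase tokens."""
    return text.lower().split()


@register("overlap")
def overlap_score(query: List[str], doc: List[str]) -> float:
    """Fraction of query tokens that appear in the document."""
    return sum(t in doc for t in query) / len(query) if query else 0.0


@register("jaccard")
def jaccard_score(query: List[str], doc: List[str]) -> float:
    """Jaccard similarity between the two token sets."""
    q, d = set(query), set(doc)
    return len(q & d) / len(q | d) if q | d else 0.0


def rank(model_name: str, query: str, docs: List[str]) -> List[str]:
    """Score every document with the chosen model and sort best-first."""
    scorer = MODELS[model_name]
    q = preprocess(query)
    scored = [(scorer(q, preprocess(d)), d) for d in docs]
    return [d for _, d in sorted(scored, key=lambda p: p[0], reverse=True)]


docs = ["neural text matching models", "cooking pasta at home", "text matching"]
by_overlap = rank("overlap", "text matching", docs)
by_jaccard = rank("jaccard", "text matching", docs)  # only the name changed
```

Because everything outside the scorer is shared, any difference between the two rankings comes from the model alone, which is the property a benchmark needs.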

Key highlights

  • Ships with 11 implemented models (DSSM, CDSSM, ARC-I/II, MV-LSTM, DRMM, MatchPyramid, aNMM, DUET, K-NRM, Conv-KNRM) plus several marked “under development”
  • Unified Preprocessor/Task/DataGenerator pipeline abstracts away data munging
  • Custom ranking losses and IR metrics (RankCrossEntropyLoss, NDCG@k, MAP) built in
  • Described in a paper published at SIGIR 2019, which lends it some academic credibility
  • PyTorch successor (MatchZoo-py) now available; this repo is the original Keras/TensorFlow version
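For readers unfamiliar with the ranking loss and IR metrics named above, here is a standard-library sketch of what each one computes. These are common textbook formulations, not MatchZoo's own code: the function names are hypothetical, the NDCG gain of 2^rel − 1 with a log2 discount is an assumed convention, and the loss is written as a softmax cross-entropy over one positive score and its negatives.

```python
import math


def _labels_by_score(labels, scores):
    """Return labels reordered from highest to lowest predicted score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [label for _, label in ranked]


def dcg_at_k(ranked_labels, k):
    """Discounted cumulative gain over the top k positions."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(_labels_by_score(labels, scores), k) / ideal if ideal > 0 else 0.0


def average_precision(labels, scores):
    """Mean of precision@i over the positions i that hold a relevant document.

    MAP is this value averaged over all queries.
    """
    hits, total = 0, 0.0
    for i, rel in enumerate(_labels_by_score(labels, scores), start=1):
        if rel > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def rank_cross_entropy(pos_score, neg_scores):
    """Negative log-softmax of the positive score among positive + negatives."""
    logits = [pos_score] + list(neg_scores)
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - pos_score


labels = [0, 1, 0, 1]
scores = [0.9, 0.8, 0.1, 0.4]
ndcg3 = ndcg_at_k(labels, scores, k=3)
ap = average_precision(labels, scores)
loss = rank_cross_entropy(pos_score=2.0, neg_scores=[0.5, -1.0])
```

The loss shrinks as the positive's score pulls ahead of the negatives, while NDCG and MAP only look at the resulting order, so the two can disagree on small score changes that do not reorder anything.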

Caveats

  • Built on Keras/TensorFlow 1.x-era patterns (model.fit_generator, etc.); the README’s own “News” banner points to the PyTorch fork as the future
  • Several listed models (Match-SRNN, DeepRank, BiMPM) are noted as “under development” with no clear status
  • Python 3.6/3.7 only; no mention of modern Python or TF 2.x compatibility

Verdict

Grab this if you’re reproducing a 2016–2019 text-matching paper and want the reference implementation with minimal scaffolding. Skip it if you’re starting fresh: head to MatchZoo-py instead, or use Hugging Face SentenceTransformers for a more modern embedding-based approach.
