o19s/elasticsearch-learning-to-rank

Teaching Elasticsearch to stop guessing what "relevant" means

An Elasticsearch plugin that lets you train ranking models on your own search data instead of hard-coding relevance heuristics.

1.5k stars · Java · Domain Apps · Velocity (7d): +0.4 ★ / day · Trend: steady

What it does

This plugin turns Elasticsearch into a platform for Learning to Rank: you define features as query templates, log their scores to build training data, then store and serve linear, XGBoost, or RankLib models directly inside Elasticsearch. Wikimedia and Snagajob already use it in production.
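To make "features as query templates" concrete, here is a minimal sketch of what a feature set body might look like. The overall shape (a `featureset` object holding named mustache templates, sent to `_ltr/_featureset/<name>`) follows the plugin's documentation as we understand it, so verify it against the docs for your plugin version. The helper `build_featureset`, the field names `title` and `overview`, and the `keywords` parameter are all hypothetical.

```python
import json


def build_featureset(field_names):
    """Build a feature set body with one match-query feature per field.

    Assumption: each feature is a mustache-templated Elasticsearch query
    whose score becomes the feature value. Field names are illustrative.
    """
    features = []
    for field in field_names:
        features.append({
            "name": f"{field}_match",
            "params": ["keywords"],
            "template_language": "mustache",
            # An ordinary match query; {{keywords}} is filled in at query time.
            "template": {"match": {field: "{{keywords}}"}},
        })
    return {"featureset": {"features": features}}


body = build_featureset(["title", "overview"])
print(json.dumps(body, indent=2))
```

The point of the design is that nothing here is ML-specific: each feature is a query you could already run by hand, which is why existing search expertise carries over.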

The interesting bit

The hard part of search ML isn’t the algorithm; it’s the plumbing. This plugin handles the tedious integration: feature storage, score logging, model serialization, and native ranking at query time, all within Elasticsearch’s plugin framework. The maintainers even gave a talk titled “We built it. Then came the hard part.”
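"Native ranking at query time" usually means rescoring the top hits of a cheap baseline query with a stored model. The sketch below builds such a request body, assuming the plugin's `sltr` query inside a standard Elasticsearch `rescore` block. The helper `build_rescore_query`, the model name, the searched fields, and the window size are hypothetical; treat this as an illustration rather than the canonical request.

```python
def build_rescore_query(keywords, model_name, window_size=100):
    """Build a search body that rescores top hits with a stored LTR model.

    Assumptions: the model was already uploaded under `model_name`, and its
    features expect a `keywords` parameter. Fields are illustrative.
    """
    return {
        # Cheap first-pass query that selects candidate documents.
        "query": {
            "multi_match": {"query": keywords, "fields": ["title", "overview"]}
        },
        # Only the top `window_size` hits per shard are re-ranked by the model.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "sltr": {
                        "params": {"keywords": keywords},
                        "model": model_name,
                    }
                }
            },
        },
    }


search_body = build_rescore_query("rambo", "my_model")
```

Keeping the model in the rescore phase bounds its cost: it runs on a fixed window of candidates instead of every matching document.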

Key highlights

  • Supports linear models, XGBoost, and RankLib models stored natively in Elasticsearch
  • Feature definitions are just Elasticsearch query templates, so you reuse existing search expertise
  • Prebuilt releases track Elasticsearch *.*.1 versions; “dot-oh” releases may need community PRs
  • Active since 2017 with significant contributions from Wikimedia, Yelp, and Bonsai
  • Demo and tutorials live in the separate hello-ltr repo with Jupyter notebooks

Caveats

  • Plugin version must exactly match your Elasticsearch version; mismatches require manual builds
  • A known-issues file exists, but the README doesn’t summarize it; you’ll need to read KNOWN_ISSUES.md yourself
  • Training and model development happen offline; the plugin doesn’t magically generate your training data
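As a rough picture of the offline step in the last caveat: once feature scores are logged, you still have to join them with relevance judgments and write a training file yourself. A minimal sketch, assuming the LibSVM-style ranking format that RankLib reads (`grade qid:<id> <index>:<value> ... # comment`). The helper `to_ranklib_line` and the sample values are hypothetical, not taken from the project.

```python
def to_ranklib_line(grade, query_id, feature_values, doc_id):
    """Format one judged document as a line of ranking training data.

    grade: human or click-derived relevance label (for example 0 to 4).
    feature_values: logged feature scores, in feature set order.
    Feature indices are 1-based in this format.
    """
    feats = " ".join(
        f"{index}:{value}" for index, value in enumerate(feature_values, start=1)
    )
    return f"{grade} qid:{query_id} {feats} # {doc_id}"


# Hypothetical judgments for one query: (grade, doc_id, logged feature scores).
judged = [
    (4, "7555", [12.3, 9.1]),
    (0, "1368", [0.0, 2.4]),
]
lines = [to_ranklib_line(grade, 1, feats, doc_id) for grade, doc_id, feats in judged]
print("\n".join(lines))
```

The grades are the part no plugin can supply: they come from human raters or click models, which is where the data science effort goes.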

Verdict

Worth a look if you run Elasticsearch at scale and have outgrown hand-tuned relevance scoring. Skip it if you need a turnkey solution: this is infrastructure, not a finished product, and you’ll still need data scientists to build the actual models.
