Drop a BERT between your users and Elasticsearch
A proxy that reranks search results with transformer models before your users ever see them.
What it does
NBoost sits between clients and Elasticsearch (or similar search APIs), intercepts queries, fetches a larger result set than requested, and reranks it through a fine-tuned transformer model before returning the top-k to the user. It ships as a Docker image, PyPI package, or Helm chart for Kubernetes.
The interesting bit
The proxy doesn’t replace your search engine; it just makes it look smarter. By inflating the initial request (e.g., asking Elasticsearch for 100 results when the user wants 10), it gives the model enough candidates to meaningfully rerank. The README claims MRR improvements of +45% to +77% over BM25 on MS MARCO. The tradeoff is added latency: roughly 50ms for TinyBERT and 300ms for BERT-base on GPU.
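The over-fetch-then-rerank idea is small enough to sketch. The snippet below is an illustration under stated assumptions, not NBoost's actual code: `overfetch_and_rerank`, `fake_search`, and `overlap_score` are hypothetical names, the search backend is a hard-coded list, and a toy token-overlap score stands in for the transformer.

```python
from typing import Callable, List


def overfetch_and_rerank(
    query: str,
    search: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    k: int = 10,
    multiplier: int = 10,
) -> List[str]:
    """Ask the backend for k * multiplier candidates, rescore them, keep the best k."""
    candidates = search(query, k * multiplier)
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for a transformer: fraction of query tokens present in the doc."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


def fake_search(query: str, size: int) -> List[str]:
    """Hypothetical backend: returns a fixed, poorly ordered result list."""
    corpus = [
        "hotels in paris",
        "cheap flights to rome",
        "best vegan restaurants in paris",
        "paris weather",
    ]
    return corpus[:size]


# The user wants 2 results; the proxy asks the backend for 4 and reranks them.
top = overfetch_and_rerank(
    "vegan restaurants paris", fake_search, overlap_score, k=2, multiplier=2
)
print(top)
```

With only two results requested directly, the best match (third in the backend's ordering) would never have reached the user; over-fetching is what gives the reranker a chance to surface it.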
Key highlights
- Pre-trained models for general and biomedical domains (TinyBERT, BERT-base, BioBERT), auto-downloaded on first use
- Supports both PyTorch and TensorFlow backends, plus ONNX Runtime
- Kubernetes-ready with Helm charts; also runs as a one-liner Docker container or pip install
- Includes the nboost-index helper for bulk-indexing CSV data into Elasticsearch
- Extensible to other search APIs beyond Elasticsearch (though docs focus heavily on ES)
Caveats
- Benchmarks compare models trained on MS MARCO but evaluated on TREC-CAR, which the authors frame as evidence of generalizability; your mileage on actual production queries may vary
- No latency numbers for CPU-only inference; GPU is essentially assumed for non-TinyBERT models
- Project appears quiet; the README’s top banner is a recruitment ad for an unrelated “virtual assistant” beta
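Given the first caveat, it may be worth measuring MRR on your own queries rather than relying on the README's figures. The sketch below assumes you have ranked result IDs per query before and after reranking, plus a set of known-relevant IDs; the function name, data layout, and sample values are all made up for illustration, and the gain is computed as a relative change, which may or may not match how the README's percentages were derived.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    rankings: Dict[str, List[str]], relevant: Dict[str, Set[str]]
) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none is found)."""
    total = 0.0
    for query_id, ranked_ids in rankings.items():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant.get(query_id, set()):
                total += 1.0 / rank
                break
    return total / len(rankings) if rankings else 0.0


# Hypothetical judgments and rankings for two queries.
relevant = {"q1": {"b"}, "q2": {"f"}}
baseline = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
reranked = {"q1": ["b", "a", "c"], "q2": ["d", "f", "e"]}

mrr_before = mean_reciprocal_rank(baseline, relevant)
mrr_after = mean_reciprocal_rank(reranked, relevant)
relative_gain = mrr_after / mrr_before - 1.0
print(f"MRR {mrr_before:.3f} -> {mrr_after:.3f} ({relative_gain:+.0%})")
```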
Verdict
Worth a look if you’re already running Elasticsearch and want better relevance without reindexing or swapping search backends. Skip it if you need real-time latencies under 50ms, or if your search stack isn’t ES-shaped.