koursaros-ai/nboost

Drop a BERT between your users and Elasticsearch

A proxy that reranks search results with transformer models before your users ever see them.

Velocity (7d): +0.3 ★/day · Trend: steady

What it does

NBoost sits between clients and Elasticsearch (or similar search APIs), intercepts queries, fetches a larger result set than requested, and reranks it through a fine-tuned transformer model before returning the top-k to the user. It ships as a Docker image, PyPI package, or Helm chart for Kubernetes.
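
The fetch-more-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration, not NBoost's actual API: `overfetch_and_rerank`, `search`, `score`, and `multiplier` are hypothetical names, and a toy word-overlap scorer stands in for the fine-tuned transformer.

```python
from typing import Callable, List, Sequence


def overfetch_and_rerank(
    query: str,
    topk: int,
    search: Callable[[str, int], List[str]],
    score: Callable[[str, Sequence[str]], List[float]],
    multiplier: int = 10,
) -> List[str]:
    """Fetch topk * multiplier candidates, rescore them, return the best topk."""
    # Ask the upstream search backend for more results than the user wants.
    candidates = search(query, topk * multiplier)
    # Score every candidate against the query (a transformer in the real system).
    scores = score(query, candidates)
    # Sort by score, highest first; ties keep the backend's original order.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:topk]]


# Hypothetical stand-ins for the search backend and the reranking model.
CORPUS = [
    "bm25 ranking basics",
    "cooking pasta",
    "search relevance tuning",
    "gardening tips",
    "bert reranking for search",
]


def toy_search(query: str, size: int) -> List[str]:
    # Pretend the backend returns documents in this fixed order.
    return CORPUS[:size]


def toy_score(query: str, docs: Sequence[str]) -> List[float]:
    # Word overlap as a crude substitute for a learned relevance score.
    terms = set(query.lower().split())
    return [float(len(terms & set(doc.lower().split()))) for doc in docs]


top = overfetch_and_rerank("bert search", 2, toy_search, toy_score, multiplier=3)
print(top)
```

With `multiplier=1` the reranker only ever sees the backend's first two hits, so it cannot surface the better documents further down the list. That is the reason for inflating the request.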

The interesting bit

The proxy doesn’t replace your search engine; it just makes it look smarter. By inflating the initial request (e.g., asking Elasticsearch for 100 results when the user wants 10), it gives the model enough candidates to meaningfully rerank. The README claims MRR (mean reciprocal rank) improvements of +45% to +77% over BM25 on MS MARCO. The tradeoff is added latency (~50ms for TinyBERT, ~300ms for BERT-base on GPU).
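
For readers unfamiliar with the metric, here is a minimal sketch of how MRR and a relative gain could be computed. The rankings below are invented toy data, not the README's benchmark results, and `mean_reciprocal_rank` is a hypothetical helper. It also assumes the quoted percentages are relative gains, which the README claim as summarized here does not spell out.

```python
from typing import Hashable, Sequence, Set


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Set[Hashable]],
) -> float:
    """Average of 1/rank of the first relevant result per query (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


# Toy data: two queries, one relevant document each (illustrative only).
relevant = [{"b"}, {"f"}]
baseline = [["a", "b", "c"], ["d", "e", "f"]]   # e.g. the original BM25 order
reranked = [["b", "a", "c"], ["d", "f", "e"]]   # e.g. the order after reranking

mrr_before = mean_reciprocal_rank(baseline, relevant)
mrr_after = mean_reciprocal_rank(reranked, relevant)
relative_gain = mrr_after / mrr_before - 1.0
print(f"MRR {mrr_before:.3f} -> {mrr_after:.3f} ({relative_gain:+.0%})")
```

Because only the first relevant hit per query counts, moving a relevant document up a single position can shift MRR substantially on small evaluation sets.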

Key highlights

  • Pre-trained models for general and biomedical domains (TinyBERT, BERT-base, BioBERT), auto-downloaded on first use
  • Supports both PyTorch and TensorFlow backends, plus ONNX Runtime
  • Kubernetes-ready with Helm charts; also runs as a one-liner Docker container or pip install
  • Includes nboost-index helper for bulk-indexing CSV data into Elasticsearch
  • Extensible to other search APIs beyond Elasticsearch (though docs focus heavily on ES)

Caveats

  • Benchmarks use models trained on MS MARCO but evaluated on TREC-CAR, which the authors frame as evidence of generalizability; your mileage on actual production queries may vary
  • No latency numbers for CPU-only inference; GPU is essentially assumed for non-TinyBERT models
  • Project appears quiet; the README’s top banner is a recruitment ad for an unrelated “virtual assistant” beta

Verdict

Worth a look if you’re already running Elasticsearch and want better relevance without reindexing or swapping search backends. Skip it if you need real-time latencies under 50ms, or if your search stack isn’t ES-shaped.
