← all repositories
texttron/tevatron

One retrieval toolkit, two frameworks, zero patience for glue code

Tevatron trains billion-scale neural retrievers on both PyTorch/GPU and JAX/TPU without making you rewrite your pipeline.

tevatron
Velocity · 7d
+0.4
★ / day
Trend
steady
star history

What it does Tevatron is a training and inference toolkit for dense document retrieval. It fine-tunes LLMs (via LoRA) into embedding models, encodes queries and passages into vectors, then runs similarity search. The same workflow runs on PyTorch with DeepSpeed/FlashAttention/vLLM, or on JAX with TPU/GPU support.

The interesting bit The project ships self-contained HuggingFace datasets and directly loads SoTA pretrained embedders like BGE and E5. The JAX path is genuinely separate—not a wrapper—using custom tevax modules with chunked gradient caching (grad_cache, passage_num_chunks) to fit large models on TPU. The PyTorch path is more conventional but plugs into the familiar ecosystem.

Key highlights

  • LoRA fine-tuning of 7B+ retrievers with a specific target module list (q/k/v/o projections plus MLP gates)
  • Dual-stack: PyTorch (DeepSpeed ZeRO-3, bf16, gradient checkpointing) or JAX (mesh parallelism, TPU-optimized)
  • Self-contained HF datasets for multilingual and multi-modal retrieval; image fields optional in JSONL schema
  • Direct encoding→retrieval pipeline outputs standard query_id passage_id score ranking files
  • Example claims MRR@10=42.3 on MS MARCO with “straightforward training” of Mistral-7B

Caveats

  • v1 features not yet migrated to v2; README warns users to pull the v1 branch if they need legacy functionality
  • JAX GPU setup requires NVIDIA’s jax-toolbox container or careful manual dependency resolution (magix, GradCache from separate repos)
  • Training times are substantial: ~70 hours on 4×A6000 (PyTorch) or ~35 hours on v4-8 TPU (JAX) for the MS MARCO example

Verdict Worth a look if you’re doing research-scale dense retrieval and want to switch between GPU and TPU without rebuilding your pipeline. Skip if you need stable production APIs or are allergic to pip install -e . from multiple git repos.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.