Teaching LLMs to memorize documents by editing their own weights
Doc-to-LoRA trains a hypernetwork to synthesize LoRA adapters from raw text, letting a base model temporarily internalize facts without RAG or retraining.

What it does
Doc-to-LoRA (D2L) trains a hypernetwork to read a document and emit a small LoRA adapter. Those weights are hot-swapped onto a frozen base LLM so the model suddenly “knows” the document’s facts. Call model.reset() and the knowledge evaporates, leaving the original weights intact. The repo is a full reference implementation with training pipelines, evaluation scripts, and pre-trained checkpoints.
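For orientation, here is the general LoRA formulation, which explains why the adapter is "small". This is the standard technique, not necessarily D2L's exact parameterization, and the dimensions in the example are illustrative rather than taken from the repo.

```latex
W' = W_0 + \Delta W, \qquad \Delta W = B A, \qquad
B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

The frozen weight W_0 has d·k entries, while the adapter stores only r(d + k). For an assumed d = k = 4096 and r = 8, that is 65,536 adapter parameters against 16,777,216 in the base matrix, and discarding ΔW restores W_0 exactly.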
The interesting bit
Rather than retrieving chunks or stuffing tokens into a context window, the project compresses information directly into parameter space. The README demonstrates this with model.internalize(doc), after which generation is steered by the internalized text. It treats memory as a temporary diff against the model rather than an input sequence.
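The "temporary diff" idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the repo's implementation: only the method names internalize and reset come from the README, while ToyModel, toy_hypernet, and the hash-seeded random factors are hypothetical stand-ins for the real model and the trained hypernetwork.

```python
import hashlib
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def toy_hypernet(doc, d_out, d_in, rank):
    """Hypothetical stand-in for the trained hypernetwork.

    Maps text to low-rank factors (B, A) deterministically by seeding a
    random generator from a hash of the document. A real hypernetwork
    would be a learned network; this only mimics the text-to-weights shape.
    """
    seed = int.from_bytes(hashlib.sha256(doc.encode("utf-8")).digest()[:8], "big")
    rng = random.Random(seed)
    B = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(d_out)]
    A = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(rank)]
    return B, A


class ToyModel:
    """One linear layer with a frozen base weight and an optional low-rank delta."""

    def __init__(self, base_weight):
        self.base = [row[:] for row in base_weight]  # never modified
        self.delta = None                            # the removable "memory"

    def internalize(self, doc, rank=2):
        d_out, d_in = len(self.base), len(self.base[0])
        B, A = toy_hypernet(doc, d_out, d_in, rank)
        self.delta = matmul(B, A)  # delta = B @ A, kept separate from base

    def reset(self):
        self.delta = None  # dropping the delta restores the original behavior

    def forward(self, x):
        rows = self.base
        if self.delta is not None:
            rows = [[w + d for w, d in zip(wr, dr)]
                    for wr, dr in zip(self.base, self.delta)]
        return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
model = ToyModel(base)
x = [1.0, 2.0, 3.0]

before = model.forward(x)
model.internalize("The capital of France is Paris.")
during = model.forward(x)   # outputs shift while the delta is applied
model.reset()
after = model.forward(x)    # identical to `before`
```

Because the delta lives beside the base weights rather than overwriting them, reset is just a matter of discarding it, which is why the original weights stay intact.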
Key highlights
- Synthesizes document-specific LoRA weights on the fly from raw text.
- Ships with pre-trained checkpoints, an interactive web demo, and a self-generated data viewer.
- Includes reproducible scripts for the main paper experiments and a Needle-In-A-Haystack benchmark.
- Exposes an internalize/reset cycle that acts like a temporary, removable memory layer.
Caveats
- The demonstrated Python API only supports non-batched inputs; batched inference requires dropping into src/ctx_to_lora/modeling/hypernet.py.
- The README does not quantify the latency or compute cost of synthesizing LoRA weights for new documents.
Verdict
Researchers and engineers frustrated by context-window limits should experiment here. If you need a battle-tested production retrieval pipeline with known latency bounds, look elsewhere: this remains a research prototype.