deepseek-ai/Engram

DeepSeek adds a lookup table to LLMs and calls it a sparsity axis

Engram revives N-gram embeddings as O(1) memory retrieval, arguing Transformers need a native knowledge-lookup primitive beyond MoE.

4.4k stars · Python · Language Models · ML Frameworks
Velocity (7d): +30 ★ / day · Trend: steady

What it does

Engram is a module that bolts static N-gram memory retrieval onto a Transformer backbone. It performs deterministic lookups of token-sequence embeddings and fuses them with the model’s dynamic hidden states. The pitch: MoE scales via conditional computation, but Transformers have no native “go look this up” operation, so Engram fills that gap with O(1) addressing.
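The lookup-and-fuse idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the repo's code: the polynomial hash, the pad id, the sigmoid gate, and the names `ngram_slots` and `fuse` are all assumptions made for the example. What it does show is the O(1) property: each position's slot comes from a fixed-size window of token ids, so the cost per position does not grow with table size.

```python
import math
import random

def ngram_slots(token_ids, n, table_size, pad_id=0):
    """Hash each position's trailing n-gram to a table slot.

    The multiplier and pad_id are illustrative choices, not Engram's.
    """
    padded = [pad_id] * (n - 1) + list(token_ids)
    slots = []
    for i in range(len(token_ids)):
        h = 0
        for tok in padded[i:i + n]:  # the n tokens ending at position i
            h = (h * 1000003 + tok) % table_size
        slots.append(h)
    return slots

def fuse(hidden, memory, w_gate):
    """Add retrieved memory to the hidden state, scaled by a sigmoid gate.

    The gate is computed from the hidden state alone (an assumed design).
    """
    fused = []
    for h_vec, m_vec in zip(hidden, memory):
        score = sum(h * w for h, w in zip(h_vec, w_gate))
        gate = 1.0 / (1.0 + math.exp(-score))
        fused.append([h + gate * m for h, m in zip(h_vec, m_vec)])
    return fused

random.seed(0)
d_model, table_size = 4, 4096
table = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(table_size)]
w_gate = [random.gauss(0, 1) for _ in range(d_model)]
tokens = [5, 17, 42, 5, 17, 42]
hidden = [[random.gauss(0, 1) for _ in range(d_model)] for _ in tokens]

slots = ngram_slots(tokens, n=2, table_size=table_size)
memory = [table[s] for s in slots]      # static, deterministic retrieval
fused = fuse(hidden, memory, w_gate)    # combined with dynamic hidden states
```

In this toy input the bigram (5, 17) appears twice, so both occurrences resolve to the same slot and retrieve the same row, regardless of what the hidden states look like at those positions.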

The interesting bit

The paper identifies a U-shaped scaling law for allocating capacity between neural computation (MoE) and static memory (Engram). The mechanistic claim is that offloading static pattern reconstruction to Engram preserves early-layer capacity for more complex reasoning downstream. Also notable: deterministic addressing lets you park the embedding tables in host memory without torching inference latency.
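One way to picture why deterministic addressing helps with host-memory offloading: the slots a batch needs are known from the token ids alone, before any layer runs, so only those rows have to leave host memory. The sketch below is an assumed illustration of that planning step in plain Python; `plan_prefetch`, `gather_rows`, and the list-based "device buffer" are hypothetical stand-ins, not the repo's mechanism.

```python
def plan_prefetch(slots):
    """Deduplicate the slots a batch needs and map each position
    to its row in the compact buffer."""
    unique = sorted(set(slots))
    position = {slot: i for i, slot in enumerate(unique)}
    remap = [position[s] for s in slots]
    return unique, remap

def gather_rows(host_table, unique):
    """Copy only the needed rows out of the host-resident table.

    Stand-in for a host-to-device transfer; here it is a plain list copy.
    """
    return [list(host_table[s]) for s in unique]

# Toy host table: 1000 rows of width 2 (values are arbitrary).
host_table = [[float(r), float(r) + 0.5] for r in range(1000)]

# Slots are assumed to come from hashing the input tokens ahead of time.
slots = [7, 912, 7, 33, 912]

unique, remap = plan_prefetch(slots)
device_buffer = gather_rows(host_table, unique)   # 3 rows moved, not 1000
retrieved = [device_buffer[i] for i in remap]     # per-position embeddings
```

Because nothing here depends on intermediate activations, a real system could in principle overlap this gather with earlier computation. Whether and how the actual implementation does so is not stated in the summary above.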

Key highlights

  • Complementary sparsity axis: Positioned as static memory lookup that scales capacity alongside MoE's conditional computation rather than replacing it.
  • Iso-param/iso-FLOP wins: Engram-27B reportedly beats MoE baselines on knowledge, reasoning, code, and math at matched parameter and FLOP budgets.
  • Host memory offloading: Because the lookup is deterministic, the massive tables can be moved out of GPU memory.
  • Standalone demo: engram_demo_v1.py runs independently, though it mocks Attention/MoE/mHC to isolate the module.

Caveats

  • The repo is a “demonstration version”: core logic only, with standard components stubbed out.
  • No training code or pretrained checkpoints are visible in the README; you’ll need to chase the paper or contact DeepSeek for the full stack.

Verdict

Worth a look if you’re researching sparse architectures or memory-augmented LLMs. Skip it if you need production-ready training infrastructure today: this is a research artifact with training wheels still on.
