Facebook's memory network, rebuilt in TensorFlow
A clean re-implementation of a 2015 paper that lets language models read their own notes across multiple hops of attention.

What it does
Implements End-To-End Memory Networks (MemN2N) for language modeling in TensorFlow, translating Facebook’s original Lua/Torch code into Python. The model reads text, stores it in an external memory matrix, then performs multiple “hops” of attention over that memory to predict the next word. It ships with Penn Treebank data and trains via straightforward CLI flags.
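To make "stores it in an external memory matrix" concrete, here is a minimal sketch of one plausible way to turn a word stream into (memory, next word) training pairs. The helper names `build_vocab` and `memory_windows` are hypothetical, and the repo's actual data pipeline may differ.

```python
from collections import Counter


def build_vocab(words):
    """Map each distinct word to an integer ID, most frequent first."""
    counts = Counter(words)
    return {word: idx for idx, (word, _) in enumerate(counts.most_common())}


def memory_windows(word_ids, mem_size):
    """Yield (memory, target) pairs: the previous mem_size IDs and the next ID."""
    for pos in range(mem_size, len(word_ids)):
        yield word_ids[pos - mem_size:pos], word_ids[pos]


# Toy corpus; a real run would read the PTB files instead.
text = "the cat sat on the mat".split()
vocab = build_vocab(text)
ids = [vocab[w] for w in text]
pairs = list(memory_windows(ids, mem_size=3))
```

Each memory window is what the model attends over; the target is the word it has to predict.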
The interesting bit
The “hops” mechanism is the core trick: instead of one pass over context, the model loops its own output back as a new query, refining what it retrieves each time. Think of it as re-reading your notes with a slightly different question until the answer clicks. The README includes a direct comparison to the original paper’s perplexity scores, which is more honesty than most re-implementations bother with.
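The loop is easier to see in code. This is a dependency-free sketch of the idea, not the repo's TensorFlow graph: it leaves out the learned embeddings, the linear map between hops, and any nonlinearity, and the random memory contents and the constant 0.1 starting query are assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def hop(query, keys, values):
    """One hop: attend over memory slots, then add the read-out to the query."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(query)
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return [q + r for q, r in zip(query, read)], weights


def multi_hop(query, keys, values, n_hops):
    """Feed each hop's output back in as the next hop's query."""
    history = []
    for _ in range(n_hops):
        query, weights = hop(query, keys, values)
        history.append(weights)
    return query, history


# Random stand-ins for the embedded memory (assumption: 5 slots, 4 dims).
rng = random.Random(0)
mem_size, dim = 5, 4
keys = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(mem_size)]
values = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(mem_size)]
state, history = multi_hop([0.1] * dim, keys, values, n_hops=3)
```

`history` holds one attention distribution per hop, so comparing its entries shows how the focus over memory slots shifts as the query is refined.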
Key highlights
- Reproduces the paper’s Section 5 language modeling task, not the full bAbI QA suite
- CLI exposes all hyperparameters: embedding dim, hops, memory size, gradient clipping
- Includes PTB data; accepts custom text with the same one-word-per-line format
- Optional progress bar via the progress package (because watching perplexity drop is most of the fun)
- Single-author project from prolific re-implementer Taehoon Kim
Caveats
- Performance gap: 129 perplexity here vs. 122 for the original on the 3-hop/100-memory benchmark; the 6-hop/150-memory result was “in progress” at the last README update and may still be
- Requires the future package when running in the official TensorFlow Docker image (a small but real friction point)
- No mention of TensorFlow version compatibility; given the 2015–2016 era, this likely targets TF 0.x–1.x
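For a sense of what the perplexity numbers mean, here is the standard definition as a small sketch: the exponential of the average negative log-probability assigned to the correct next word. The function name and the toy input are illustrative, not taken from the repo.

```python
import math


def perplexity(target_probs):
    """Perplexity = exp of the mean negative log-probability of the true words."""
    nll = [-math.log(p) for p in target_probs]
    return math.exp(sum(nll) / len(nll))


# A model that always spreads its bet evenly over 122 words scores about 122.
uniform = perplexity([1 / 122] * 50)
```

Read this way, a perplexity of 122 means the model is, on average, about as uncertain as a uniform guess among 122 words, and 129 is the same guess among 129.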
Verdict
Worth a look if you’re studying memory networks or need a readable, self-contained TensorFlow implementation of the paper’s language modeling variant. Skip it if you need production-grade NLP or modern transformer-based approaches; this is educational code from an earlier era of deep learning.