stanfordnlp/mac-network

A neural network that thinks in steps, not leaps

Stanford's MAC cell breaks visual reasoning into explicit, inspectable computation steps—rare honesty in a field that usually hides its work.

Velocity (7 days): +0.2 ★/day · Trend: steady

What it does

MAC (Memory, Attention and Composition), introduced in the paper “Compositional Attention Networks for Machine Reasoning”, answers complex visual questions by chaining together small reasoning steps. Each “MAC cell” reads the question, looks at the image, and updates a working memory; repeat 4–16 times, then answer. The implementation covers both the synthetic CLEVR benchmark and the newer, messier GQA dataset of real-world scenes.
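A toy version of that loop helps show what a “step” actually is. The sketch below assumes plain Python lists as vectors and swaps every learned weight matrix from the paper for plain addition, elementwise products, or averaging. `mac_step`, `run_mac`, and the initial control and memory values are hypothetical, and none of this is the repository's TensorFlow code. What it keeps is the structure: a control unit attends over question words, a read unit attends over image regions guided by control and memory, and a write unit folds what it retrieved into memory.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Attention-weighted combination of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def mac_step(control, memory, question, words, regions):
    # Control unit: combine the previous control with the question (a learned
    # projection in the paper; plain addition here), then attend over words.
    query = [c + q for c, q in zip(control, question)]
    word_attn = softmax([dot(query, w) for w in words])
    control = weighted_sum(word_attn, words)

    # Read unit: score each image region by how well (memory * region)
    # matches the current control, then retrieve a weighted region summary.
    read_scores = [dot(control, [m * k for m, k in zip(memory, region)])
                   for region in regions]
    region_attn = softmax(read_scores)
    retrieved = weighted_sum(region_attn, regions)

    # Write unit: merge retrieved info with the old memory (a learned layer
    # in the paper; a simple average here).
    memory = [(r + m) / 2 for r, m in zip(retrieved, memory)]
    return control, memory, word_attn, region_attn

def run_mac(question, words, regions, steps=4):
    dim = len(question)
    # Hypothetical initial states; the paper learns its initial memory.
    control, memory = [0.0] * dim, [1.0] * dim
    trace = []
    for _ in range(steps):
        control, memory, word_attn, region_attn = mac_step(
            control, memory, question, words, regions)
        trace.append((word_attn, region_attn))  # per-step, inspectable record
    return memory, trace

if __name__ == "__main__":
    words = [[1.0, 0.0], [0.0, 1.0]]                 # two toy "question words"
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three toy "image regions"
    memory, trace = run_mac([1.0, 0.0], words, regions, steps=4)
    for step, (_, region_attn) in enumerate(trace, 1):
        print(step, [round(a, 3) for a in region_attn])
```

Because every step returns its word and region attention, `trace` is the kind of per-step record that MAC's attention maps visualize; in the real model, learned projections are what make those weights meaningful.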

The interesting bit

The attention maps are genuinely inspectable. The authors explicitly note that shorter networks (4–8 cells) produce more interpretable reasoning traces, while deeper ones trade transparency for accuracy. That’s an unusually frank admission that interpretability and performance pull in opposite directions.
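To make “inspectable” concrete: each read step produces one attention weight per image region, and a heat map is that vector laid back out on the image grid. The sketch below assumes a flat attention vector over a square grid of regions (ResNet conv4 features on a 224×224 input give a 14×14 grid, for instance) and a hypothetical upsampling `scale`; `attention_to_heatmap` is an illustrative name, not a function from the repository, which ships its own visualization tools.

```python
import math

def attention_to_heatmap(attn, scale=16):
    """Turn a flat attention vector over a square grid of image regions into
    an upsampled 0-255 grayscale mask (a list of rows) that could be blended
    over the input image."""
    side = math.isqrt(len(attn))
    if side == 0 or side * side != len(attn):
        raise ValueError("expected attention over a square grid of regions")
    lo, hi = min(attn), max(attn)
    span = (hi - lo) or 1.0  # avoid division by zero for uniform attention
    # Rescale so the most-attended region maps to 255 and the least to 0.
    grid = [[round(255 * (attn[r * side + c] - lo) / span) for c in range(side)]
            for r in range(side)]
    # Nearest-neighbour upsampling: repeat each cell `scale` times on both axes.
    return [[grid[r // scale][c // scale] for c in range(side * scale)]
            for r in range(side * scale)]
```

Min-max rescaling per step makes each map readable on its own, but it also hides how peaked the attention really is, so comparing raw weights across steps is still worth doing.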

Key highlights

  • Fully differentiable multi-step reasoning with explicit memory and attention
  • Supports both CLEVR (synthetic shapes) and GQA (real-world visual reasoning)
  • Multiple model variants included: non-recurrent control, self-attention writes, memory gating
  • TensorFlow 1.x codebase; assumes ~12GB GPU memory
  • Visualization tools for attention maps, with ImageMagick polish tips included
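The two optional write-unit variants in the list, self-attention writes and memory gating, can be sketched as one function. This follows the general idea described in the MAC paper as I understand it (a scalar gate computed from the control state; attention over earlier steps keyed on past controls), but `gated_write`, `gate_weights`, and `gate_bias` are hypothetical stand-ins for learned parameters, and the averaging step replaces a learned projection. It is not the repository's implementation or its config interface.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_write(control, old_memory, candidate, gate_weights, gate_bias=0.0,
                past_controls=(), past_memories=()):
    """Toy MAC write unit with both optional extras.

    - Self-attention write: compare the current control with earlier controls
      and mix the matching earlier memories into the candidate memory.
    - Memory gating: a scalar gate computed from the control decides how much
      of the old memory to keep, letting the model effectively skip a step.
    """
    if past_controls:
        attn = softmax([sum(c * p for c, p in zip(control, pc))
                        for pc in past_controls])
        recalled = [sum(a * m[d] for a, m in zip(attn, past_memories))
                    for d in range(len(candidate))]
        # Stand-in for the paper's learned combination: a plain average.
        candidate = [(x + y) / 2 for x, y in zip(candidate, recalled)]
    gate = sigmoid(sum(w * c for w, c in zip(gate_weights, control)) + gate_bias)
    # A gate near 1 keeps the old memory; near 0 takes the new candidate.
    return [gate * m + (1 - gate) * n for m, n in zip(old_memory, candidate)]
```

The gate is what makes variable-length reasoning possible inside a fixed number of cells: when a question needs fewer steps, later cells can learn to pass memory through nearly unchanged.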

Caveats

  • TensorFlow 1.3-era code; the README's “should work for later versions” is doing some heavy lifting
  • Several config options noted as “still in an experimental stage”
  • GQA adaptations live on a separate branch, not mainline

Verdict

Worth studying if you’re building or evaluating neuro-symbolic architectures, or if you need a concrete baseline for compositional reasoning. Skip if you want production-ready modern PyTorch—this is research code from 2018, preserved more than maintained.
