Smerity/sha-rnn

One attention head, one GPU, 24 hours: near-Transformer results

A research project showing you don't need multi-head attention or TPU pods to get competitive language modeling.

★ 1.2k stars · Python · Language Models

What it does

SHA-RNN (Single Headed Attention RNN) bolts a single attention head onto a four-layer LSTM and trains on byte-level text (enwik8). The goal is to match Transformer-class results without the Transformer-class hardware bill or training fragility. It reaches ~1.07 bits per character (BPC) on enwik8 in under a day on a single 12GB Titan V.
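BPC is the average cross-entropy per byte, expressed in base 2. A minimal sketch of how the figure can be derived, assuming the training loop reports its loss in nats (the natural-log convention most frameworks use). The function names `nats_to_bpc` and `bpc_from_probs` are illustrative and not taken from the repository:

```python
import math

def nats_to_bpc(loss_nats: float) -> float:
    """Convert an average per-byte cross-entropy from nats to bits."""
    return loss_nats / math.log(2)

def bpc_from_probs(probs):
    """Average negative log2-probability assigned to each true byte."""
    return -sum(math.log2(p) for p in probs) / len(probs)

# A model that gives every true byte probability 0.5 scores exactly 1 BPC.
print(bpc_from_probs([0.5, 0.5, 0.5, 0.5]))  # 1.0
# A loss of ln(2) nats per byte is the same score expressed in nats.
print(nats_to_bpc(math.log(2)))  # 1.0
```

At ~1.07 BPC the model spends a little over one bit per byte on average, against the 8 bits needed to store the byte uncompressed.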

The interesting bit

The paper’s title, “Single Headed Attention RNN: Stop Thinking With Your Head”, is a pun and the architecture is a provocation: one attention head, placed only in the second-to-last layer by default, is enough to capture dependencies up to 5,000 tokens long. The model can also drop its retained attention memory and back off toward a plain LSTM if memory gets tight, a graceful degradation path Transformers don’t offer.
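To make the idea concrete, here is a minimal sketch of one attention head in plain Python. It is a simplification under stated assumptions, not the repository's implementation: the raw hidden-state vectors stand in for queries, keys, and values, the learned projections and the LSTM layers are omitted, and the names `single_head_attention`, `memory`, and `current` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def single_head_attention(query, keys, values):
    """One query attends over all keys and returns a weighted mix of values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical hidden states: `memory` is retained from earlier windows,
# `current` comes from the window being processed.
memory = [[1.0, 0.0], [0.0, 1.0]]
current = [[1.0, 1.0]]
query = current[-1]

# With retained memory, the head can mix in information from earlier windows.
with_memory = single_head_attention(query, memory + current, memory + current)
# Dropping the memory leaves only the current window to attend over.
without_memory = single_head_attention(query, current, current)
print(with_memory, without_memory)
```

In this toy setup, dropping `memory` removes the long-range lookups but leaves the call valid, which mirrors the back-off described above: the recurrent layers keep working and only the retained attention context is lost.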

Key highlights

Trains in ~30 minutes per epoch on a Titan V; full run under 24 hours
Supports 5,000-token context without the compute/memory explosion of full self-attention
Avoids Transformer training rituals: no long warmup, no hyper-sensitive hyperparameter grid
Built from standard PyTorch parts (LSTM, linear layers) — ONNX-exportable, no custom kernels
Within striking distance of the 12-layer Transformer-XL (1.07 vs. 1.06 BPC), with fewer parameters than the 18-layer variant
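A back-of-the-envelope sketch of the memory argument in the highlights above: the number of attention scores grows with layers × heads × queries × keys, so a single head in a single layer keeps a 5,000-token context comparatively cheap. The 1,024-token window and the 12-layer, 8-head comparison model are assumptions chosen for illustration, not measurements from the paper or the repository:

```python
def attention_scores(layers_with_attention, heads, queries, keys):
    """Count the query-key scores computed for one window of input."""
    return layers_with_attention * heads * queries * keys

WINDOW = 1024   # assumed number of tokens processed per step
CONTEXT = 5000  # context length quoted in the highlights

single_head = attention_scores(1, 1, WINDOW, CONTEXT)
multi_head = attention_scores(12, 8, WINDOW, CONTEXT)  # hypothetical Transformer

print(single_head)                # 5120000
print(multi_head)                 # 491520000
print(multi_head // single_head)  # 96
```

The ratio is just layers × heads, so it ignores every other cost in either model, but it shows why attending in one layer with one head scales more gently as the context grows.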

Caveats

The code is “not kind”: no CLI flags for model variants, manual edits to model.py required
The author notes that bugs are still being shaken out: results have nearly been replicated, but discrepancies remain
Still ~0.09 BPC off the state of the art; framed explicitly as an efficiency play, not a bid for the accuracy crown

Verdict

Worth a look if you’re productionizing language models on commodity GPUs or skeptical that every problem needs a 175B-parameter Transformer. Skip if you need plug-and-play code or are chasing leaderboard-topping BPC at any cost.

Frequently asked

What is Smerity/sha-rnn?

A research project showing you don't need multi-head attention or TPU pods to get competitive language modeling.

Is sha-rnn open source?

Yes, Smerity/sha-rnn is an open-source project tracked on heatdrop.

What language is sha-rnn written in?

Smerity/sha-rnn is primarily written in Python.

How popular is sha-rnn?

Smerity/sha-rnn has 1.2k stars on GitHub.

Where can I find sha-rnn?

Smerity/sha-rnn is on GitHub at https://github.com/Smerity/sha-rnn.