Reverse-engineering Claude's rumored 'Mythos' architecture from research breadcrumbs
An open-source guess at how Anthropic might be doing silent, looped reasoning inside a single forward pass.

What it does
OpenMythos implements a Recurrent-Depth Transformer. Instead of stacking hundreds of distinct layers, it runs a shared recurrent block in a loop between a Prelude and a Coda (Prelude → recurrent block → Coda). You can swap between GQA and MLA attention, add sparse MoE feed-forward layers, and dial up n_loops at inference time to trade compute for “depth” of reasoning. It ships with pre-baked configs from 1B to 1T parameters and a training script for FineWeb-Edu.
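To make the loop structure concrete, here is a minimal pure-Python sketch of a recurrent-depth forward pass. Everything in it is an assumption for illustration. prelude, recurrent_block, coda and the re-injection of the Prelude output e on every iteration are hypothetical stand-ins, not OpenMythos's actual modules or update rule. What it does show is that n_loops is a call-time argument, so the same weights can be run for more or fewer iterations.

```python
import math
from typing import List

Vector = List[float]

def prelude(x: Vector) -> Vector:
    # Hypothetical stand-in for the Prelude layers: runs once, before the loop.
    return [math.tanh(v) for v in x]

def recurrent_block(h: Vector, e: Vector, mix: float = 0.5) -> Vector:
    # Hypothetical stand-in for the shared recurrent block. The same "weights"
    # (here just `mix` and a tanh) are reused on every iteration, and the
    # Prelude output `e` is re-injected each time (an assumed design choice).
    return [mix * hi + (1.0 - mix) * math.tanh(ei + hi) for hi, ei in zip(h, e)]

def coda(h: Vector) -> float:
    # Hypothetical stand-in for the Coda: runs once, after the loop.
    return sum(h) / len(h)

def forward(x: Vector, n_loops: int) -> float:
    e = prelude(x)
    h = [0.0] * len(e)           # initial latent state
    for _ in range(n_loops):     # depth is a runtime argument, not a parameter count
        h = recurrent_block(h, e)
    return coda(h)

x = [0.2, -1.0, 0.7]
for n in (1, 4, 16, 64):
    # Same function, same weights; only the number of loop iterations changes.
    print(n, round(forward(x, n), 6))
```

Because this toy block is contractive, extra loops refine the state toward a fixed point. A real looped model gives no such guarantee unless it adds an explicit stability mechanism.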
The interesting bit
The project treats looped transformers as a dynamical system and enforces stability by construction: the learned injection matrix A is parameterized so that its spectral radius stays below 1, which prevents the hidden state from exploding across loops. The README frames this as the “Parcae architecture” and speculates that it matches how Anthropic solved the notorious instability of training looped models.
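A minimal sketch of what “stable by construction” can mean, assuming a diagonal injection matrix and the linear part of an assumed loop update h_{t+1} = A h_t + B e. The parameterization a_i = rho_max · tanh(raw_i) is one simple choice picked for illustration; the README does not say this is how OpenMythos builds A. It bounds every eigenvalue by rho_max < 1 whatever values the optimizer pushes raw to, so the recurrence cannot blow up. The names stable_diag_A, spectral_radius_diag, run_injection and rho_max are hypothetical.

```python
import math
from typing import List

def stable_diag_A(raw: List[float], rho_max: float = 0.99) -> List[float]:
    # Assumed parameterization (not necessarily OpenMythos's): each diagonal
    # entry is rho_max * tanh(raw_i), so |a_i| <= rho_max < 1 for ANY raw value.
    return [rho_max * math.tanh(r) for r in raw]

def spectral_radius_diag(a: List[float]) -> float:
    # The eigenvalues of a diagonal matrix are its diagonal entries.
    return max(abs(v) for v in a)

def run_injection(a: List[float], b_e: List[float], n_loops: int) -> List[float]:
    # Linear part of the assumed loop: h_{t+1} = A h_t + B e, with B e
    # precomputed as b_e and A stored as its diagonal `a`.
    h = [0.0] * len(a)
    for _ in range(n_loops):
        h = [ai * hi + bi for ai, hi, bi in zip(a, h, b_e)]
    return h

a = stable_diag_A([-50.0, 0.3, 8.0])     # extreme raw values are still safe
print(spectral_radius_diag(a))            # at most 0.99
h = run_injection(a, [1.0, 1.0, 1.0], n_loops=10_000)
# Each |h_i| stays below |b_i| / (1 - |a_i|) no matter how many loops run.
print(h)
print(run_injection([1.01], [1.0], n_loops=2_000))  # unconstrained entry: grows without bound
```

The last line contrasts an unconstrained diagonal entry of 1.01, whose state keeps growing as loops are added.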
Key highlights
- Switchable attention: GQA with Flash Attention 2 fallback, or MLA (DeepSeek-V2 style) with compressed KV latents
- Sparse MoE with routed + shared experts, configurable per token
- Inference-time loop count is independent of parameter count — same weights, more “thinking” loops
- Training script included: DDP via torchrun, bfloat16 on H100/A100, and a Chinchilla-adjusted 30B-token target for the 3B model
- Spectral radius check exposed in the API: model.recurrent.injection.get_A(), so you can verify stability yourself
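Much of the practical difference between the two attention options shows up in KV-cache size. The back-of-the-envelope sketch below assumes bf16 (2 bytes per element), counts only cached K/V per token per layer, and uses DeepSeek-V2-like MLA dimensions (a 512-wide KV latent plus a 64-wide decoupled RoPE key). The layer and head counts are hypothetical, not taken from OpenMythos's configs. It also shows GQA head grouping, where consecutive query heads share one KV head.

```python
def kv_cache_bytes_gqa(n_layers: int, n_kv_heads: int, head_dim: int,
                       seq_len: int, bytes_per_elem: int = 2) -> int:
    # GQA caches one K and one V vector per KV head, per token, per layer.
    return n_layers * seq_len * 2 * n_kv_heads * head_dim * bytes_per_elem

def kv_cache_bytes_mla(n_layers: int, kv_latent_dim: int, rope_dim: int,
                       seq_len: int, bytes_per_elem: int = 2) -> int:
    # MLA caches one compressed KV latent per token, per layer, plus a small
    # decoupled RoPE key shared by all heads; K and V are derived from the
    # latent through up-projections at attention time.
    return n_layers * seq_len * (kv_latent_dim + rope_dim) * bytes_per_elem

def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    # GQA: query heads are split into n_kv_heads equal groups, and every
    # head in a group reads the same K/V head.
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    return q_head // (n_q_heads // n_kv_heads)

# Hypothetical model: 16 layers, 32 query heads, 8 KV heads, head_dim 128, 4096 tokens.
gqa = kv_cache_bytes_gqa(n_layers=16, n_kv_heads=8, head_dim=128, seq_len=4096)
mla = kv_cache_bytes_mla(n_layers=16, kv_latent_dim=512, rope_dim=64, seq_len=4096)
print(f"GQA: {gqa / 2**20:.1f} MiB")   # GQA: 256.0 MiB
print(f"MLA: {mla / 2**20:.1f} MiB")   # MLA: 72.0 MiB
print([kv_head_for_query_head(q, 32, 8) for q in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Under these assumptions, MLA's per-token cache is 576 elements per layer versus 2048 for this GQA setup, roughly a 3.6× saving.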
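The routed + shared expert split can be sketched as follows, in the general style of DeepSeek's MoE. This is a generic illustration under stated assumptions, not OpenMythos's code. The router is a plain softmax over dot-product logits, only the top_k routed experts run for a given token, and their gate weights are renormalized to sum to 1 (one common convention; others skip renormalization). Shared experts run on every token with weight 1. All function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(
    x: Vector,
    gate_weights: Sequence[Sequence[float]],
    routed: Sequence[Expert],
    shared: Sequence[Expert],
    top_k: int,
) -> Tuple[Vector, List[int]]:
    # Router: one logit per routed expert, from a dot product with its gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k routed experts for this token.
    chosen = sorted(range(len(routed)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)

    out = [0.0] * len(x)
    # Shared experts see every token, unweighted.
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    # Routed experts contribute in proportion to their renormalized gate weight.
    for i in chosen:
        g = probs[i] / norm
        out = [o + g * y for o, y in zip(out, routed[i](x))]
    return out, chosen

# Toy usage: three routed experts that scale the input, one identity shared expert.
routed = [lambda v: [2 * a for a in v], lambda v: [3 * a for a in v], lambda v: [-a for a in v]]
shared = [lambda v: list(v)]
gates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
out, chosen = moe_forward([1.0, 2.0], gates, routed, shared, top_k=1)
print(chosen, out)  # [1] [4.0, 8.0]
```

Only the chosen routed experts are evaluated, which is what keeps per-token compute roughly constant as the expert count grows.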
Caveats
- The connection to actual “Claude Mythos” is pure speculation; the README disclaims any Anthropic affiliation, and the cited papers (Saunshi et al. 2025, Prairie et al. 2026) appear to be future-dated or nonexistent
- No training results, loss curves, or downstream benchmarks are shown — it is unclear whether the implemented stability fix actually works in practice
- The 1T-parameter config table is aspirational; nothing suggests these scales have been trained or even instantiated
Verdict
Worth a look if you are experimenting with looped transformers, compute-adaptive inference, or stability tricks for recurrent architectures. Skip it if you want a production model or a verified reproduction of a real system.