tensorgi/TPA
A transformer-based foundation model architecture that uses Tensor Product Attention for improved efficiency and a smaller KV cache, published at NeurIPS 2025.

TPA (Tensor Product Attention Transformer) is a language model architecture that replaces standard attention with Tensor Product Attention to improve performance and reduce the memory footprint of the KV cache. The repository provides the official implementation of data preparation, model pretraining, and evaluation at several model scales. It also includes FlashTPA, a set of optimized decoding and prefilling kernels for efficient inference.
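To make the KV cache saving concrete, here is a minimal NumPy sketch of the idea. It assumes the factorization described in the TPA paper: each token's keys (and likewise values) are an average of a few rank-1 outer products between a head-dimension factor and a feature-dimension factor, so only the factors need to be cached. The function names, shapes, and random weights are hypothetical and are not the repository's API. Rotary embeddings and the attention computation itself are left out.

```python
import numpy as np

def tpa_factors(x, w_a, w_b, rank, n_heads, head_dim):
    """Map tokens x of shape (T, d_model) to per-token factors.

    Returns a with shape (T, rank, n_heads) and b with shape (T, rank, head_dim).
    """
    n_tokens = x.shape[0]
    a = (x @ w_a).reshape(n_tokens, rank, n_heads)
    b = (x @ w_b).reshape(n_tokens, rank, head_dim)
    return a, b

def reconstruct(a, b):
    """Average of rank-1 outer products; result has shape (T, n_heads, head_dim)."""
    return np.einsum("trh,trd->thd", a, b) / a.shape[1]

def cache_floats_per_token(n_heads, head_dim, rank_k, rank_v):
    """Values cached per token: full K and V versus their factors."""
    full = 2 * n_heads * head_dim
    factored = (rank_k + rank_v) * (n_heads + head_dim)
    return full, factored

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
d_model, n_heads, head_dim, rank, n_tokens = 32, 8, 64, 2, 5
x = rng.standard_normal((n_tokens, d_model))
w_a = rng.standard_normal((d_model, rank * n_heads))
w_b = rng.standard_normal((d_model, rank * head_dim))

a_k, b_k = tpa_factors(x, w_a, w_b, rank, n_heads, head_dim)
keys = reconstruct(a_k, b_k)
print(keys.shape)  # (5, 8, 64)
print(cache_floats_per_token(n_heads, head_dim, rank, rank))  # (1024, 288)
```

With these illustrative sizes, caching the factors stores 288 numbers per token instead of 1024. The saving grows as the number of heads and the head dimension increase relative to the rank.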