← all repositories

tensorgi/TPA

A transformer-based foundation model architecture using Tensor Product Attention mechanisms for improved efficiency and reduced KV cache, published at NeurIPS 2025.

457 stars Python Language ModelsML Frameworks
TPA
Velocity · 7d
+0.9
★ / day
Trend
steady
star history

TPA (Tensor Product Attention Transformer) is a state-of-the-art language model architecture that replaces standard attention with Tensor Product Attention to enhance performance and reduce memory footprint. The repository provides official implementations for data preparation, model pretraining, and evaluation across different model scales. It also includes FlashTPA optimized decoding and prefilling kernels for efficient inference.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.