MuLabPKU/TransArch
Research framework for hardware-aware LLM architecture design and model migration with up to 11× inference speedups.

TransArch is a collection of research projects from Peking University that design hardware-friendly model architectures and migrate existing pre-trained LLMs into optimized forms with minimal performance loss. The projects target attention-mechanism optimizations, including cross-layer pruning (CLOVER), conversion to DeepSeek-MLA (TransMLA), tensor-parallel latent attention (TPLA), sparse attention indexing (HISA, MISA), and adaptive query attention (GQLA). Several of the projects have been published at top venues (ICML, NeurIPS Spotlight, ASPLOS) and report significant speedups on modern hardware, including H100 and H200 GPUs.