
MuLabPKU/TransArch

Research framework for hardware-aware LLM architecture design and model migration with up to 11× inference speedups.

Star velocity (7 days): +0.9 ★/day. Trend: steady.

TransArch is a collection of research projects from Peking University that design hardware-friendly model architectures and migrate existing pre-trained LLMs into optimized forms with minimal performance loss. The projects target attention-mechanism optimizations: cross-layer pruning (CLOVER), conversion to DeepSeek-MLA (TransMLA), tensor-parallel latent attention (TPLA), sparse attention indexing (HISA, MISA), and adaptive query attention (GQLA). Several of the projects have been published at top venues (ICML, NeurIPS Spotlight, ASPLOS). The repository reports inference speedups of up to 11×, with results on modern hardware including H100 and H200 GPUs.
