alibaba/rtp-llm
Alibaba's GPU-accelerated LLM inference engine for production serving at scale.

Velocity (7d): +1.3 ★/day · Trend: → steady
RTP-LLM is a Large Language Model (LLM) inference acceleration engine developed by Alibaba’s Foundation Model Inference Team. It provides high-performance LLM serving, with optimizations that include Prefill/Decode separation, GPU memory management, and optimized CUDA attention kernels. The engine serves workloads for multiple Alibaba business units, supports distributed inference, and targets hardware beyond NVIDIA GPUs, including AMD ROCm, Intel CPU, and ARM CPU backends.
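To illustrate what Prefill/Decode separation means in general, here is a minimal, self-contained sketch. It is not RTP-LLM's code or API. Every name (`Request`, `PrefillWorker`, `DecodeWorker`, `serve`, `toy_next_token`) is hypothetical, and the "model" and "KV cache" are integer placeholders standing in for real forward passes and K/V tensors. The sketch assumes `max_new_tokens >= 1`. The prefill stage processes the whole prompt in one pass, builds the KV cache, and emits the first token. The cache is then handed to a separate decode stage, which generates the remaining tokens one step at a time.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Request:
    req_id: int
    prompt: List[int]  # prompt token ids
    max_new_tokens: int  # assumed >= 1
    kv_cache: List[int] = field(default_factory=list)  # toy stand-in for per-token K/V tensors
    output: List[int] = field(default_factory=list)


def toy_next_token(kv_cache: List[int]) -> int:
    # Hypothetical placeholder for a model forward pass: any deterministic function of the cache.
    return sum(kv_cache) % 100


class PrefillWorker:
    """Prefill stage: processes the entire prompt in a single pass."""

    def run(self, req: Request) -> Request:
        req.kv_cache = [tok * 2 for tok in req.prompt]  # one toy K/V entry per prompt token
        req.output.append(toy_next_token(req.kv_cache))  # prefill also emits the first token
        return req


class DecodeWorker:
    """Decode stage: generates one token per step, reusing the KV cache."""

    def step(self, req: Request) -> bool:
        """Generate one token; return True while the request still needs more tokens."""
        if len(req.output) >= req.max_new_tokens:
            return False
        req.kv_cache.append(req.output[-1] * 2)  # append K/V for the last generated token
        req.output.append(toy_next_token(req.kv_cache))
        return len(req.output) < req.max_new_tokens


def serve(requests: List[Request]) -> List[Request]:
    prefill, decode = PrefillWorker(), DecodeWorker()
    prefill_q: Deque[Request] = deque(requests)
    decode_q: Deque[Request] = deque()
    done: List[Request] = []
    while prefill_q or decode_q:
        if prefill_q:
            req = prefill.run(prefill_q.popleft())
            # Hand-off: in a disaggregated deployment the KV cache would be transferred
            # to a separate decode instance; this toy version just copies the list.
            req.kv_cache = list(req.kv_cache)
            decode_q.append(req)
        # Advance every in-flight request by one decode step.
        for _ in range(len(decode_q)):
            req = decode_q.popleft()
            if decode.step(req):
                decode_q.append(req)
            else:
                done.append(req)
    return done
```

Keeping the two stages separate lets each run on its own worker pool. The compute-heavy prompt pass then does not stall the token-by-token decode loop of other requests. In a real engine, the hand-off step is where KV-cache transfer and memory management come into play.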