tile-ai/TileRT
A tile-based inference runtime for serving large language models with ultra-low latency.

Velocity · 7d
+5.1
★ / day
Trend
→steady
star history
TileRT is a specialized runtime system designed to serve large language models with ultra-low latency. It implements a tile-based execution strategy to optimize inference performance across different model architectures. The project supports multiple models including DeepSeek-V3.2 and GLM-5, achieving throughput of hundreds of tokens per second in production environments. It is available as a PyPI package and has been deployed commercially on Z.ai.