
tile-ai/TileRT

A tile-based inference runtime for serving large language models with ultra-low latency.

Star velocity (7d): +5.1 ★ / day. Trend: steady.

TileRT is a specialized runtime for serving large language models with ultra-low latency. It uses a tile-based execution strategy to optimize inference performance across different model architectures. Supported models include DeepSeek-V3.2 and GLM-5, with throughput of hundreds of tokens per second in production environments. TileRT is available as a PyPI package and has been deployed commercially on Z.ai.
