
dphnAI/aphrodite-engine

A production-grade LLM inference engine for serving transformer models at scale with PagedAttention and extensive quantization support.

Velocity (7d): +1.6 ★/day · Trend: steady

Aphrodite is a large-scale LLM inference engine built on vLLM's PagedAttention, which manages the KV cache efficiently to deliver high-throughput model serving for concurrent users. It supports distributed inference across multiple hardware backends, including CUDA, ROCm, Intel, and TPU, and offers a range of quantization methods, including AWQ, GPTQ, GGUF, bitsandbytes, and more, to reduce memory footprint.
