← all repositories

PaddlePaddle/FastDeploy

High-performance inference and deployment toolkit for LLMs and VLMs built on the PaddlePaddle framework.

FastDeploy
Velocity · 7d
+2.6
★ / day
Trend
steady
star history

FastDeploy is a production-focused deployment suite for large language models and vision-language models. It provides quantized model serving, OpenAI-compatible API endpoints, and support for popular models including ERNIE, Qwen3-VL, and their variants. The toolkit targets GPU-based inference optimization with techniques like W4AFP8 quantization and integrates with vLLM for serving workflows.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.