runpod-workers/worker-vllm
A serverless worker template for deploying OpenAI-compatible vLLM-powered LLM inference endpoints on RunPod.

Velocity · 7d
+0.4
★ / day
Trend
→steady
star history
This repository provides a RunPod worker template for serving large language model endpoints using the vLLM inference engine. It enables deploying blazing-fast, OpenAI-compatible LLM endpoints on RunPod serverless infrastructure with minimal configuration. The template supports arbitrary model architectures compatible with vLLM and exposes OpenAI-style chat completions and completions APIs.