← all repositories

runpod-workers/worker-vllm

A serverless worker template for deploying OpenAI-compatible vLLM-powered LLM inference endpoints on RunPod.

worker-vllm
Velocity · 7d
+0.4
★ / day
Trend
steady
star history

This repository provides a RunPod worker template for serving large language model endpoints using the vLLM inference engine. It enables deploying blazing-fast, OpenAI-compatible LLM endpoints on RunPod serverless infrastructure with minimal configuration. The template supports arbitrary model architectures compatible with vLLM and exposes OpenAI-style chat completions and completions APIs.

heatdrop uses Google Analytics to see which pages get read — nothing else. Your call. How we handle data.