eugr/spark-vllm-docker
Docker setup for deploying vLLM inference on single-node and multi-node NVIDIA DGX Spark clusters.

This repository provides Docker configurations and startup scripts to run vLLM on NVIDIA DGX Spark hardware. It supports both single-node and multi-node cluster deployments using Ray or vLLM’s native PyTorch distributed mode. The setup includes InfiniBand/RDMA support for high-performance inter-GPU communication, custom environment configuration, and optimized model loading through fastsafetensors and InstantTensor.
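To make the single-node versus Ray-backed distinction concrete, here is a minimal sketch of how a launcher might assemble a `vllm serve` command line. It is not taken from the repository's scripts: the helper name `build_vllm_serve_command` and the model ID `example-org/example-model` are hypothetical, and the flags shown (`--tensor-parallel-size`, `--distributed-executor-backend`, `--load-format`) are vLLM CLI options as commonly documented, so check them against the vLLM version in your image. InstantTensor is left out because its exact option name is not stated here.

```python
import shlex


def build_vllm_serve_command(model, tensor_parallel_size=1, use_ray=False, load_format=None):
    """Return an argument list for `vllm serve` (hypothetical helper, illustration only)."""
    args = ["vllm", "serve", model, "--tensor-parallel-size", str(tensor_parallel_size)]
    if use_ray:
        # Assumption: Ray is used as the distributed executor for multi-node runs.
        args += ["--distributed-executor-backend", "ray"]
    if load_format is not None:
        # Assumption: the loader name is passed straight through, e.g. "fastsafetensors".
        args += ["--load-format", load_format]
    return args


if __name__ == "__main__":
    # Two-way tensor parallelism over Ray, loading weights with fastsafetensors.
    cmd = build_vllm_serve_command(
        "example-org/example-model",  # hypothetical model ID
        tensor_parallel_size=2,
        use_ray=True,
        load_format="fastsafetensors",
    )
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting; `shlex.join` is used only to print a copy-pasteable form.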