jd-opensource/xllm

A C++ inference engine for running LLMs, VLMs, diffusion transformers, and recommendation models with optimizations for diverse AI accelerators.

★ 1.3k stars · C++ · Inference · Serving · Language Models

Velocity (7d): +4.4 ★ / day

Trend: steady

xllm is an open-source, high-performance inference engine developed by JD.com for serving large language models (LLMs), vision-language models (VLMs), diffusion transformers, and recommendation models. It offers day-0 support for popular model families, including DeepSeek-V4, GLM-5, and Qwen, with optimizations for a range of AI accelerators. The project ships with documentation, Docker images, and a published technical paper.