jd-opensource/xllm
A C++ inference engine for running LLMs, VLMs, diffusion transformers, and recommendation models with optimizations for diverse AI accelerators.

This is an open-source, high-performance inference engine developed by JD.com for serving large language models (LLMs), vision-language models (VLMs), diffusion transformers, and recommendation models. It provides day-0 support for popular model families, including DeepSeek-V4, GLM-5, and Qwen, and is optimized for a range of AI hardware accelerators. The project ships with documentation, Docker images, and a published technical paper.