
jd-opensource/xllm

A C++ inference engine for running LLMs, VLMs, diffusion transformers, and recommendation models with optimizations for diverse AI accelerators.

Velocity (7d): +4.4 ★/day · Trend: steady

This is an open-source high-performance inference engine developed by JD.com for serving large language models, vision-language models, diffusion transformers, and recommendation models. It provides day-0 support for popular model families including DeepSeek-V4, GLM-5, and Qwen, with optimizations targeting various AI hardware accelerators. The project includes comprehensive documentation, Docker images, and a published technical research paper.
