
uccl-project/uccl

A high-performance GPU communication library providing collectives, P2P transfers, and EP primitives optimized for distributed ML training and LLM inference.

1.4k stars · C++ · Other · AI
Velocity (7d): +2.7 ★/day · Trend: steady

UCCL provides efficient GPU-to-GPU communication primitives: collectives such as all-reduce, P2P transfers for KV cache and RL weight synchronization, and expert-parallel (EP) operations. It is designed as a drop-in replacement for NCCL/RCCL that requires no application code changes, and the project reports lower latency and higher throughput than both. The library focuses on flexibility for fast-evolving ML workloads and on portability across heterogeneous GPU environments, including NVIDIA and AMD hardware.
