CUDA primitives for ML: the plumbing behind the speed
RAFT is the RAPIDS project's collection of GPU-accelerated building blocks that higher-level libraries actually use.

What it does
RAFT is a C++ header-only template library of CUDA-accelerated primitives for machine learning and information retrieval — linear algebra, distance computations, clustering, sparse operations, random sampling, and multi-GPU communication tools. It also ships pylibraft for lightweight Python access and raft-dask for distributed GPU algorithms. Think of it as the foundation that higher-level RAPIDS libraries stand on, not an end-user data science tool.
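To make "distance computations" concrete, here is a plain-Python CPU reference for the kind of primitive RAFT implements on the GPU: all pairwise Euclidean distances between the rows of two matrices. It is an illustrative sketch only. The function name `pairwise_euclidean` is hypothetical, and this is not RAFT's implementation. On the GPU side, the README's pylibraft example imports a `pairwise_distance` function from `pylibraft.distance`.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pairwise_euclidean(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> Matrix:
    """Return the len(x) x len(y) matrix of Euclidean distances between rows.

    Hypothetical CPU reference for what a GPU distance primitive computes;
    it is not RAFT code.
    """
    if x and y and len(x[0]) != len(y[0]):
        raise ValueError("x and y must have the same number of features")
    out: Matrix = []
    for row_x in x:
        out_row = []
        for row_y in y:
            # Sum of squared per-feature differences, then the square root.
            sq = sum((a - b) ** 2 for a, b in zip(row_x, row_y))
            out_row.append(math.sqrt(sq))
        out.append(out_row)
    return out


if __name__ == "__main__":
    a = [[0.0, 0.0], [3.0, 4.0]]
    b = [[0.0, 0.0], [6.0, 8.0]]
    print(pairwise_euclidean(a, b))  # [[0.0, 10.0], [5.0, 5.0]]
```

A GPU version would typically spread this loop nest across many threads and tile the inputs through on-chip memory. That kind of tuning is worth doing once in a shared library rather than in every caller.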
The interesting bit
The design is deliberately boring in the right way: because common GPU computations live in one place, an optimization made in RAFT propagates automatically to every downstream library that uses it. RAFT uses mdspan (non-owning views) and mdarray (owning containers) for multi-dimensional arrays, giving C++ NumPy-style semantics. Its Python wrappers speak __cuda_array_interface__, so outputs convert zero-copy to CuPy, PyTorch, JAX, TensorFlow, or cuDF. You can even set a global config so that all compute APIs return your preferred array type by default.
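The mdspan idea is easiest to see in miniature. The Python sketch below defines a hypothetical `MatrixView` class (not RAFT's API): a non-owning 2-D view that stores only a reference to a flat buffer plus a shape and strides. Transposing just swaps shape and strides, so both views read and write the same memory with no copy. A similar "pointer plus shape plus strides" description is what __cuda_array_interface__ exchanges between libraries to make zero-copy hand-offs possible.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MatrixView:
    """Hypothetical non-owning 2-D view over a flat buffer (mdspan-style).

    Only a reference to the buffer, the shape, the strides and an offset are
    stored; the data itself is never copied.
    """
    buf: List[float]
    shape: Tuple[int, int]
    strides: Tuple[int, int]  # measured in elements, not bytes
    offset: int = 0

    @classmethod
    def row_major(cls, buf: List[float], rows: int, cols: int) -> "MatrixView":
        if rows * cols > len(buf):
            raise ValueError("buffer too small for requested shape")
        return cls(buf, (rows, cols), (cols, 1))

    def _index(self, i: int, j: int) -> int:
        rows, cols = self.shape
        if not (0 <= i < rows and 0 <= j < cols):
            raise IndexError((i, j))
        return self.offset + i * self.strides[0] + j * self.strides[1]

    def __getitem__(self, ij: Tuple[int, int]) -> float:
        return self.buf[self._index(*ij)]

    def __setitem__(self, ij: Tuple[int, int], value: float) -> None:
        self.buf[self._index(*ij)] = value

    def transpose(self) -> "MatrixView":
        # Swapping shape and strides gives a transposed view of the same memory.
        return MatrixView(self.buf, self.shape[::-1], self.strides[::-1], self.offset)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    m = MatrixView.row_major(data, 2, 3)
    t = m.transpose()
    t[2, 0] = 30.0  # writes through to data[2], which is also m[0, 2]
    print(m[0, 2], data)  # 30.0 [1.0, 2.0, 30.0, 4.0, 5.0, 6.0]
```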
Key highlights
- Header-only C++ with optional pre-compiled shared library to cut compile times
- pylibraft runtime APIs don’t require a CUDA compiler; raft-dask handles multi-node multi-GPU via Dask
- Heavy reuse of RAPIDS Memory Manager (RMM) for allocation strategy consistency
- Supports in-place output to any __cuda_array_interface__ array, including pre-allocated buffers
- Conda and pip installable, though pip packages are experimental and C++ headers aren’t included there
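In-place output is a simple pattern: the caller may pass an already-allocated buffer, and the primitive writes into it instead of allocating a new one. The sketch below is plain Python rather than pylibraft code. It assumes a hypothetical `row_norms` function with an `out` parameter that computes the squared L2 norm of each row.

```python
from typing import List, Optional, Sequence


def row_norms(x: Sequence[Sequence[float]], out: Optional[List[float]] = None) -> List[float]:
    """Hypothetical primitive: squared L2 norm of each row of x.

    If `out` is given it must already have len(x) slots; results are written
    into it in place and that same object is returned. Otherwise a new list
    is allocated.
    """
    if out is None:
        out = [0.0] * len(x)
    elif len(out) != len(x):
        raise ValueError(f"out has {len(out)} slots, expected {len(x)}")
    for i, row in enumerate(x):
        out[i] = sum(v * v for v in row)
    return out


if __name__ == "__main__":
    buf = [0.0, 0.0]  # pre-allocated once, reusable across calls
    res = row_norms([[3.0, 4.0], [1.0, 2.0]], out=buf)
    print(res is buf, buf)  # True [25.0, 5.0]
```

Reusing one output buffer across repeated calls avoids an allocation per call. That matters most on the GPU, where device allocations are typically comparatively expensive, which is also why allocators such as RMM offer memory pools.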
Caveats
- Explicitly not for data scientists doing discovery/experimentation; the project says so itself
- The number of Python-exposed algorithms is “continuing to grow” — meaning not everything in C++ is wrapped yet
- Pip packages ship with their template instantiations built statically, so the C++ headers aren’t available there; Conda is the preferred install path
Verdict
Worth a look if you’re building GPU-accelerated ML infrastructure or contributing to RAPIDS-adjacent projects. Skip it if you just want ready-made scikit-learn replacements — the README literally tells you to go to the main RAPIDS site instead.