rapidsai/raft

CUDA primitives for ML: the plumbing behind the speed

RAFT is the RAPIDS project's collection of GPU-accelerated building blocks that higher-level libraries actually use.

★ 1k stars | CUDA | ML Frameworks | RAG · Search | Inference · Serving


Velocity (7d): +0.4 ★ / day. Trend: steady.

What it does

RAFT is a C++ header-only template library of CUDA-accelerated primitives for machine learning and information retrieval — linear algebra, distance computations, clustering, sparse operations, random sampling, and multi-GPU communication tools. It also ships pylibraft for lightweight Python access and raft-dask for distributed GPU algorithms. Think of it as the foundation that higher-level RAPIDS libraries stand on, not an end-user data science tool.

The interesting bit

The design is deliberately boring in the right way: because common GPU computations live in one place, an optimization made in RAFT reaches every downstream library that builds on it. RAFT uses mdspan/mdarray for multi-dimensional array views (NumPy-style semantics for C++), and its Python wrappers speak __cuda_array_interface__, so output converts without a copy to CuPy, PyTorch, JAX, TensorFlow, or cuDF. You can even set a global config to make all compute APIs return your preferred array type by default.
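
To make the interop story concrete, here is a minimal pure-Python sketch of the two ideas in that paragraph: an object that advertises its buffer through __cuda_array_interface__, and a global setting that converts every returned array. It is a mock that touches no GPU memory. FakeDeviceArray, nbytes, set_output_as, and compute are hypothetical names chosen for illustration, not pylibraft's actual API.

```python
# Pure-Python mock: no GPU is involved and every name here is hypothetical.
from math import prod


class FakeDeviceArray:
    """Stand-in for a GPU array that exposes __cuda_array_interface__."""

    def __init__(self, shape, typestr="<f4", ptr=0):
        self.shape = tuple(shape)
        self.typestr = typestr  # "<f4" means little-endian 4-byte float
        self.ptr = ptr          # a real array would hold a device pointer

    @property
    def __cuda_array_interface__(self):
        # Mandatory fields of version 3 of the CUDA Array Interface.
        return {
            "shape": self.shape,
            "typestr": self.typestr,
            "data": (self.ptr, False),  # (pointer, read_only flag)
            "version": 3,
        }


def nbytes(obj):
    """Compute the buffer size from the interface alone, copying nothing."""
    cai = obj.__cuda_array_interface__
    itemsize = int(cai["typestr"][2:])  # "<f4" -> 4 bytes per element
    return prod(cai["shape"]) * itemsize


_output_converter = None  # global default: hand back arrays unchanged


def set_output_as(converter):
    """Register a callable applied to every array a compute API returns."""
    global _output_converter
    _output_converter = converter


def compute(shape):
    """Hypothetical compute API that produces a device array."""
    out = FakeDeviceArray(shape)
    return _output_converter(out) if _output_converter else out


arr = compute((3, 4))
print(nbytes(arr))  # 12 elements * 4 bytes = 48

# After this call, every compute() result passes through the converter.
set_output_as(lambda a: a.__cuda_array_interface__["shape"])
print(compute((3, 4)))  # (3, 4)
```

A consumer library only needs the dictionary to wrap the same memory, which is why no copy is required; the converter hook is one plausible way a single global setting can change what every call returns.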

Key highlights

Header-only C++ with optional pre-compiled shared library to cut compile times
pylibraft runtime APIs don’t require a CUDA compiler; raft-dask handles multi-node multi-GPU via Dask
Relies heavily on the RAPIDS Memory Manager (RMM), so allocation strategy stays consistent across libraries
Supports in-place output to any __cuda_array_interface__ array, including pre-allocated buffers
Conda and pip installable, though pip packages are experimental and C++ headers aren’t included there
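
The in-place output highlight follows a familiar pattern: the caller allocates the result buffer once and the routine writes into it instead of allocating its own. The sketch below is a host-side analogue using only the standard library, on the assumption that the calling convention is what matters here. pairwise_sq_euclidean and its out parameter are hypothetical; real pylibraft routines operate on device arrays.

```python
# Host-side analogue of "write into a caller-supplied buffer".
# Hypothetical names; nothing here calls pylibraft or touches a GPU.
from array import array


def pairwise_sq_euclidean(x, y, out=None):
    """Squared Euclidean distance between each row of x and each row of y.

    Writes into `out` (a flat, row-major float buffer) when it is given,
    otherwise allocates a new buffer. Returns the buffer either way.
    """
    n, m = len(x), len(y)
    if out is None:
        out = array("d", [0.0]) * (n * m)  # fresh zero-filled buffer
    elif len(out) != n * m:
        raise ValueError(f"out has {len(out)} elements, expected {n * m}")
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i * m + j] = sum((a - b) ** 2 for a, b in zip(xi, yj))
    return out


x = [[0.0, 0.0], [3.0, 4.0]]
y = [[0.0, 0.0]]
buf = array("d", [0.0, 0.0])  # allocated once, reusable across calls
result = pairwise_sq_euclidean(x, y, out=buf)
print(result is buf, list(buf))
```

Reusing one buffer across many calls avoids repeated allocation, which matters more on a GPU where allocations can be comparatively expensive.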

Caveats

Explicitly not for data scientists doing discovery/experimentation; the project says so itself
The number of Python-exposed algorithms is “continuing to grow” — meaning not everything in C++ is wrapped yet
The pip packages build RAFT's pre-compiled instantiations in statically, so the C++ headers are not available from them; Conda is the preferred install route

Verdict

Worth a look if you’re building GPU-accelerated ML infrastructure or contributing to RAPIDS-adjacent projects. Skip it if you just want ready-made scikit-learn replacements — the README literally tells you to go to the main RAPIDS site instead.