NVIDIA/warp
NVIDIA Warp is a Python framework that JIT-compiles Python functions into differentiable kernels that run on the CPU or GPU, for use in simulation and ML pipelines.

Warp provides a Python-first interface for writing high-performance code that runs on the CPU or GPU. It uses just-in-time compilation to turn regular Python functions into efficient kernel code: CUDA kernels on NVIDIA GPUs and native code on the CPU. The framework includes primitives for physics simulation, robotics, and geometry processing. Warp kernels are differentiable, so they can serve as building blocks in machine-learning training pipelines alongside frameworks such as PyTorch, JAX, and Paddle.