philipturner/metal-flash-attention
A Swift/Metal implementation of FlashAttention optimized for Apple Silicon GPUs.

Ports the FlashAttention algorithm to Apple silicon using Metal. It implements single-headed attention with JIT compilation. It also provides an alternative backward pass that reaches full parallelization efficiency across both dimensions of the attention matrix. At large head dimensions such as 256, register pressure becomes the bottleneck. The implementation counters it with intentional register spilling and tuned block dimensions: 16–32 along the parallelization dimension and 80–128 along the traversal dimension.
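To show what "parallelization" and "traversal" block dimensions mean, here is a minimal CPU sketch in plain Swift. It is not the repository's Metal kernel and does not use its API. It assumes a single head with row-major `[[Float]]` matrices. Query rows are grouped into blocks of `parallelBlock`, which on a GPU would be distributed across parallel work. For each row, the sketch traverses keys and values in blocks of `traversalBlock` using the standard FlashAttention online softmax, so the full attention row is never materialized. The function names and the default block sizes (32 and 128, taken from the upper ends of the ranges above) are illustrative assumptions.

```swift
import Foundation

// Dot product of two equal-length vectors.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    var s: Float = 0
    for i in 0..<a.count { s += a[i] * b[i] }
    return s
}

// Reference: O = softmax(Q K^T / sqrt(D)) V, computed one full row at a time.
func naiveAttention(_ q: [[Float]], _ k: [[Float]], _ v: [[Float]]) -> [[Float]] {
    let scale = 1 / Float(q[0].count).squareRoot()
    var o: [[Float]] = []
    for qRow in q {
        var scores: [Float] = []
        for kRow in k { scores.append(dot(qRow, kRow) * scale) }
        let m = scores.max()!
        let weights = scores.map { exp($0 - m) }
        let sum = weights.reduce(0, +)
        var out = [Float](repeating: 0, count: v[0].count)
        for j in 0..<v.count {
            for c in 0..<out.count { out[c] += weights[j] * v[j][c] / sum }
        }
        o.append(out)
    }
    return o
}

// Hypothetical blocked forward pass with an online softmax over key/value blocks.
func tiledAttention(_ q: [[Float]], _ k: [[Float]], _ v: [[Float]],
                    parallelBlock: Int = 32, traversalBlock: Int = 128) -> [[Float]] {
    let n = q.count, dv = v[0].count
    let scale = 1 / Float(q[0].count).squareRoot()
    var o = [[Float]](repeating: [Float](repeating: 0, count: dv), count: n)
    // Parallelization dimension: blocks of query rows are independent of each other.
    for qStart in stride(from: 0, to: n, by: parallelBlock) {
        for i in qStart..<min(qStart + parallelBlock, n) {
            var m = -Float.infinity  // running maximum score for this row
            var l: Float = 0         // running sum of exp(score - m)
            var acc = [Float](repeating: 0, count: dv)  // unnormalized output
            // Traversal dimension: visit keys/values one block at a time.
            for kStart in stride(from: 0, to: k.count, by: traversalBlock) {
                let kEnd = min(kStart + traversalBlock, k.count)
                var s: [Float] = []
                for j in kStart..<kEnd { s.append(dot(q[i], k[j]) * scale) }
                let mNew = max(m, s.max()!)
                // Rescale previous statistics to the new maximum (0 on the first block).
                let correction = exp(m - mNew)
                l *= correction
                for c in 0..<dv { acc[c] *= correction }
                for j in kStart..<kEnd {
                    let p = exp(s[j - kStart] - mNew)
                    l += p
                    for c in 0..<dv { acc[c] += p * v[j][c] }
                }
                m = mNew
            }
            for c in 0..<dv { o[i][c] = acc[c] / l }
        }
    }
    return o
}
```

The blocked version should agree with the naive one up to floating-point rounding. On a GPU, the per-row accumulator `acc` has `dv` elements. That is why head dimensions like 256 put heavy pressure on registers and motivate the spilling strategy described above.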