
philipturner/metal-flash-attention

A Swift/Metal implementation of FlashAttention optimized for Apple Silicon GPUs.

Velocity (7d): +0.6 ★ / day · Trend: steady

Ports the FlashAttention algorithm to Apple silicon using Metal. Implements single-headed attention with JIT compilation, and features an alternative backward pass that achieves full parallelization efficiency across both dimensions of the attention matrix. To overcome register pressure bottlenecks at large head dimensions such as 256, it uses intentional register spilling and optimized block dimensions: 16-32 along the parallelization dimension and 80-128 along the traversal dimension.
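To make the two block dimensions concrete, here is a minimal CPU sketch of a blocked single-headed attention forward pass in plain Python. It is not the repository's Metal code: the function names `naive_attention` and `blocked_attention`, the parameters `block_rows` and `block_cols`, and the default sizes are assumptions chosen for illustration. It shows only the general FlashAttention idea of walking key/value blocks with a running softmax, and does not model register spilling, JIT compilation, or the backward pass.

```python
import math

def naive_attention(Q, K, V):
    """Reference: softmax(Q K^T / sqrt(D)) V, building each full score row."""
    scale = 1.0 / math.sqrt(len(Q[0]))
    out = []
    for q in Q:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, V)) / total
                    for d in range(len(V[0]))])
    return out

def blocked_attention(Q, K, V, block_rows=32, block_cols=80):
    """Hypothetical blocked forward pass with a running (online) softmax.

    block_rows: block size along the parallelization dimension (query rows).
    block_cols: block size along the traversal dimension (key/value rows).
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    dv = len(V[0])
    out = []
    # Parallelization dimension: row blocks are independent of each other,
    # so on a GPU each one could be handled by a separate group of threads.
    for r0 in range(0, len(Q), block_rows):
        for q in Q[r0:r0 + block_rows]:
            m = -math.inf      # running maximum of the scores seen so far
            l = 0.0            # running sum of exp(score - m)
            acc = [0.0] * dv   # running unnormalized output row
            # Traversal dimension: visit K/V one block at a time, so the
            # full row of attention scores is never stored.
            for c0 in range(0, len(K), block_cols):
                k_blk = K[c0:c0 + block_cols]
                v_blk = V[c0:c0 + block_cols]
                scores = [scale * sum(a * b for a, b in zip(q, k))
                          for k in k_blk]
                m_new = max(m, max(scores))
                # Rescale earlier partial results to the new maximum.
                correction = math.exp(m - m_new)
                weights = [math.exp(s - m_new) for s in scores]
                l = l * correction + sum(weights)
                acc = [a * correction + sum(w * v[d] for w, v in zip(weights, v_blk))
                       for d, a in enumerate(acc)]
                m = m_new
            out.append([a / l for a in acc])
    return out
```

Calling `blocked_attention(Q, K, V, block_rows=16, block_cols=80)` on lists of equal-length float rows should agree with `naive_attention(Q, K, V)` up to floating-point rounding, including when the sequence length is not a multiple of either block size.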
