openai/sparse_attention

OpenAI's sparse attention kernels: still frozen in 2019

Fused CUDA kernels for attention patterns that skip most of the QK^T matrix, letting transformers stretch to longer sequences without melting your GPU.

★ 1.6k stars | Python | ML Frameworks | Inference · Serving

What it does

This is OpenAI’s reference implementation of the sparse attention primitives from their 2019 “Sparse Transformers” paper. It provides fused CUDA kernels that compute attention while respecting block-sparsity patterns in the QK^T matrix — meaning you define which chunks of the attention matrix to actually calculate, and the rest get skipped entirely. The repo includes standard dense attention (with the upper triangle already elided), plus “strided” and “fixed” sparse patterns, and a small recompute decorator for memory management.
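
To make the "strided" and "fixed" patterns concrete, here is a minimal pure-Python sketch based on the element-level definitions in the Sparse Transformers paper. It is not the repository's code: the function names (strided_mask, fixed_mask, to_block_layout) and the parameter names (stride, c, blocksize) are assumptions made for illustration, and the kernels themselves work on blocks rather than on individual elements.

```python
def strided_mask(n, stride):
    """Causal 'strided' pattern: the previous `stride` positions plus every stride-th one."""
    return [
        [int(j <= i and ((i - j) < stride or (i - j) % stride == 0))
         for j in range(n)]
        for i in range(n)
    ]


def fixed_mask(n, stride, c=1):
    """Causal 'fixed' pattern: own window plus the last `c` columns of every window."""
    return [
        [int(j <= i and (i // stride == j // stride or j % stride >= stride - c))
         for j in range(n)]
        for i in range(n)
    ]


def to_block_layout(mask, blocksize):
    """Coarsen an element mask: a block is 1 if any element inside it is attended to."""
    nb = len(mask) // blocksize
    return [
        [int(any(mask[i][j]
                 for i in range(bi * blocksize, (bi + 1) * blocksize)
                 for j in range(bj * blocksize, (bj + 1) * blocksize)))
         for bj in range(nb)]
        for bi in range(nb)
    ]


if __name__ == "__main__":
    # Print the block-level layout of the 'fixed' pattern for a toy sequence.
    for row in to_block_layout(fixed_mask(8, 4), blocksize=2):
        print(row)
```

Coarsening to a block layout is what turns either pattern into the kind of 0/1 grid a block-sparse kernel can consume; any block left at 0 is work that never has to happen.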

The interesting bit

The sparsity isn’t just a mask applied after the fact: it’s wired into the kernel at the block level. You specify a 0/1 pattern on a [time/blocksize, time/blocksize] grid of blocks, and the blocks marked 0 are neither computed nor included in the softmax. There’s also a callback mechanism for finer-grained masking within computed blocks; masked values drop out of the softmax, although their matrix products are still computed. It’s attention as stencil operation, not attention as brute-force matrix multiply.
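
The block-level idea can be sketched in plain Python as a reference loop. This is a rough illustration, not the fused CUDA kernel and not the repository's API: block_sparse_attention, layout, and keep are hypothetical names, and the code handles a single head on nested lists purely to show which work is skipped.

```python
import math


def block_sparse_attention(q, k, v, layout, blocksize, keep=None):
    """Single-head attention that only visits blocks marked 1 in `layout`.

    q, k, v: lists of n equal-length float vectors.
    layout:  (n // blocksize) x (n // blocksize) grid of 0/1 flags.
    keep:    optional callback keep(i, j) -> bool for masking inside a block.
    Returns (outputs, number_of_active_blocks).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = {}
        for bj, active in enumerate(layout[i // blocksize]):
            if not active:
                continue  # skipped block: no dot products, no softmax terms
            for j in range(bj * blocksize, (bj + 1) * blocksize):
                s = scale * sum(a * b for a, b in zip(q[i], k[j]))
                # The product above is computed either way; the callback
                # only decides whether it takes part in the softmax.
                if keep is None or keep(i, j):
                    scores[j] = s
        if not scores:
            out.append([0.0] * len(v[0]))
            continue
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(weights.values())
        out.append([sum(w * v[j][c] for j, w in weights.items()) / total
                    for c in range(len(v[0]))])
    return out, sum(sum(row) for row in layout)
```

With a lower-triangular layout and a causal keep callback (j <= i), this produces the same result as dense causal attention while touching only the blocks on or below the diagonal, which is the saving the fused kernels exploit at much larger scale.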

Key highlights

Fused kernels for QK^T with configurable block sparsity; block sizes of 8, 16, 32, 64 supported
“Strided” and “fixed” attention patterns from the Sparse Transformers paper implemented natively
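Includes both a blocksparse path and a fallback TensorFlow path; on the blocksparse path, fp16 at any supported block size requires a Tensor Core GPU (compute capability 7.0 or higher), while fp32 with block size 32 runs on any NVIDIA GPU newer than Kepler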
Simple recompute=True decorator for gradient checkpointing-style memory savings
Requires OpenAI’s separate blocksparse package, which installs via pip install blocksparse when CUDA 10 and tensorflow-gpu are present and otherwise has to be built from source
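
The recompute idea in the list above trades extra forward computation for lower activation memory. The sketch below shows that trade-off on a toy chain of scalar tanh layers; it is not the repository's decorator, and every name in it (layer, layer_grad, grad_store_all, grad_recompute, segment) is hypothetical.

```python
import math


def layer(w, x):
    """One toy layer: y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x, grad_out):
    """Backprop through tanh(w * x) with respect to x."""
    y = math.tanh(w * x)
    return grad_out * w * (1.0 - y * y)


def grad_store_all(weights, x):
    """Plain backprop: every layer input stays alive until the backward pass."""
    saved = []
    for w in weights:
        saved.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(saved)):
        grad = layer_grad(w, x_in, grad)
    return grad, len(saved)


def grad_recompute(weights, x, segment):
    """Checkpointed backprop: keep one input per segment, rebuild the rest on demand."""
    checkpoints = []
    for start in range(0, len(weights), segment):
        checkpoints.append(x)
        for w in weights[start:start + segment]:
            x = layer(w, x)
    grad, peak = 1.0, 0
    for idx in reversed(range(len(checkpoints))):
        seg = weights[idx * segment:(idx + 1) * segment]
        # Re-run this segment's forward pass from its checkpoint.
        x_in, saved = checkpoints[idx], []
        for w in seg:
            saved.append(x_in)
            x_in = layer(w, x_in)
        # Simple upper bound on values held at once: checkpoints plus one segment.
        peak = max(peak, len(checkpoints) + len(saved))
        for w, xi in zip(reversed(seg), reversed(saved)):
            grad = layer_grad(w, xi, grad)
    return grad, peak
```

Both functions return the same gradient; the second holds fewer values at once at the cost of running each segment's forward pass a second time.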

Caveats

Archived and explicitly unmaintained — “code is provided as-is, no updates expected”
Depends on TensorFlow and CUDA 10 era tooling; the blocksparse dependency is itself a separate repo to wrangle
The “state-of-the-art” follow-up work from August 2020 lives in a different repository entirely

Verdict

Worth studying if you’re implementing sparse attention patterns from scratch and want to see how OpenAI structured the kernel interface — the block-sparsity abstraction is clean. Skip it if you need something that runs on modern PyTorch or CUDA 12 without archaeology; this is a research artifact, not a library.

Frequently asked

What is openai/sparse_attention? Fused CUDA kernels for attention patterns that skip most of the QK^T matrix, letting transformers stretch to longer sequences without melting your GPU.
Is sparse_attention open source? Yes, openai/sparse_attention is an open-source project tracked on heatdrop.
What language is sparse_attention written in? openai/sparse_attention is primarily written in Python.
How popular is sparse_attention? openai/sparse_attention has 1.6k stars on GitHub.
Where can I find sparse_attention? openai/sparse_attention is on GitHub at https://github.com/openai/sparse_attention.