Fix flash attention for non-pow2 dims, large dims, and backward NaN#7

Open
murrellb wants to merge 1 commit into main from fix-flash-attention-padding

Conversation

@murrellb
Member

  • Support non-pow2 emb_dim via kernel-level zero-padding (padded_dim)
  • Add smaller groupsizes (8, 4) for large dims that exceed the 48KB shmem limit
  • Clamp MMA tile sizes (TM ≤ BM, TN ≤ BN) to prevent OOB shmem access
  • Fix backward NaN: mask OOB K positions in the softmax reconstruction so that exp(0 − m_i) → Inf followed by Inf · 0 → NaN cannot contaminate dQ
  • Guard the preprocess inv(ls) computation against division by ls = 0
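The backward-pass NaN fix can be sketched in scalar Python (a minimal illustration of the masking idea, not the actual kernel code; `reconstruct_softmax_row`, `m_i`, and `l_i` are illustrative stand-ins for the saved forward-pass row max and row sum):

```python
import math

def reconstruct_softmax_row(scores, m_i, l_i, seq_len):
    """Recompute one softmax row during the backward pass.

    scores: raw Q·K^T tile entries; indices >= seq_len are out-of-bounds
    padding and may hold arbitrary values (often 0).
    m_i: row max saved from the forward pass.
    l_i: row sum of exp(score - m_i) saved from the forward pass.
    """
    probs = []
    for j, s in enumerate(scores):
        if j >= seq_len:
            # The fix: force OOB K positions to probability 0 instead of
            # evaluating exp(0 - m_i), which overflows to Inf for very
            # negative m_i and later yields Inf * 0 = NaN in the dQ update.
            probs.append(0.0)
        else:
            probs.append(math.exp(s - m_i) / l_i)
    return probs
```

With the mask in place, every reconstructed probability is finite and the valid positions still sum to 1, so the padded tail contributes exactly nothing to dQ.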

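The zero-padding approach for non-pow2 emb_dim can likewise be sketched in Python (a host-side illustration under the assumption that the kernel rounds the head dimension up to a power of two; `next_pow2` and `load_padded` are hypothetical helper names, not functions from this repo):

```python
def next_pow2(n):
    """Smallest power of two >= n, e.g. the padded_dim for a head dim."""
    p = 1
    while p < n:
        p *= 2
    return p

def load_padded(vec, padded_dim):
    """Copy a head vector into a padded_dim-length buffer, zero-filling
    the tail. Zero entries contribute nothing to Q·K dot products, so
    the kernel can assume a power-of-two inner dimension throughout."""
    return list(vec) + [0.0] * (padded_dim - len(vec))
```

For example, a 48-dim head pads to 64, and the dot product of two padded vectors equals the dot product of the originals, which is why the padding is numerically free.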

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>