
Staging #46

Open
MankyDanky wants to merge 3 commits into main from staging

Conversation

@MankyDanky
Collaborator

No description provided.

MankyDanky and others added 3 commits April 16, 2026 22:06
fix: handle 3D input tensors in flashAttention

flashAttention was hardcoded to assume 4D [B, H, T, D] input, but
models commonly pass 3D [B*H, T, D] input after merging the batch and
head dims. With 3D input, qShape[3] was undefined, causing NaN to
propagate through the entire forward pass.

Now T and D are read from the last two dimensions regardless of rank.
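The rank-agnostic indexing described in the commit message can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `lastTwoDims` and the example shapes are assumptions.

```typescript
// Hypothetical sketch: derive the sequence length T and head dim D from
// the last two entries of a shape array, so both 4D [B, H, T, D] and
// 3D [B*H, T, D] inputs work.
function lastTwoDims(qShape: number[]): { T: number; D: number } {
  if (qShape.length < 2) {
    throw new Error(`expected rank >= 2, got rank ${qShape.length}`);
  }
  // Before the fix: T = qShape[2], D = qShape[3]. For a 3D tensor,
  // qShape[3] is undefined, and arithmetic with undefined yields NaN,
  // which then propagates through the forward pass.
  const T = qShape[qShape.length - 2];
  const D = qShape[qShape.length - 1];
  return { T, D };
}

// 4D input: [batch, heads, seq, headDim]
console.log(lastTwoDims([2, 8, 128, 64])); // { T: 128, D: 64 }
// 3D input with batch and head dims merged: [batch*heads, seq, headDim]
console.log(lastTwoDims([16, 128, 64])); // { T: 128, D: 64 }
```

Indexing from the end of the shape array keeps the kernel agnostic to whether the caller has already flattened the batch and head dimensions.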
