graph: backend: dnnl: support permute for scale and zps inputs #2291

Open
wants to merge 4 commits into
base: main
from

Conversation

wzt1997
Contributor

@wzt1997 wzt1997 commented Dec 19, 2024

Description

We recently added support for compressed SDPA patterns that take scales and zero points as inputs. We also aim to support a Key of shape (N, H, S, D) with transpose_b=true for the QK MatMul. For a transposed MatMul like this, the graph library adds permute ops ahead of the pattern inputs, including the scale and zp tensors. More specifically, consider the following WOQ MatMul with transpose_b=true:
image

After graph compilation, the graph looks as follows. The permute ops do not touch the physical memory; they only change the memory descriptors (mds).
image
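To illustrate what "only change the mds" means, here is a minimal standalone sketch. It does not use the oneDNN API: `toy_desc`, `make_plain`, `permute`, and `offset` are hypothetical names, and the permutation convention (result axis i takes input axis perm[i]) is an assumption that may differ from the library's. The point is that a permute rearranges dims and strides while every element stays at the same physical offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory descriptor: logical dims plus
// element strides. This is NOT the oneDNN type.
struct toy_desc {
    std::vector<int64_t> dims;
    std::vector<int64_t> strides;
};

// Builds a dense row-major (abx-like) descriptor for the given dims.
toy_desc make_plain(const std::vector<int64_t> &dims) {
    toy_desc md {dims, std::vector<int64_t>(dims.size(), 1)};
    for (int i = static_cast<int>(dims.size()) - 2; i >= 0; --i)
        md.strides[i] = md.strides[i + 1] * dims[i + 1];
    return md;
}

// Assumed convention: logical axis i of the result is axis perm[i] of
// the input. Only dims and strides move; the buffer is never touched.
toy_desc permute(const toy_desc &md, const std::vector<int> &perm) {
    assert(perm.size() == md.dims.size());
    toy_desc out;
    for (int p : perm) {
        out.dims.push_back(md.dims[p]);
        out.strides.push_back(md.strides[p]);
    }
    return out;
}

// Offset of a logical index into the (unchanged) physical buffer.
int64_t offset(const toy_desc &md, const std::vector<int64_t> &idx) {
    assert(idx.size() == md.dims.size());
    int64_t off = 0;
    for (std::size_t i = 0; i < idx.size(); ++i)
        off += idx[i] * md.strides[i];
    return off;
}
```

For a Key of shape (N, H, S, D), swapping the last two axes gives a (N, H, D, S) view whose strides are no longer dense row-major, which is the situation the scale and zp tensors end up in after the inserted permute.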

However, we hit an issue: the DNNL backend always treats the scale and zp tensors as being in abx format, which leads to failing results. Hence we need to set the tag explicitly to abx and execute extra reorders. After the change, the compiled graph looks like:
image
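The extra reorder can be pictured with the following standalone sketch. It is not the backend's implementation: `is_plain_abx` and `reorder_to_abx` are hypothetical helpers, and `float` is used only for illustration. It checks whether a strided view is already dense row-major and, if not, copies it into a new buffer that is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True when the strides describe a dense row-major layout for dims.
// Axes of size 1 are ignored because their stride is never used.
bool is_plain_abx(const std::vector<int64_t> &dims,
        const std::vector<int64_t> &strides) {
    assert(dims.size() == strides.size());
    int64_t expected = 1;
    for (int i = static_cast<int>(dims.size()) - 1; i >= 0; --i) {
        if (dims[i] != 1 && strides[i] != expected) return false;
        expected *= dims[i];
    }
    return true;
}

// Copies a strided view of src into a new dense row-major buffer.
std::vector<float> reorder_to_abx(const std::vector<float> &src,
        const std::vector<int64_t> &dims,
        const std::vector<int64_t> &strides) {
    assert(dims.size() == strides.size());
    int64_t nelems = 1;
    for (int64_t d : dims) nelems *= d;
    std::vector<float> dst(static_cast<std::size_t>(nelems));
    std::vector<int64_t> idx(dims.size(), 0);
    for (int64_t linear = 0; linear < nelems; ++linear) {
        // Physical offset of the current logical index in src.
        int64_t off = 0;
        for (std::size_t i = 0; i < dims.size(); ++i)
            off += idx[i] * strides[i];
        dst[static_cast<std::size_t>(linear)]
                = src[static_cast<std::size_t>(off)];
        // Advance the logical index in row-major order.
        for (int i = static_cast<int>(dims.size()) - 1; i >= 0; --i) {
            if (++idx[i] < dims[i]) break;
            idx[i] = 0;
        }
    }
    return dst;
}
```

A scale tensor stored as 2x3 and viewed through a permute as 3x2 with strides (1, 3) fails the `is_plain_abx` check; after `reorder_to_abx` the data can safely be described with plain strides (2, 1).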

@wzt1997 wzt1997 added the component:graph-api Codeowner: @oneapi-src/onednn-graph label Dec 19, 2024
@wzt1997 wzt1997 self-assigned this Dec 19, 2024
@wzt1997 wzt1997 requested review from a team as code owners December 19, 2024 09:03
@github-actions github-actions bot added the component:tests Codeowner: @oneapi-src/onednn-arch label Dec 19, 2024
@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch from 953b5a6 to ce87611 Compare December 19, 2024 09:13
@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch 2 times, most recently from cb5d6c4 to 3a713a8 Compare December 20, 2024 01:33
@wzt1997
Contributor Author

wzt1997 commented Dec 22, 2024

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch from 3a713a8 to 1b80646 Compare December 23, 2024 07:13
@wzt1997
Contributor Author

wzt1997 commented Dec 24, 2024

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch from 1b80646 to b1eed4d Compare December 25, 2024 08:53
@wzt1997
Contributor Author

wzt1997 commented Dec 26, 2024

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch 3 times, most recently from 7d7be8a to 30f2fd8 Compare December 26, 2024 11:34
@wzt1997
Contributor Author

wzt1997 commented Dec 26, 2024

While validating dynamic per-channel quantization cases, especially those with zps and a transposed MatMul, I found some issues. I pushed 2 new commits to this PR that improve the handling of dynamic per-channel quantization; please review again:

  1. Except for per-group quantization and the standalone reorder primitive, the backend always accepts mask=0 for zero points. The op def check for dynamic quantization ops is therefore relaxed to allow a zp tensor with nelems=1 for per-channel quantization.
  2. Added the corresponding compressed SDPA case with per-channel quantization, plus the transposed test case.
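As a rough illustration of the relaxed check in item 1, the sketch below accepts a per-channel zp tensor that either has one element per channel or a single element. The function `zp_shape_ok`, its parameters, and the string values for the quantization type are hypothetical; the actual op def check in the backend may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical shape check for the zero-point input of a dynamic
// quantization op. Not the backend's actual code.
bool zp_shape_ok(const std::string &qtype,
        const std::vector<int64_t> &src_dims, int axis,
        const std::vector<int64_t> &zp_dims) {
    int64_t nelems = 1;
    for (int64_t d : zp_dims) nelems *= d;

    if (qtype == "per_tensor") return nelems == 1;
    if (qtype == "per_channel") {
        assert(axis >= 0 && axis < static_cast<int>(src_dims.size()));
        // Relaxed rule: a single-element zp is also accepted, matching
        // a backend that applies zero points with mask=0.
        return nelems == 1 || nelems == src_dims[axis];
    }
    // Other quantization types (e.g. per-group) are out of scope here.
    return false;
}
```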

@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch 3 times, most recently from 95d50e6 to 825293a Compare December 27, 2024 01:35
@wzt1997
Contributor Author

wzt1997 commented Dec 27, 2024

make test
enable benchdnn_nightly
disable benchdnn_all
enable benchdnn_graph

@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch from 825293a to 96d8cab Compare January 2, 2025 02:46
@wzt1997 wzt1997 force-pushed the zhitao/support-scale-permute branch from 96d8cab to 4611b49 Compare January 3, 2025 08:26
Labels
component:graph-api Codeowner: @oneapi-src/onednn-graph component:tests Codeowner: @oneapi-src/onednn-arch
Projects
None yet

5 participants