graph: backend: dnnl: support permute for scale and zps inputs #2291
Conversation
During the validation of dynamic per-channel quantization cases, especially those with zps and transposed MatMul, some issues were discovered. I have therefore pushed two new commits to this PR that enhance the handling of dynamic per-channel quantization cases; please review again.
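For reference, here is a minimal sketch of how such a case is expressed through the oneDNN Graph API: a dynamic per-channel dequantize with zero points. The shapes, tensor IDs, and channel axis below are assumptions for illustration, not values from this PR.

```cpp
#include <cstdint>
#include <string>
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;

int main() {
    // int8 weight of shape (K, N), dequantized per channel along axis 1
    // with f32 scales and s8 zero points of length N (assumed shapes).
    const logical_tensor::dims wei_shape {256, 64};
    const logical_tensor::dims ch_shape {64};

    logical_tensor wei_s8 {0, logical_tensor::data_type::s8, wei_shape,
            logical_tensor::layout_type::strided};
    logical_tensor scales {1, logical_tensor::data_type::f32, ch_shape,
            logical_tensor::layout_type::strided};
    logical_tensor zps {2, logical_tensor::data_type::s8, ch_shape,
            logical_tensor::layout_type::strided};
    logical_tensor wei_f32 {3, logical_tensor::data_type::f32, wei_shape,
            logical_tensor::layout_type::strided};

    op deq {0, op::kind::DynamicDequantize, "deq"};
    deq.set_attr<std::string>(op::attr::qtype, "per_channel");
    deq.set_attr<int64_t>(op::attr::axis, 1); // channel axis of the weight
    deq.add_inputs({wei_s8, scales, zps});
    deq.add_output(wei_f32);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(deq);
    g.finalize();
    return 0;
}
```

The `qtype = per_channel` attribute together with an explicit `axis` is what distinguishes these cases from the per-tensor ones.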
Description
We recently supported the compressed SDPA patterns which incorporate scales and zero points as inputs. We also aim to support problems with a Key shape of (N, H, S, D) and `transpose_b=true` for the QK MatMul. For such cases with a transposed MatMul, the graph library adds `permute` ops prior to the pattern inputs, including the scale and zp tensors. To be more specific, for a woq MatMul with `transpose_b=true`, after graph compilation we will have `permute` ops inserted into the graph. The permute ops will not touch the physical memories but only change the mds.
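For illustration, a minimal sketch of the user-visible QK MatMul built through the oneDNN Graph API, with a (N, H, S, D) Key and `transpose_b=true`; all shapes and tensor IDs are assumptions. The permute ops discussed above are inserted by the library during compilation and do not appear in the graph as built here.

```cpp
#include <oneapi/dnnl/dnnl_graph.hpp>
using namespace dnnl::graph;

int main() {
    // Query and Key both have shape (N, H, S, D); with transpose_b = true
    // the MatMul computes Q x K^T, so no explicit transpose op appears in
    // the user-built pattern.
    logical_tensor query {0, logical_tensor::data_type::f32,
            {1, 16, 384, 64}, logical_tensor::layout_type::strided};
    logical_tensor key {1, logical_tensor::data_type::f32,
            {1, 16, 384, 64}, logical_tensor::layout_type::strided};
    logical_tensor score {2, logical_tensor::data_type::f32,
            {1, 16, 384, 384}, logical_tensor::layout_type::strided};

    op qk {0, op::kind::MatMul, "qk_matmul"};
    qk.set_attr<bool>(op::attr::transpose_b, true);
    qk.add_inputs({query, key});
    qk.add_output(score);

    graph g {dnnl::engine::kind::cpu};
    g.add_op(qk);
    g.finalize();
    return 0;
}
```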
However, we faced an issue: the DNNL backend always regards scale and zp tensors as being in `abx` format, leading to failing results. Hence we need to set the tag explicitly as `abx` and execute extra reorders. After the change, the graph after compilation contains these extra reorders on the scale and zp inputs.
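The distinction can be sketched with the oneDNN primitive API (tensor sizes are assumptions): permuting axes only rewrites the memory descriptor over the same buffer, whereas handing the backend a plain, `abx`-style tensor requires an actual reorder.

```cpp
#include <oneapi/dnnl/dnnl.hpp>
using namespace dnnl;

int main() {
    engine eng {engine::kind::cpu, 0};
    stream strm {eng};

    // A 2x4 f32 tensor stored row-major ("ab", the 2D member of the
    // plain abx family).
    memory::desc src_md {{2, 4}, memory::data_type::f32,
            memory::format_tag::ab};

    // Permuting axes only rewrites dims/strides in the md; the physical
    // buffer is untouched. This md views the same data as a 4x2 tensor.
    memory::desc permuted_md = src_md.permute_axes({1, 0});

    // To give the backend a dense row-major tensor, an explicit reorder
    // from the strided (permuted) view into a plain md is required.
    memory::desc plain_md {{4, 2}, memory::data_type::f32,
            memory::format_tag::ab};
    memory src_mem {permuted_md, eng};
    memory dst_mem {plain_md, eng};
    reorder(src_mem, dst_mem).execute(strm, src_mem, dst_mem);
    strm.wait();
    return 0;
}
```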