Logcumsumexp has different results between CPU and XPU on BF16/Complex64/Complex128 #1012

Open
LuFinch opened this issue Oct 23, 2024 · 0 comments · May be fixed by #931

LuFinch commented Oct 23, 2024

🐛 Describe the bug

  • BF16:
PYTORCH_TEST_WITH_SLOW=1 python test/xpu/extended/test_ops_xpu.py TestCommonXPU.test_compare_cpu_logcumsumexp_xpu_bfloat16

Mismatched elements: 2 / 125 (1.6%)
Greatest absolute difference: 0.03125 at index (1, 4, 2) (up to 0.001 allowed)
Greatest relative difference: 0.006072998046875 at index (2, 3, 1) (up to 0.001 allowed)

cpu output at  (1, 4, 2): tensor(6.1875, dtype=torch.bfloat16)
xpu output at  (1, 4, 2): tensor(6.1562, device='xpu:0', dtype=torch.bfloat16)
  • Complex128:
PYTORCH_TEST_WITH_SLOW=1 python test/xpu/extended/test_ops_xpu.py TestCommonXPU.test_compare_cpu_logcumsumexp_xpu_complex128

Mismatched elements: 2 / 125 (1.6%)
Greatest absolute difference: 12.566370614359174 at index (3, 3, 0) (up to 0.001 allowed)
Greatest relative difference: 1.5103243157406059 at index (3, 4, 0) (up to 0.001 allowed)

cpu output at (3, 3, 0): tensor(7.4356+3.7336j, dtype=torch.complex128)
xpu output at (3, 3, 0): tensor(7.4356-8.8328j, device='xpu:0', dtype=torch.complex128)
  • Complex64:
test_reductions_xpu.py::TestReductionsXPU::test_logcumsumexp_complex_xpu_complex64 

Mismatched elements: 1 / 3 (33.3%)
Greatest absolute difference: nan at index (2,) (up to 1e-05 allowed)
Greatest relative difference: nan at index (2,) (up to 1.3e-06 allowed)

input : [1e3 + 0j, 1e-18 + 1e4j, 1e2 + 1e-8j] 
cpu_output :  [1000.+0.j, 1000.+0.j, 1000.+0.j]
cuda_output : [1000.+0.j, 1000.+0.j, 1000.+0.j]
xpu_output : [1000.+0.j, 1000.+0.j, nan + nanj]
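
As a quick sanity check on the BF16 and complex128 numbers reported above, the sketch below expresses the BF16 difference in units of bfloat16 spacing and the complex128 imaginary-part difference in multiples of 2π. It uses only the values printed in the logs; the helper name `bf16_ulp` is illustrative and assumes the usual bfloat16 layout with 7 explicit mantissa bits.

```python
import math

def bf16_ulp(x: float) -> float:
    """Spacing between adjacent bfloat16 values near x (normal range only)."""
    # Assumption: bfloat16 stores 7 explicit mantissa bits,
    # so the spacing is 2**(exponent - 7).
    exponent = math.floor(math.log2(abs(x)))
    return 2.0 ** (exponent - 7)

# BF16 case: CPU and XPU outputs at index (1, 4, 2), as reported above.
cpu_bf16, xpu_bf16 = 6.1875, 6.15625  # 6.1562 is the printed form of 6.15625
bf16_diff_in_ulps = abs(cpu_bf16 - xpu_bf16) / bf16_ulp(cpu_bf16)

# Complex128 case: imaginary parts at index (3, 3, 0), as printed (4 decimals).
cpu_imag, xpu_imag = 3.7336, -8.8328
imag_diff_in_turns = (cpu_imag - xpu_imag) / (2 * math.pi)

print(f"bf16 difference: {bf16_diff_in_ulps} ULP")
print(f"complex128 imaginary difference: {imag_diff_in_turns:.4f} x 2*pi")
```

With these inputs the BF16 outputs differ by one bfloat16 step, and the complex128 imaginary parts differ by about two full turns (4π, matching the reported absolute difference of 12.566...). Since exp(z) is unchanged when a multiple of 2πi is added to z, the two complex128 outputs may simply lie on different branches of the logarithm; this is a reading of the numbers, not a confirmed root cause.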

For complex64, I found that the NaN is caused by accumulation order: in this case our XPU scan kernel first reduces input[1] with input[2], and then reduces input[0] with input[2]. However, even the CPU kernel outputs nan+nanj when logcumsumexp is computed directly on input[1] and input[2], so the NaN appears to come from that pairwise reduction itself.
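
The observation above can be checked on CPU with a short sketch. It assumes a PyTorch build whose `torch.logcumsumexp` accepts complex inputs; the variable names are illustrative. It does not emulate the XPU scan kernel. It only evaluates the full input and the sub-sequence [input[1], input[2]] separately.

```python
import torch

# Input from the failing complex64 test, as listed above.
x = torch.tensor(
    [1e3 + 0j, 1e-18 + 1e4j, 1e2 + 1e-8j],
    dtype=torch.complex64,
)

# Sequential scan over all three elements (CPU reference).
full = torch.logcumsumexp(x, dim=0)

# Scan over only the last two elements, mimicking a reduction that
# combines input[1] and input[2] before input[0] is involved.
pair = torch.logcumsumexp(x[1:], dim=0)

print("full input:        ", full)
print("input[1], input[2]:", pair)
print("pair contains NaN: ", bool(torch.isnan(torch.view_as_real(pair)).any()))
```

According to the report, the full CPU result is 1000+0j in every position, while the pairwise result is expected to contain NaN. If that holds, any scan order that combines input[1] and input[2] before the dominant input[0] would propagate the NaN.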

Versions

Related PR: #931

LuFinch linked a pull request on Oct 23, 2024 that will close this issue.