[wip] Support premul_sum #1948

Chao1Han · 2025-08-21T07:54:30Z

Refers to https://github.com/intel-innersource/libraries.performance.communication.oneccl/pull/3449
Integrates premul_sum into XCCL.

Copilot

Pull Request Overview

This PR adds support for premultiplied sum operations (premul_sum) in the XCCL distributed communication backend. The implementation includes version checking to ensure compatibility with oneCCL >= 2021.17, which is required for this feature.

Adds XCCLPreMulSumSupplement struct to handle premul sum factors (both scalar and tensor)
Implements makeXCCLPreMulSum template function to create reduce operations with factors
Updates getXcclReduceOp function signature and adds PREMUL_SUM case handling
Adds comprehensive test coverage for premul sum operations in reduce and reduce_scatter scenarios
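
For readers unfamiliar with the operation, here is a minimal single-process sketch of what a premultiplied sum computes, assuming scalar factors: each rank's buffer is scaled by that rank's factor before the element-wise sum. The name `premul_sum_reference` and the vector-of-vectors layout are hypothetical and are not part of this PR or of oneCCL; the real reduction runs inside the collective.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference model, not the XCCL implementation.
// inputs[r] is the buffer contributed by rank r; factors[r] is that rank's scalar factor.
std::vector<float> premul_sum_reference(
    const std::vector<std::vector<float>>& inputs,
    const std::vector<float>& factors) {
  assert(inputs.size() == factors.size());
  std::vector<float> out(inputs.empty() ? 0 : inputs[0].size(), 0.0f);
  for (std::size_t r = 0; r < inputs.size(); ++r) {
    assert(inputs[r].size() == out.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
      // Multiply first, then accumulate: this ordering is the "premul" part.
      out[i] += factors[r] * inputs[r][i];
    }
  }
  return out;
}
```

With two ranks holding {1, 2} and {3, 4} and a factor of 0.5 on each, the model yields the element-wise average {2, 3}, which is the typical use of the operation.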

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/xpu/distributed/test_c10d_ops_xccl.py | Fixes typos in class names and adds test cases for premul_sum operations |
| src/xccl/ProcessGroupXCCL.hpp | Adds XCCLPreMulSumSupplement struct and makeXCCLPreMulSum template function |
| src/xccl/ProcessGroupXCCL.cpp | Implements premul_sum support with version checking and updates function signatures |

Copilot · 2025-08-21T07:55:17Z

src/xccl/ProcessGroupXCCL.cpp

+        case ccl::datatype::float32:
+          return unpackPreMulSum<float, ccl::datatype::float32>(reduceOp, comm);
+        case ccl::datatype::bfloat16:
+          return unpackPreMulSum<float, ccl::datatype::bfloat16>(


For bfloat16 data type, the template should use at::BFloat16 instead of float. Using float for bfloat16 data will cause type mismatch issues when accessing the tensor data.

Suggested change

-          return unpackPreMulSum<float, ccl::datatype::bfloat16>(
+          return unpackPreMulSum<at::BFloat16, ccl::datatype::bfloat16>(

Copilot · 2025-08-21T07:55:18Z

src/xccl/ProcessGroupXCCL.cpp

+      ccl::reduction_destroy(op_, *comm_);
+    }
+  }
+#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT


The destructor is only defined when ENABLE_XCCL_PREMUL_SUM_SUPPORT is defined, but the class is used regardless of this macro. This will cause linking errors when the macro is not defined. The destructor should be defined unconditionally with appropriate conditional logic inside.

Suggested change

-#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT
+  ~xcclRedOpRAII() {
+#if defined(ENABLE_XCCL_PREMUL_SUM_SUPPORT)
+    if (premul_sum_) {
+      ccl::reduction_destroy(op_, *comm_);
+    }
+#endif // ENABLE_XCCL_PREMUL_SUM_SUPPORT
+  }
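
The pattern this comment asks for can be sketched in isolation: the destructor always exists, and only its body depends on the feature macro. Everything below is a hypothetical stand-in and does not call oneCCL: `FakeComm` replaces the communicator, `fake_reduction_destroy` replaces `ccl::reduction_destroy`, `RedOpGuard` replaces `xcclRedOpRAII`, and `DEMO_PREMUL_SUM_SUPPORT` replaces `ENABLE_XCCL_PREMUL_SUM_SUPPORT`.

```cpp
#include <utility>

// Hypothetical communicator that just counts destroy calls.
struct FakeComm {
  int destroyed = 0;
};

// Hypothetical stand-in for the library call that frees a custom reduction op.
inline void fake_reduction_destroy(int /*op*/, FakeComm& comm) {
  ++comm.destroyed;
}

#define DEMO_PREMUL_SUM_SUPPORT 1  // comment out to model a build without the feature

class RedOpGuard {
 public:
  RedOpGuard() = default;  // built-in op: nothing to release
  RedOpGuard(int op, FakeComm* comm)
      : op_(op), comm_(comm), premul_sum_(true) {}

  RedOpGuard(const RedOpGuard&) = delete;
  RedOpGuard& operator=(const RedOpGuard&) = delete;

  RedOpGuard(RedOpGuard&& other) noexcept
      : op_(other.op_), comm_(other.comm_), premul_sum_(other.premul_sum_) {
    other.premul_sum_ = false;  // the moved-from guard must not destroy the op
  }

  // Defined unconditionally; only the cleanup inside is feature-gated.
  ~RedOpGuard() {
#if defined(DEMO_PREMUL_SUM_SUPPORT)
    if (premul_sum_) {
      fake_reduction_destroy(op_, *comm_);
    }
#endif
  }

 private:
  int op_ = 0;
  FakeComm* comm_ = nullptr;
  bool premul_sum_ = false;
};
```

Because the class declaration no longer changes shape with the macro, translation units built with and without the feature agree on whether a user-provided destructor exists.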

no-ponomarev · 2025-09-01T15:24:16Z

Just curious, did you run some kind of validation with this PR?

Chao1Han · 2025-09-02T01:13:05Z

> Just curious, did you run some kind of validation with this PR?

Yes. With the simple cases in test/xpu/distributed/test_c10d_ops_xccl.py, reduce_scatter works well, but reduce hangs.

Chao1Han · 2025-09-02T01:15:15Z

> Just curious, did you run some kind of validation with this PR?
>
> Yes. With the simple cases in test/xpu/distributed/test_c10d_ops_xccl.py, reduce_scatter works well, but reduce hangs.

This commit also relies on the registration change in https://github.com/Chao1Han/pytorch/pull/23/files.

no-ponomarev · 2025-09-03T12:28:08Z

> reduce hang

The new features are not supported for the plain "reduce" collective, only for allreduce and reduce_scatter.
I think a reduce call with premul_sum will throw an exception.

Chao1Han · 2025-09-05T05:36:37Z

@zhangxiaoli73 please help review.

Chao1Han · 2025-09-05T05:39:19Z

The test cases are based on:
https://github.com/pytorch/pytorch/blob/5da573c42c332bc68d4b7946c69f690a876d951a/test/distributed/test_c10d_ops_nccl.py#L344-L405
https://github.com/pytorch/pytorch/blob/5da573c42c332bc68d4b7946c69f690a876d951a/test/distributed/test_c10d_ops_nccl.py#L785-L892

zhangxiaoli73 · 2025-09-05T05:52:00Z

> reduce hang
>
> The new features are not supported for the plain "reduce" collective, only for allreduce and reduce_scatter. I think a reduce call with premul_sum will throw an exception.

@no-ponomarev What is the limitation that prevents supporting reduce? Please comment in the internal JIRA so that others know that only those two scale-up collectives are supported.

Support premul_sum

0d38985

Copilot AI review requested due to automatic review settings August 21, 2025 07:54

Merge branch 'main' into xccl/premulsum

ce6ed15
