
[xpu] xpu backend run pass#523

Open
yongqiangma wants to merge 3 commits into PaddlePaddle:develop from yongqiangma:xpu_run

Conversation

Contributor

@yongqiangma yongqiangma commented Feb 6, 2026

  • Added XPU backend support; verified that the package imports correctly: python -c "import paddlefleet; print(f'paddlefleet {paddlefleet.version}')"
  • Verified in PaddleFormers that models depending on PaddleFleet run correctly.

Copilot AI review requested due to automatic review settings February 6, 2026 01:17
@yongqiangma yongqiangma changed the title xpu backend run pass [xpu] xpu backend run pass Feb 6, 2026

Copilot AI left a comment


Pull request overview

This pull request introduces XPU backend compatibility support for PaddleFleet by refactoring CUDA-specific operations to support multiple backends. The PR adds a backend abstraction layer for fused SwiGLU scale operations and guards CUDA-specific code paths in the ops initialization.

Note on PR Metadata:

  • Title Format Issue: The PR title "xpu backend run pass" doesn't follow the required format [CLASS]Title. It should be something like "[Feature] XPU backend support" or "[BugFix] XPU backend compatibility".
  • Missing Description: The PR lacks a description explaining why these modifications are being made and what problem is being solved. The description should explain the motivation for adding XPU backend support and how the abstraction layer enables multi-backend compatibility.

Changes:

  • Created a new backend abstraction layer (fused_swiglu_scale.py) that conditionally routes to CUDA-specific implementations
  • Updated import paths from direct paddlefleet.ops imports to the new abstraction layer
  • Added CUDA-specific guards to prevent loading CUDA-only ecosystem libraries (deep_gemm, deep_ep, sonicmoe) on non-CUDA backends
  • Added initialization guard to prevent duplicate backend detection in backends.py
  • Added fallback for dependency resolution failures in setup.py

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/paddlefleet/fusions/fused_swiglu_scale.py | New backend abstraction layer providing forward/backward functions with CUDA detection |
| src/paddlefleet/transformer/moe/fp8_utils.py | Updated imports to use the new abstraction layer functions instead of direct ops imports |
| tests/single_card_tests/custom_ops/test_fuse_swiglu_scale.py | Updated test imports to use the new abstraction layer |
| src/paddlefleet/ops/__init__.py | Guarded CUDA-specific ecosystem library loading with backend detection |
| src/paddlefleet/fusions/fused_bias_swiglu.py | Added CUDA backend check before importing fused_swiglu_bwd |
| setup.py | Added exception handling for dependency resolution failures |
| backends.py | Added initialization guard and auto-initialization at module level |

setup.py Outdated
Comment on lines 194 to 199
except Exception:
    # Fallback if dependency resolution fails
    dependencies = common_dependencies
    logging.warning(
        "Failed to resolve special dependencies, using common dependencies only"
    )

Copilot AI Feb 6, 2026


The broad Exception catch could silently hide important errors during dependency resolution. Consider catching more specific exceptions (e.g., ImportError, ModuleNotFoundError) or at minimum logging the actual exception details to help diagnose issues. For example: logging.warning(f"Failed to resolve special dependencies: {e}, using common dependencies only")
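A minimal sketch of that suggestion, assuming a hypothetical `resolve_dependencies` helper (its name, arguments, and structure are illustrative, not code from the PR):

```python
import logging

def resolve_dependencies(common_dependencies, resolve_special):
    """Sketch of the narrower-catch pattern the review suggests.

    `resolve_special` stands in for whatever callable computes the
    extra, backend-specific dependencies in setup.py.
    """
    try:
        return common_dependencies + resolve_special()
    except (ImportError, ModuleNotFoundError) as e:
        # Log the underlying error instead of swallowing it silently,
        # so failures remain diagnosable from the build log.
        logging.warning(
            "Failed to resolve special dependencies: %s, "
            "using common dependencies only",
            e,
        )
        return common_dependencies
```

Using `%s`-style lazy formatting with `logging.warning` keeps the exception details in the log without paying the formatting cost when warnings are suppressed.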

Comment on lines +18 to +22
def fused_swiglu_scale_forward(x, scale):
if paddle.is_compiled_with_cuda():
from paddlefleet.ops import fused_swiglu_scale

return fused_swiglu_scale(x, scale)
Copilot AI Feb 6, 2026

Consider adding validation or error handling for the case where the import succeeds but the actual CUDA operation fails at runtime. For robustness, you might want to wrap the call in a try-except block and provide a meaningful error message if the CUDA operation encounters issues.
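One way to sketch that wrapping pattern (the `call_with_clear_error` helper is hypothetical and not part of the PR; it only illustrates the suggestion):

```python
def call_with_clear_error(op, *args):
    """Run a fused op and, if it fails at runtime, re-raise with a
    diagnosable message instead of a bare backend error."""
    try:
        return op(*args)
    except RuntimeError as e:
        raise RuntimeError(
            f"{op.__name__} failed on the CUDA backend: {e}. "
            "Check device availability and input dtypes/shapes."
        ) from e
```

Chaining with `raise ... from e` preserves the original traceback while putting an actionable message at the top.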

Comment on lines 18 to 33
def fused_swiglu_scale_forward(x, scale):
    if paddle.is_compiled_with_cuda():
        from paddlefleet.ops import fused_swiglu_scale

        return fused_swiglu_scale(x, scale)
    else:
        raise NotImplementedError


def fused_swiglu_scale_backward(x, scale, out_grad):
    if paddle.is_compiled_with_cuda():
        from paddlefleet.ops import fused_swiglu_scale_bwd

        return fused_swiglu_scale_bwd(x, scale, out_grad)
    else:
        raise NotImplementedError
Copilot AI Feb 6, 2026

These functions lack docstrings describing their purpose, parameters, and return values. Following the pattern in other fusion files (e.g., fused_bias_swiglu.py lines 32-42, 71-81), add comprehensive docstrings that describe the SwiGLU scale operation, input parameters (x, scale), and return value to help users understand what these functions do.
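For illustration, a docstring in that style might look like the following (the wording is a sketch and the body is elided; it does not reproduce the PR's actual documentation):

```python
def fused_swiglu_scale_forward(x, scale):
    """Compute the fused SwiGLU activation with output scaling.

    Args:
        x: Input tensor whose last dimension holds the gate and
            value halves used by the SwiGLU activation.
        scale: Scale factor applied to the activation output.

    Returns:
        A tensor containing the scaled SwiGLU output.
    """
    # Body elided; only the docstring shape is the point here.
    raise NotImplementedError
```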


        return fused_swiglu_scale(x, scale)
    else:
        raise NotImplementedError
Copilot AI Feb 6, 2026

The NotImplementedError should include a descriptive error message to help users understand what backend is not supported and what alternatives they might have. For example: "fused_swiglu_scale_forward is only supported on CUDA backend. Current backend does not support this operation."
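A small message builder in that spirit could look like this (the helper itself is hypothetical; only the message wording follows the suggestion):

```python
def unsupported_backend_error(op_name):
    """Build a NotImplementedError with a descriptive, per-op message
    for backends that lack the fused kernel."""
    return NotImplementedError(
        f"{op_name} is only supported on the CUDA backend; "
        "the current backend does not support this operation."
    )

# Usage inside a dispatch function:
#     raise unsupported_backend_error("fused_swiglu_scale_forward")
```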


        return fused_swiglu_scale_bwd(x, scale, out_grad)
    else:
        raise NotImplementedError
Copilot AI Feb 6, 2026

The NotImplementedError should include a descriptive error message to help users understand what backend is not supported and what alternatives they might have. For example: "fused_swiglu_scale_backward is only supported on CUDA backend. Current backend does not support this operation."

Comment on lines 82 to 89
if paddle.is_compiled_with_cuda():
    from paddlefleet.ops import fused_swiglu_bwd

    return fused_swiglu_bwd(g, y)
else:
    logger.error(
        "\033[91m fused_swiglu_bwd is not implemented for this backend! \033[0m"
    )
Copilot AI Feb 6, 2026

The function logs an error but doesn't raise an exception or return a fallback value. This could lead to silent failures or undefined behavior. Consider either raising a NotImplementedError with a clear message or providing a fallback implementation for non-CUDA backends. The current implementation is inconsistent with the pattern in fused_swiglu_scale.py which raises NotImplementedError.
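A sketch of the consistent failure mode the review asks for (the dispatch signature and names below are illustrative, not the PR's actual code):

```python
def fused_swiglu_bwd_dispatch(g, y, cuda_available, cuda_impl=None):
    """Fail loudly on unsupported backends instead of only logging,
    matching the NotImplementedError pattern in fused_swiglu_scale.py."""
    if cuda_available:
        # On CUDA, delegate to the fused kernel implementation.
        return cuda_impl(g, y)
    raise NotImplementedError(
        "fused_swiglu_bwd is only supported on the CUDA backend; "
        "the current backend does not support this operation."
    )
```

Raising (rather than logging and returning None) makes the missing-kernel case surface immediately at the call site instead of propagating an undefined value into the backward pass.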

)


init_backend_type()
Copilot AI Feb 6, 2026

Consider adding a module-level docstring or inline comment explaining the purpose of the _initialized flag and why automatic initialization is needed at module import time (line 80). This would help future maintainers understand why init_backend_type() is called at the module level.
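A run-once guard of the kind described might be sketched as follows (`detect_backend` is a stand-in parameter for testability; the real detection logic lives in backends.py):

```python
"""Backend detection with a module-level run-once guard.

The _initialized flag makes init_backend_type() idempotent, so the
module can be imported at different stages without re-running (or
changing the result of) backend detection.
"""

_initialized = False
_backend_type = None

def init_backend_type(detect_backend=lambda: "cpu"):
    """Detect the backend once; later calls are cheap no-ops that
    return the cached result."""
    global _initialized, _backend_type
    if _initialized:
        return _backend_type
    _backend_type = detect_backend()
    _initialized = True
    return _backend_type

# Auto-initialize at import time so any importer sees a detected
# backend without having to call init_backend_type() explicitly.
init_backend_type()
```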

@codecov-commenter

Codecov Report

❌ Patch coverage is 74.57627% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@8374690). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/paddlefleet/ops/__init__.py | 76.31% | 5 Missing and 4 partials ⚠️ |
| src/paddlefleet/fusions/fused_swiglu_scale.py | 63.63% | 2 Missing and 2 partials ⚠️ |
| src/paddlefleet/fusions/fused_bias_swiglu.py | 66.66% | 1 Missing and 1 partial ⚠️ |

❌ Your patch status has failed because the patch coverage (74.57%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             develop     #523   +/-   ##
==========================================
  Coverage           ?   74.57%           
==========================================
  Files              ?        4           
  Lines              ?       59           
  Branches           ?        9           
==========================================
  Hits               ?       44           
  Misses             ?        8           
  Partials           ?        7           
| Flag | Coverage Δ |
| --- | --- |
| coverage_combine | 74.57% <74.57%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/paddlefleet/transformer/moe/fp8_utils.py | 100.00% <100.00%> (ø) |
| src/paddlefleet/fusions/fused_bias_swiglu.py | 66.66% <66.66%> (ø) |
| src/paddlefleet/fusions/fused_swiglu_scale.py | 63.63% <63.63%> (ø) |
| src/paddlefleet/ops/__init__.py | 76.31% <76.31%> (ø) |

)


init_backend_type()
Member

Can the backends.init_backend_type() call in build_backend.py be removed now?

Contributor Author

That was the original idea. But since the run-once guard was added above, and this module may be imported at different stages, a single init call is not guaranteed to take effect globally. Keeping the call is safer, and this interface is generally not used in performance-sensitive code anyway.

Member

As it stands now, detection always takes effect as soon as the backend module is imported, but that has no real downside either way.

risemeup1
risemeup1 previously approved these changes Feb 9, 2026