Allow TMA inner reduction to test CI #5988

Draft
tbqh wants to merge 1 commit into main from tbqh/reduction_inner_tma_ci

tbqh (Collaborator) commented Feb 20, 2026

No description provided.

tbqh (Collaborator, Author) commented Feb 20, 2026

!test

github-actions bot commented Feb 20, 2026

Description

  • Remove TMA reduction option check to enable testing in CI

  • Allow TMA inner reduction to be used whenever mayUseTma() returns true

  • Simplify use_tma boolean assignment by removing && condition

Changes walkthrough

Relevant files

Tests
reduction.cpp (csrc/scheduler/reduction.cpp, +1/-2): Remove TMA option check for CI testing

  • Removed isOptionEnabled(EnableOption::TmaReduction) check from use_tma assignment
  • Now use_tma depends only on mayUseTma(props) result
  • Enables TMA reduction for CI testing by removing option guard

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Removed Option Guard

    The PR removes the isOptionEnabled(EnableOption::TmaReduction) check, which means TMA reduction will now be enabled whenever mayUseTma(props) returns true, regardless of configuration. This removes important runtime control over TMA reduction behavior and could lead to unintended performance regressions or correctness issues in production environments.

    bool use_tma = mayUseTma(props);
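
The effect of dropping the guard can be sketched in isolation. The snippet below is a minimal illustration, not nvFuser code: `ReductionProps`, `enabledOptions()`, and the bodies of `isOptionEnabled` and `mayUseTma` are hypothetical stand-ins, and the operand order of the pre-PR `&&` expression is an assumption, since only the post-PR line appears in this review.

```cpp
#include <set>

// Hypothetical stand-ins for illustration only; these are NOT nvFuser's real
// definitions of EnableOption, isOptionEnabled, or mayUseTma.
enum class EnableOption { TmaReduction };

struct ReductionProps {
  bool tma_eligible; // assumed outcome of the scheduler's eligibility analysis
};

// Mock option registry: the set of options switched on for this run.
inline std::set<EnableOption>& enabledOptions() {
  static std::set<EnableOption> opts;
  return opts;
}

inline bool isOptionEnabled(EnableOption opt) {
  return enabledOptions().count(opt) > 0;
}

inline bool mayUseTma(const ReductionProps& props) {
  return props.tma_eligible;
}

// Before this PR (assumed form): the opt-in flag AND eligibility are required.
inline bool useTmaGuarded(const ReductionProps& props) {
  return isOptionEnabled(EnableOption::TmaReduction) && mayUseTma(props);
}

// After this PR: eligibility alone decides, so every eligible fusion takes
// the TMA path whether or not the option is set.
inline bool useTmaUnguarded(const ReductionProps& props) {
  return mayUseTma(props);
}
```

With the option unset, `useTmaGuarded` returns false for an eligible fusion while `useTmaUnguarded` returns true. That difference is what lets CI exercise the TMA path by default, and it is also why the guard would presumably need to be restored before this draft could merge.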

    Test failures

    • (Medium, 4) nvFuser contiguity.size() mismatch in test_multidevice::test_welford across multiple runners

      Runners listed: GB200, GB200 (dist.), H100, H100 (dist.)
      Test name: tests.python.multidevice.test_multidevice.test_welford
    • (Medium, 2) nvFuser internal assert on cpAsync Bulk in tests.python.direct.test_repro::test_shared_memory_usage

      Runners listed: GB200
      Test names:
      tests.python.direct.test_repro.test_shared_memory_usage[nvfuser_direct_test=eager]
      tests.python.direct.test_repro.test_shared_memory_usage[nvfuser_direct_test=lru_cache]
    • (Medium, 2) nvFuser internal assertion failure in NVFuserTest.InnerReductionUnrollVectorization across multiple runners

      Runners listed: GB200, H100
      Test name: NVFuserTest.InnerReductionUnrollVectorization
