Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #3607 #3619

Merged
merged 1 commit into from
Dec 19, 2024
Merged

Fix #3607 #3619

merged 1 commit into from
Dec 19, 2024

Conversation

naoyam
Copy link
Collaborator

@naoyam naoyam commented Dec 19, 2024

Fixes #3607.

In the fusion reported in #3607, there's a proposed fusion segment as shown below:

**Segmenter** Considering fusion:
T14_g_double[iS27{2}, rS28{24000}](Avg),
T15_l_double[iS29{2}, rS30{24000}](Var),
T16_l_nvfuser_index_t[iS31{2}, rS32{24000}](Count)
 = Welford ( T2_g_double[iS3{2}, iS4{24000}](Avg),
  allreduce = false )
d59 = (double)(24000);
d61 = double(1) * d59;
d65 = (double)(0);
d67 = d61 - d65;
d69 = (double)(0);
b71 = d67 >= d69;
d73 = (double)(0);
d75 = where(b71, d67, d73);
d80 = reciprocal(d75);
T17_l_double[iS33{2}]
   = T15_l_double[iS29{2}, rS30{24000}]
   * d80;
T23_l_double[iS44{2}, bS45{1}]
   = broadcast( T17_l_double[iS33{2}] )
T25_l_double[iS48{2}, bS49{1}]
   = T23_l_double[iS44{2}, bS45{1}]
   + double(1.0000000000000001e-05);
T26_g_double[iS50{2}, bS51{1}]
   = rsqrt(T25_l_double[iS48{2}, bS49{1}]);
T27_l_double[iS52{2}, bS53{1}]
   = Set( T26_g_double[iS50{2}, bS51{1}], cache_op=Streaming )
T28_g_double[iS54{2}, bS55{1 ex 24000}] = expand( T27_l_double[iS52{2}, bS53{1}], {2, 24000} )
T32_g_double[iS62{2}, iS63{24000}]
   = T28_g_double[iS54{2}, bS55{1 ex 24000}]
   * T13_g_double[iS25{2}, iS26{24000}];
T37_l_double[iS72{2}, iS73{24000}]
   = -T32_g_double[iS62{2}, iS63{24000}];
T38_g_double[iS74{2}, rS75{24000}]
   = reduction( T37_l_double[iS72{2}, iS73{24000}], op = add, initial value = double(0), allreduce = false )

When canScheduleCompileTime of the transpose scheduler is called, FindAllMappedDims is used with T32 as the reference. Since its innermost dimension is not connected with T15 due to the reduction, propagateSibling hits the assertion at https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/utils.cpp#L1476.

This PR avoids the assertion by checking the existence of reduction first. We may also want to remove the assertion.

@naoyam
Copy link
Collaborator Author

naoyam commented Dec 19, 2024

!test

@@ -87,6 +81,12 @@ bool TransposeScheduler::canScheduleCompileTime(Fusion* fusion) {
return false;
}

if (!hasAtLeastTwoValidGroups(fusion)) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this here because because it uses getInputsOutputsWithInnerDim, which uses FindAllMappedDims, which asserts the existence of mappings between sibling tensors.

https://github.com/NVIDIA/Fuser/blob/main/csrc/scheduler/utils.cpp#L1476

I'm not entirely sure why we should have an assertion there. If the from tensor has a mapping to nullptr, which is not asserted, but propagating that to a sibling is asserted. I think we can just remove the assertion, but not 100% sure.

Instead, we shouldn't do this level of analysis until the fusion is confirmed to consist of only pointwise ops with no reduction. This should avoid the issue and also would help reduce the segmentation cost.

@naoyam naoyam requested a review from zasdfgbnm December 19, 2024 04:45
@naoyam naoyam marked this pull request as ready for review December 19, 2024 04:45
@t-vi
Copy link
Contributor

t-vi commented Dec 19, 2024

Thank you for this, @naoyam !
I can confirm that this unblock Lightning-AI/lightning-thunder#1560 .

@naoyam naoyam merged commit 6fe1865 into main Dec 19, 2024
48 checks passed
@naoyam naoyam deleted the fix_transpose_scheduler branch December 19, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

INTERNAL_ASSERT: Unable to find mapped root/logical domain
3 participants