[GPU] Use scf.if for forall overhangs #19125

Open · krzysz00 wants to merge 1 commit into main from users/krzysz00/use-if-for-overhang

Conversation

krzysz00 (Contributor)

In cases where we can't determine whether the number of workitems per workgroup evenly divides the number of items an scf.forall needs to process, the current code uses
`scf.for %i = %id to %upperBound step %numWorkitems` so that the last loop iteration only runs on the expected fraction of workitems.

This commit instead emits a post-loop if statement, which lets the for loop that the forall is lowered to use linearize (and a step-1 loop) in its main body.
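For illustration, the two lowering shapes look roughly like this; the value names (%tid, %total, %numThreads, %fullIters) and exact op spellings are placeholders rather than code from this patch:

```mlir
// Current shape when divisibility is unknown: a strided loop, so the
// final (partial) pass only runs on the leading workitems.
scf.for %i = %tid to %total step %numThreads {
  // ... forall body, indexed by %i ...
}

// Shape with this change: a step-1 loop over the evenly divisible
// portion using a linearized index, plus a post-loop guard for the
// overhang. Note that the body is emitted twice.
scf.for %j = %c0 to %fullIters step %c1 {
  %idx = affine.linearize_index disjoint [%j, %tid] by (%fullIters, %numThreads) : index
  // ... forall body, indexed by %idx ...
}
%tail = affine.linearize_index [%fullIters, %tid] by (%fullIters, %numThreads) : index
%inBounds = arith.cmpi slt, %tail, %total : index
scf.if %inBounds {
  // ... forall body, indexed by %tail ...
}
```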

@MaheshRavishankar (Contributor) left a comment

I see what you are doing here... it sort of makes sense... but there is some nuance here that @qedawkins was explaining to me some time ago. I'd wait for his review.

@qedawkins (Contributor) left a comment

I see what you're going for but this feels like it might be a premature optimization to me. This means every time we resolve a forall we're duplicating the full body of the loop. This makes sense for simple cases like copies, but is probably overkill for larger loops. In other words, I think the decision to do this is tied to whether we want to unroll the loop.

Before landing this change, I'd like at least some signal that it's better. We can try using the ONNX model suite for this; there are some instructions for how to run it here.

This will give us a report comparing performance for a few models, plus overall model support numbers, which should hopefully give some kind of signal.

@krzysz00 (Contributor, Author)

Yeah, agreed that this is
a) not on the critical path for the other changes I'm making, and
b) something that needs to "get perf'd".

... I could probably skip the if thing and move to linearize disjoint in the perfect-division case as well, which I'd expect to be roughly NFC.
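(For concreteness, a minimal sketch of that perfect-division variant with made-up names; no trailing scf.if is needed because every workitem runs the same number of passes:)

```mlir
// Perfect division: each workitem runs exactly %itersPerThread passes.
scf.for %j = %c0 to %itersPerThread step %c1 {
  %idx = affine.linearize_index disjoint [%j, %tid] by (%itersPerThread, %numThreads) : index
  // ... forall body, indexed by %idx ...
}
```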

@qedawkins (Contributor)

Yeah, the use of linearize SGTM.

@krzysz00 (Contributor, Author)

@qedawkins That page you linked 404s for me. Do I need to get added to something?

@qedawkins (Contributor)

Oh shoot, yeah, let me remove that. Let's follow up offline because someone else will need to add you.

krzysz00 force-pushed the users/krzysz00/gpu-distribute-with-linearize branch 2 times, most recently from 5328767 to 5a8fa83, on November 21, 2024 at 20:54
krzysz00 force-pushed the users/krzysz00/gpu-distribute-with-linearize branch from 5a8fa83 to 291f570 on November 26, 2024 at 19:24
Base automatically changed from users/krzysz00/gpu-distribute-with-linearize to main on November 26, 2024 at 20:38
krzysz00 force-pushed the users/krzysz00/use-if-for-overhang branch from 94e406d to 9368746 on November 26, 2024 at 23:31