Replies: 1 comment
In principle, the multi-box ParallelFor version has some additional overhead from needing to choose the correct box to use and from setting up the data structure for that. I have never benchmarked this in detail; however, I would expect the multi-box version to be better if there are a lot of tiny boxes, and the normal version to be faster if there is only a single box. Maybe the main reason it is not used everywhere is historical: it was added later and not everything was converted. Technically, if there are multiple ParallelFors in an MFIter loop, the normal version could take better advantage of CPU/GPU cache thanks to the tiling done by MFIter. However, again, I am not sure whether this effect is actually measurable.
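For reference, the routines that do use the fused launch typically guard it with a runtime check. Here is a minimal sketch of that pattern, assuming the `FabArrayBase::isFusingCandidate()` and multi-box `ParallelFor` overloads available in recent AMReX versions; the function name and the per-cell work are placeholders, not code from any particular routine:

```cpp
#include <AMReX_MultiFab.H>

// Sketch only: 'apply' and the '+= 1' body stand in for a real per-cell
// operation; the structure mirrors the GPU branch seen in routines such
// as average_down.
void apply (amrex::MultiFab& mf)
{
    using namespace amrex;

    if (Gpu::inLaunchRegion() && mf.isFusingCandidate()) {
        // Many small boxes: a single fused kernel launch over all of them.
        auto const& ma = mf.arrays();
        ParallelFor(mf, IntVect(0),
        [=] AMREX_GPU_DEVICE (int box_no, int i, int j, int k)
        {
            ma[box_no](i,j,k) += Real(1.0); // placeholder work
        });
        Gpu::streamSynchronize(); // keep 'ma' alive until the kernel finishes
    } else {
        // Few/large boxes (or CPU builds with tiling): one launch per box/tile.
        for (MFIter mfi(mf, TilingIfNotGPU()); mfi.isValid(); ++mfi) {
            const Box& bx = mfi.tilebox();
            Array4<Real> const& a = mf.array(mfi);
            ParallelFor(bx,
            [=] AMREX_GPU_DEVICE (int i, int j, int k)
            {
                a(i,j,k) += Real(1.0); // placeholder work
            });
        }
    }
}
```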
I've been looking through some of the code for common AMReX operations (average_down, crse_init, fine_add, etc.) and noticed that in some instances the ParallelFor is inside an MFIter loop, while in other cases the GPU branch launches a single ParallelFor that iterates over the boxes (both styles are sketched at the end of this post).
At first glance, it seems reasonable that the second option would result in fewer kernel launches when there are many boxes, and thus reduce kernel launch overhead, which is appealing. However, not all of the above operations use that ParallelFor; fineAdd is one example.
So I have two questions: is there an algorithmic reason that the ParallelFor over boxes isn't used everywhere for GPU? And would I expect to see a speedup on GPU in my code if I replaced my MFIter loops with the ParallelFor over boxes in the case where there are many boxes?
Currently I have lowered the gridding efficiency on GPU builds to avoid having too many boxes and the associated launch overhead, but this can result in large areas of refinement that aren't strictly needed for the problem.
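For concreteness, the two launch styles I'm comparing look roughly like this. This is a minimal sketch with made-up function names and a trivial placeholder operation, not code taken from any specific AMReX routine:

```cpp
#include <AMReX_MultiFab.H>

// Style 1: ParallelFor inside an MFIter loop -- one kernel launch per box
// (per tile on CPU builds).
void scale_per_box (amrex::MultiFab& mf, amrex::Real factor)
{
    using namespace amrex;
    for (MFIter mfi(mf, TilingIfNotGPU()); mfi.isValid(); ++mfi) {
        const Box& bx = mfi.tilebox();
        Array4<Real> const& a = mf.array(mfi);
        ParallelFor(bx,
        [=] AMREX_GPU_DEVICE (int i, int j, int k)
        {
            a(i,j,k) *= factor;
        });
    }
}

// Style 2: a single multi-box ParallelFor -- one fused launch over all boxes,
// with box_no selecting the Array4 out of mf.arrays().
void scale_fused (amrex::MultiFab& mf, amrex::Real factor)
{
    using namespace amrex;
    auto const& ma = mf.arrays();
    ParallelFor(mf, IntVect(0),
    [=] AMREX_GPU_DEVICE (int box_no, int i, int j, int k)
    {
        ma[box_no](i,j,k) *= factor;
    });
    Gpu::streamSynchronize(); // keep 'ma' alive until the kernel finishes
}
```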