Non-uniform processor allocation in domain-decomposed simulations #246

Draft · wants to merge 19 commits into dev
Conversation

alexandermote
Contributor

Opening a draft PR to get fresh eyes on my new DD code. We can now handle decomposed mesh tallies of varying sizes, and the dd_slab_reed test should pass when run with 4 processors. Currently working on getting dd_slab_reed and dd_cooper to pass when run with multiple processors per subdomain; from there, we should be able to add non-uniform work ratios fairly easily.
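(For reference, a minimal, self-contained sketch of one way variable-sized subdomain tallies can be collected with mpi4py. The Gatherv-based approach and all names here are illustrative assumptions, not necessarily what this PR does.)

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a tally slab whose length varies by subdomain (illustrative).
local_size = 10 + rank
local_tally = np.full(local_size, float(rank))

# Rank 0 needs every rank's size to build receive counts and displacements.
sizes = comm.gather(local_size, root=0)
if rank == 0:
    sizes = np.array(sizes, dtype=int)
    displs = np.zeros_like(sizes)
    displs[1:] = np.cumsum(sizes)[:-1]
    recvbuf = np.empty(sizes.sum(), dtype=np.float64)
    comm.Gatherv(local_tally, [recvbuf, sizes, displs, MPI.DOUBLE], root=0)
else:
    comm.Gatherv(local_tally, None, root=0)
```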

@alexandermote
Contributor Author

Apparently I wasn't running on the most up-to-date version of dev; as a result, it looks like the DD code isn't working. I'll look into it and see if I can figure out where the issue is.

@alexandermote
Contributor Author

The tally values I get from this version are identical to the ones I was getting on my branch, which made me wonder whether the answer.h5 file had somehow changed. Running the dd_slab_reed problem without domain decomposition and using that output as the answer.h5 for the regression test causes the test to pass again. It's possible that the changes I made to the dd_slab_reed input caused its output to diverge from the existing answer file, but since the new output matches a non-DD simulation, I believe it is correct.
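(A quick way to spot-check this kind of comparison; the dataset paths below are assumptions for illustration, not the repo's exact file layout.)

```python
import h5py
import numpy as np

# Compare a freshly generated non-DD output against the stored answer.h5.
with h5py.File("output.h5", "r") as out, h5py.File("answer.h5", "r") as ans:
    new = out["tally/flux/mean"][()]   # assumed dataset path
    ref = ans["tally/flux/mean"][()]   # assumed dataset path
    assert np.allclose(new, ref, rtol=1e-12), "tally mismatch vs. answer.h5"
```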

@alexandermote
Contributor Author

The dd_slab_reed test succeeds in Python and Numba modes on my Dane build; the tests fail on GitHub because the CI job is trying to run with only 1 processor. Not sure why that's happening; my only guess is that it's a difference between launching with --mpiexec and --srun?
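(A small diagnostic that can help rule out the DD logic itself; just a sketch assuming the test is launched under MPI, with the rank count of 4 matching the 1-processor-per-domain setup discussed above.)

```python
from mpi4py import MPI

# Print the launched world size so CI logs show how many ranks actually ran.
size = MPI.COMM_WORLD.Get_size()
if MPI.COMM_WORLD.Get_rank() == 0:
    print(f"MPI world size = {size}")
assert size >= 4, "dd_slab_reed expects at least 4 ranks (1 per subdomain)"
```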

@ilhamv
Member

ilhamv commented Jan 6, 2025

@alexandermote, I made some updates. In particular, I changed how we distribute source particles locally when there are multiple processors per domain.
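(For context, a minimal sketch of an even local split of a domain's source particles among the ranks assigned to that domain; the function name and scheme are illustrative assumptions, not the actual code.)

```python
def local_particle_count(n_source, n_ranks_in_domain, local_rank):
    """Evenly divide a domain's source particles, spreading the remainder."""
    base, extra = divmod(n_source, n_ranks_in_domain)
    return base + (1 if local_rank < extra else 0)

# e.g., 10 particles over 3 ranks -> 4, 3, 3
counts = [local_particle_count(10, 3, r) for r in range(3)]
assert sum(counts) == 10
```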

Let's use test/regression/slab_reed_dd for testing. It runs fine with 4 processors (1 processor per domain). It also seems to run OK with 8 processors (2 processors per domain); however, it does not pass the mesh tally merging at the end of the simulation, which is possibly the only issue left. I see that you already put in some machinery that handles the multiple-processors-per-domain tally merging with MPI rank grouping, but it seems to need checking, as that is where the error is located (run with 8 processors to reproduce it).
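(A hedged sketch of the rank-grouping idea for the per-domain tally merge; the rank-to-domain mapping and names are assumptions for illustration, not the PR's exact mechanics.)

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

N_DOMAIN = 4                        # assumed number of subdomains
domain_id = rank % N_DOMAIN         # assumed rank-to-domain mapping

# Split COMM_WORLD into one communicator per domain, then sum each group's
# partial tallies onto the group's lead rank.
domain_comm = comm.Split(color=domain_id, key=rank)
local_tally = np.random.rand(100)   # stand-in for this rank's partial tally
merged = np.zeros_like(local_tally)
domain_comm.Reduce(local_tally, merged, op=MPI.SUM, root=0)
```

Running it with 8 ranks gives two ranks per domain group, which mirrors the 2-processors-per-domain case that currently fails.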

@alexandermote
Contributor Author

@ilhamv
I believe I have fixed the tally merging issue. I ran dd_slab_reed with 4, 8, and 16 processors, and achieved identical results across all 3.
I have noticed that when I run dd_slab_reed without domain decomposition enabled, it now produces a different result than we get from the simulations with domain decomposition active. If you have time to see if you can reproduce that issue, I'd appreciate it. I will work on getting non-uniform processor allocation working next.

@ilhamv
Member

ilhamv commented Jan 7, 2025

@alexandermote, the test slab_reed is the one without domain decomposition, and it passes reproducibly. So I think we are good with the multiple-processors-per-domain case.

Do you plan to test and include dd_cooper as the multidimensional domain decomposition test?

@alexandermote
Contributor Author

@ilhamv:
I get different results between slab_reed and dd_slab_reed; not sure why. I built a 3D version of Reed's problem that I used to verify 3D domain decomposition in my M&C paper. I'll add it to this repo in the next push.
I noticed a discrepancy when running with work_ratio active, which has led me to rewrite a couple of pieces of the code, including the local sourcing code you added. I'm working on making sure the standard deviation values are equivalent across different processor allocations, and then I will push the changes. That push will also add support for non-uniform processor allocation.
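(One possible shape for the non-uniform allocation, shown only as a sketch under the assumption that work_ratio is a per-domain weight list; the helper name is hypothetical.)

```python
import numpy as np

def allocate_ranks(n_ranks, work_ratio):
    """Assign MPI ranks to domains in proportion to work_ratio."""
    work_ratio = np.asarray(work_ratio, dtype=float)
    ideal = n_ranks * work_ratio / work_ratio.sum()
    alloc = np.floor(ideal).astype(int)
    # Hand leftover ranks to the domains with the largest fractional parts.
    remainder = ideal - alloc
    for i in np.argsort(remainder)[::-1][: n_ranks - alloc.sum()]:
        alloc[i] += 1
    return alloc

print(allocate_ranks(8, [1, 1, 2, 4]))   # -> [1 1 2 4]
```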
