
update scrip_remap_conservative to use MPI #1268

Open
wants to merge 12 commits into base: develop

Conversation

jcwarner-usgs

Pull Request Summary

When using nesting, the SCRIP routines are called to compute remapping weights. But each processor computes all the weights. Here we updated SCRIP/scrip_remap_conservative.F to use all the MPI processors to compute separate portions of the remap weights, and then distribute a full set of weights to all the nodes.
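
For illustration only, a minimal standalone sketch of that pattern is shown below (the program, array, and variable names are invented for this example and are not the PR code): each rank computes the weights for its own slice of the source grid, and because the slices are disjoint a SUM reduction reassembles the full set on every process.

  ! Sketch of "each rank computes a slice, then every rank gets the full set".
  ! Names (asend, arecv, grid1_size, ...) are illustrative, not from the PR.
  program remap_weights_sketch
    use mpi
    implicit none
    integer, parameter :: grid1_size = 1000
    integer :: ierr, rank, nprocs, chunk, istr, iend, i
    real(8) :: asend(grid1_size), arecv(grid1_size)

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, rank, ierr)
    call mpi_comm_size(mpi_comm_world, nprocs, ierr)

    ! contiguous slice of source-grid cells handled by this rank
    chunk = (grid1_size + nprocs - 1)/nprocs
    istr  = rank*chunk + 1
    iend  = min(grid1_size, istr + chunk - 1)

    asend = 0.0d0
    do i = istr, iend
      asend(i) = real(i, 8)        ! stand-in for a locally computed weight
    end do

    ! slices are disjoint, so summing the partial arrays gives every
    ! rank the full weight set (the real code reduces the remap arrays)
    call mpi_allreduce(asend, arecv, grid1_size, mpi_double_precision, &
                       mpi_sum, mpi_comm_world, ierr)

    call mpi_finalize(ierr)
  end program remap_weights_sketch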

Description

The results should be identical with or without this update.
Reviewer could be: Erick Rogers, NRL

Issue(s) addressed

Related to Discussion #1252

Commit Message

Update scrip_remap_conservative to be more efficient on first use.

Check list

Testing

  • How were these changes tested? I ran several personal tests on local HPC, nothing too extensive.
  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
  • Have the matrix regression tests been run (if yes, please note HPC and compiler)?
  • Please indicate the expected changes in the regression test output. (Note the list of known non-identical tests.)
  • Please provide the summary output of matrix.comp (matrix.Diff.txt, matrixCompFull.txt and matrixCompSummary.txt):

@JessicaMeixner-NOAA
Collaborator

@ErickRogers pinging you as you were suggested as a possible reviewer.

@ErickRogers (Collaborator) left a comment

This looks safe to me, since it's put behind a switch. I'm not proficient with MPI programming, so I can't comment on that, but I expect that it is good, since John says it works. Let me know if you want an MPI-fluent reviewer.
NB: We have some rudimentary MPI in WMGHGH. However, it splits things up like "one process per remapping", e.g., if you have 2 grid-pairs, it only splits into 2 processes. So this "within grid" parallelism should be significantly faster.

@MatthewMasarik-NOAA
Collaborator

Thanks for submitting @jcwarner-usgs. I'll start reviewing.

@MatthewMasarik-NOAA
Collaborator

> This looks safe to me, since it's put behind a switch. I'm not proficient with MPI programming, so I can't comment on that, but I expect that it is good, since John says it works. Let me know if you want an MPI-fluent reviewer. NB: We have some rudimentary MPI in WMGHGH. However, it splits things up like "one process per remapping", e.g., if you have 2 grid-pairs, it only splits into 2 processes. So this "within grid" parallelism should be significantly faster.

Thanks for reviewing @ErickRogers.

@jcwarner-usgs
Author

I think the MPI could be rewritten to be more WW3-centric, and my switches in there are not active.
If you want me to spend more time on this I can. I was not sure whether the WW3 people would rewrite the code or not.

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Jul 10, 2024

> I think the MPI could be rewritten to be more WW3-centric, and my switches in there are not active. If you want me to spend more time on this I can. I was not sure whether the WW3 people would rewrite the code or not.

Hi @jcwarner-usgs, thanks for this follow-up. I had two thoughts on the switches you mentioned. Currently they are W3_SCRIP_JCW; this should be modified to remove your initials from the end, maybe to W3_SCRIP_MPI or something similar? We should also have the switches be active and set a default value (I believe if MPI isn't enabled then we wouldn't want to enter those code blocks, so maybe we should have a default value of .false.?).

Regarding rewriting to be more WW3-centric: being as WW3-centric as possible is obviously best for the code base, though I can say that in my initial review nothing jumped out as problematic in that regard. It seemed clean and well commented, which is great. We do have a Code Style Standards page that may help out here. For now I would just suggest renaming some of these My* variables here

https://github.com/jcwarner-usgs/WW3/blob/0baa3df08da8bfeb8be1fe1c13755e1a9938c46c/model/src/SCRIP/scrip_remap_conservative.f#L269-L270

to something more descriptive if possible.

@ErickRogers
Collaborator

@jcwarner-usgs I'm not sure I understand what you mean by WW3-centric. What was your motivation for adding this feature? For you to use in your WW3 applications, or some other applications?

@jcwarner-usgs
Author

Oh, I just meant that I used:

  call mpi_allreduce(Asend, Arecv, grid1_size, MPI_DOUBLE,                 &
 &                   MPI_SUM, MyComm, MyError)

but in other parts of WW3 the MPI calls look like:

        CALL MPI_ALLREDUCE(SND_BUFF(1:NSEA), &
             RCV_BUFF(1:NSEA), &
             NSEA,     &
             MPI_REAL, &
             MPI_SUM,  &
             ID_LCOMM, &
             IL_ERR)

Should I go through and make the variables all capitals?
Should I use SND_BUFF and RCV_BUFF instead of Asend and Arecv?
The My* variables (MyStr for my start, MyEnd for my end, MyRank, MyComm, etc.) are from ROMS.
I can give it a clean-up if you want.
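
For comparison, here is a hedged sketch of that same call recast with WW3-style names (IAPROC-style capitals, IERR_MPI, MPI_COMM_WAVE), which is roughly the direction taken later in this PR. The wrapper subroutine and its argument list are hypothetical; in WW3 itself the communicator would come from a module rather than an argument, and only the call pattern matters here:

      ! Hypothetical wrapper; only the restyled ALLREDUCE call matters.
      SUBROUTINE SUM_REMAP_WEIGHTS (ASEND, ARECV, GRID1_SIZE, MPI_COMM_WAVE)
        USE MPI
        IMPLICIT NONE
        INTEGER, INTENT(IN)  :: GRID1_SIZE, MPI_COMM_WAVE
        REAL(8), INTENT(IN)  :: ASEND(GRID1_SIZE)
        REAL(8), INTENT(OUT) :: ARECV(GRID1_SIZE)
        INTEGER              :: IERR_MPI
        CALL MPI_ALLREDUCE(ASEND(1:GRID1_SIZE),  &
             ARECV(1:GRID1_SIZE),                &
             GRID1_SIZE,                         &
             MPI_DOUBLE_PRECISION,               &
             MPI_SUM,                            &
             MPI_COMM_WAVE,                      &
             IERR_MPI)
      END SUBROUTINE SUM_REMAP_WEIGHTS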

@ErickRogers
Collaborator

@jcwarner-usgs OK, I see. Matt and Jessica may disagree, but from my point of view, it is fine to use the conventions of the original file, rather than the conventions of WW3. I think of the stuff in the /SCRIP/ subdirectory as not strictly part of the WW3 code proper, but rather as a semi-external library that is adapted (with as light a touch as possible) to work with WW3, to be compiled into WW3. We maintain it in the same version control with WW3 because maintaining it separately would just result in extra work. (2 cents)

@ErickRogers
Collaborator

All that aside, renaming the My* variables wouldn't hurt, for clarity purposes.

@JessicaMeixner-NOAA
Collaborator

JessicaMeixner-NOAA commented Jul 11, 2024

I agree that keeping the style consistent with what is already in the routine is okay. The concern is more about using your initials or "My" variable names, versus making them generic/descriptive of what they are.

That's really great to hear from @aliabdolali that it sped things up; we'll have to see if that can help us in some setups here. I don't see any regression tests that are using these updates.

@jcwarner-usgs Can you tell me a little more about the tests you've run on your end, so we can advise on how best to add a regression test to ensure this feature continues to work? Also, did you obtain identical results with and without this update?

@JessicaMeixner-NOAA
Collaborator

> @jcwarner-usgs Can you tell me a little more about the tests you've run on your end, so we can advise on how best to add a regression test to ensure this feature continues to work? Also, did you obtain identical results with and without this update?

@jcwarner-usgs - I realized I wasn't clear that this question was directed to you, so I updated my comment and am pinging you on this.

@MatthewMasarik-NOAA (Collaborator) left a comment

@jcwarner-usgs I wanted to touch base following the discussion on code style. I only have one request here: rename the switch W3_SCRIP_JCW to something without your initials (maybe W3_SCRIP_MPI or similar).

Also, I wanted to mention that a couple of GitHub CI jobs failed. I'm re-running these manually now to see if they will resolve themselves.

@jcwarner-usgs
Author

Thanks for checking back.
  • I did make a small change today to the size of the remap weights in scrip_remap_conservative.f.
  • I will change the switch to something else. You already have SCRIP and SCRIPNC, so how about SCRIPMPI?
  • I am running some more tests here. We have a set of 4 nested grids that are challenging to get going.

@MatthewMasarik-NOAA
Collaborator

> I will change the switch to something else. You already have SCRIP and SCRIPNC, so how about SCRIPMPI?

SCRIPMPI is fine with me.

@MatthewMasarik-NOAA
Collaborator

Hi @jcwarner-usgs, just a note to say we are taking a short pause on our PR processing due to some temporary changes for our group. I just posted our group's Commit Queue: Temporary PR Policy. Looking forward to picking back up after our short break.

@jcwarner-usgs
Author

Getting back onto this pull request now.
I have just updated 3 files in my code:

  1. model/src/cmake/src_list.cmake
    changed
    ${CMAKE_CURRENT_SOURCE_DIR}/SCRIP/scrip_remap_conservative.f
    to
    ${CMAKE_CURRENT_SOURCE_DIR}/SCRIP/scrip_remap_conservative.F
    (notice the capital F extension)

  2. model/src/cmake/switches.json
    added:
    {
      "name": "scripmpi",
      "num_switches": "upto2",
      "description": "",
      "valid-options": [
        { "name": "SCRIPMPI" }
      ]
    },

  3. model/src/SCRIP/scrip_remap_conservative.F

  • changed the file extension from .f to .F; this allows cpp to process the file.
  • put all my changes within cpp blocks keyed to the SCRIPMPI switch (see the sketch below). I think we should have 3 options: SCRIP, SCRIPNC, and SCRIPMPI.
  • made use of MPI_COMM_WAVE, IAPROC, and NTPROC from w3adatmd and w3odatmd.
  • removed MyRank, MyError, MyComm, MyStr, and MyEnd and replaced them with IAPROC, IERR_MPI, MPI_COMM_WAVE, grid1/2_str, and grid1/2_end.
  • cleaned up the MPI send/recv calls.

Testing: I ran a simple case we use of Hurricane Sandy with a nested grid, once with the original scrip_remap_conservative.f and once with the modified scrip_remap_conservative.F. In each case a new rmp_src_to_dst_conserv_002_001.nc was created. I compared the contents of these files (src_address, dst_address, dst_grid_frac, and remap_matrix) and they were identical, so the SCRIP files being generated are the same. I also looked at the hs and dtp results and they were identical. Happy to run any tests that you have.
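
As referenced in the list above, here is a minimal, compile-checkable sketch of how the guarded loop bounds might be laid out. It is illustrative only, not the contents of scrip_remap_conservative.F: the ceiling-division partition is an assumption, the guard name follows the W3_SCRIPMPI form that appears later in this thread, and stand-in MPI calls are used for the rank/size and communicator that WW3 itself takes from w3odatmd (IAPROC, NTPROC) and w3adatmd (MPI_COMM_WAVE).

      ! Sketch only: partition the grid1 sweep across ranks when the
      ! SCRIPMPI code path is compiled in; otherwise sweep everything.
      subroutine grid1_bounds (grid1_size, grid1_str, grid1_end)
#ifdef W3_SCRIPMPI
      use mpi                           ! stand-in for the WW3 modules
#endif
      implicit none
      integer, intent(in)  :: grid1_size
      integer, intent(out) :: grid1_str, grid1_end
#ifdef W3_SCRIPMPI
      integer :: myrank, nprocs, ierr_mpi, chunk
      call mpi_comm_rank (mpi_comm_world, myrank, ierr_mpi)
      call mpi_comm_size (mpi_comm_world, nprocs, ierr_mpi)
      chunk     = (grid1_size + nprocs - 1)/nprocs
      grid1_str = myrank*chunk + 1      ! this rank's slice of grid1 cells
      grid1_end = min(grid1_size, grid1_str + chunk - 1)
#else
      grid1_str = 1                     ! serial build sweeps all cells
      grid1_end = grid1_size
#endif
      end subroutine grid1_bounds

The partial weights each rank computes over its slice would then be combined across ranks (e.g., with an ALLREDUCE over MPI_COMM_WAVE) so that every process ends up with the full weight set, as described in the PR summary.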

Can you try this merge again?

@MatthewMasarik-NOAA
Collaborator

Hi @jcwarner-usgs, thank you for addressing the previous comments with these updates, and including the description of the changes. Much appreciated.

A few weeks ago we posted a temporary PR Policy. It is in response to temporary resource reductions, where we don't have the manpower to review PRs, so we are limiting what we take on to just what is needed for the next GFS/GEFS releases being worked on. This is just until mid-November. I really apologize, because I know you submitted this prior to that. We will pick this back up from our end as soon as our temporary status ends sometime in November, and we'll look forward to completing the review then. Thanks for understanding.

@jcwarner-usgs
Author

All good! No worries.
I had time to make it cleaner and so I worked on it. We will keep using it and ensure it is working.
Thanks,
-j

@MatthewMasarik-NOAA
Collaborator

> All good! No worries. I had time to make it cleaner and so I worked on it. We will keep using it and ensure it is working. Thanks, -j

Awesome, thanks so much for understanding. Having some more time to ensure it's working is a side benefit!

@MatthewMasarik-NOAA
Collaborator

@jcwarner-usgs just to let you know we are back from our temporary PR policy, and I've started working on the review for this. I'll be back in touch again next week.

* mtm/scrip/mpi:
  fix: OMP do statement
  mv grid1/2 str/end integers outside of scripmpi
  update scrip_remap_conservative.F with SCRIPMPI, switches.json, and src_list.cmake
  update scrip_remap_conservative.F with SCRIPMPI, switches.json, and src_list.cmake
  update scrip_remap_conservative.F with SCRIPMPI, switches.json, and src_list.cmake
  change resize in scrip conserv
  update scrip remap conserv
@MatthewMasarik-NOAA
Collaborator

Update
The review for this is underway, with some related OMP issues needing to be resolved. PR #1323 was submitted to resolve a typo in the ww3_ufs1.x regtests. Another small fix PR to this branch will be made early next week; then this PR should pass the regtests b4b.

@MatthewMasarik-NOAA
Collaborator

Hi @jcwarner-usgs. I'm about to submit a PR to this PR, which makes a slight fix to a couple of OMP sections. It will require you to review what I've submitted, then merge it if you are OK with it. After that I'll finish the review on this main PR. The PR I will submit only makes two re-ordering edits to the scrip_remap_conservative.F file, though currently you'll see changes in a few other files due to new commits, or automatic whitespace removal from the text editor.

To remove these extra changes you can sync your branch before you merge, or merging the PR will simply bring them in with it. Since we will squash-merge the main PR, I think either is fine. I'm happy to answer any questions on doing the merge if you like.

@MatthewMasarik-NOAA
Collaborator

@jcwarner-usgs the PR has been submitted, jcwarner-usgs#1.

Fix OMP for scrip_remap_conservative MPI update
@MatthewMasarik-NOAA
Collaborator

A quick update: the regtests all passed so everything looks good. I do want to do a few speed comparisons just for my own curiosity before officially approving it. I plan to be able to finish that and wrap the PR up later today or early tomorrow.

@MatthewMasarik-NOAA
Collaborator

@jcwarner-usgs I ran into some build issues when trying to do the performance testing. I was doing this by adding 'SCRIPMPI' to a few existing regtests, and from my initial look at error log messages, I think there is something small that needs to be resolved with the calls to run_cmake_test and the cmake switch handling. I'm currently looking into it, though it might take me a day or two to report back. There shouldn't be anything more needed from you, except possibly a small sub-PR merge if I need to make an edit to the build system and/or regtest scripts.

@jcwarner-usgs
Author

Make sure that in src_list.cmake you have
${CMAKE_CURRENT_SOURCE_DIR}/SCRIP/scrip_remap_conservative.F
You need the .F extension; the capital F is an extension change for that file and is needed to make sure cpp processes it.
Is that what might be missing?

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Dec 11, 2024

That's a good thing to check. I do have the .F included in my test case. For the test case, I'm running a clone of the branch with edits only to the regtest switch files. I believe the issue is related to the switch-file processing during a regtest call, which builds executables at different stages that may or may not be built with MPI.

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Dec 13, 2024

@jcwarner-usgs I'm still working on how to resolve the regtest build issue, but I wanted to give an update before the weekend. The issue seems to be how our regtests have been structured, in particular where they use the '-f' option in the call to run_cmake_test. This 'forces' a shared build of the pre/post programs, which, as set up, conflicts with also using MPI. I tried simply removing this option from the call, which appears to get past the build, but then it looks to fail in MPI initialization during preprocessing. This makes sense, because turning on MPI would then require those programs to be launched with mpiexec/mpirun/srun/etc., and that is not currently built in. So this is a matter of modifying that portion of the regtest processing to allow for the possibility of compiling with MPI. I need a little more time to work out how that portion should function now that it may use MPI. I'll check in early next week with my progress.

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Dec 18, 2024

Hi @jcwarner-usgs, I met with @JessicaMeixner-NOAA to discuss how best to proceed here. To recap the issue: in our regtest system, using '-f' in the call to run_cmake_test forces the pre/post-processing programs to be built as shared, and this clashes with also requesting a distributed (MPI) build for those programs. All the existing regtests using SCRIP use '-f', so when I tried adding SCRIPMPI and executing those tests, the build stopped because of those conflicting requests.

The plan we came up with to address this has two parts:

  1. The regtests should continue to force a shared build when requested, so this will be enforced in run_cmake_test by removing the SCRIPMPI switch from the builds for those programs.
  2. A regtest will be added that uses the new switch and does not request a forced build. This will be a quick setup made from an existing multi-grid regtest, and will provide both a way to test this functionality and a template for others who want to use the new switch.

There still shouldn't be anything needed from you except for a merge once I make a sub-PR to your branch for these changes. A timeline for when the sub-PR might be ready is next week after Christmas, though there may be some extra time needed due to the holidays. Let me know if you have any questions.

@MatthewMasarik-NOAA
Collaborator

Quick update: I've made some progress on (2) above, and am continuing to try to work through (1). For that I'm hitting issues with how the switch file is manipulated by run_cmake_test. Work is ongoing.

@MatthewMasarik-NOAA
Collaborator

Status update - Some progress has been made on adding this within the regtests, though there is still an issue to resolve.

The switch manipulation (sed) within run_cmake_test has been resolved. That issue had been causing the cmake config step to abort, and was found to be due to the ordering of the sequence of sed operations. After fixing that, all builds succeeded. Next, I tested the current regtests that had both the 'SCRIP' and 'MPI' switches by adding 'SCRIPMPI' to those switch files. Running these, two matrix scripts (matrix03 and matrix14) timed out (8-hour wallclock, where the script usually completes in less than 3 hours), and one script, matrix04, seg faulted. I tried substituting W3_SCRIPMPI → W3_MPI within scrip_remap_conservative.F, but got the same result.

Note: I also wanted to mention that for the next few days or so I'll be attending to a matter needed for operations. I will be back, though, and I've asked for some additional help with this, so shortly there will be another set of eyes.

I've gone through my good leads without luck, so I would be happy to hear any suggestions. In case anyone else has some insight, here are the run_cmake_test commands:

matrix03 (time out)

run_cmake_test -b slurm -o all -S -T -s PR3_UNO_MPI_SCRIP -w work_PR3_UNO_MPI_c_c -m grdset_c -g curv -f -p srun -n 24 ../model mww3_test_02

matrix14 (time out)

run_cmake_test -b slurm -o all -S -T -s MPI_OMPH -w work_a -m grdset_a -f -p srun -n 140 -t 4 ../model ww3_ufs1.3

matrix04 (seg fault)

run_cmake_test -b slurm -o all -S -T -s PR3_UQ_MPI_SCRIP -w work_PR3_UQ_MPI_d2_c -m grdset_d2 -f -p srun -n 24 ../model mww3_test_03
...
 SI nbPlus=        3944  nbMinus=           0  nbZero=           0                                   
 SI nbPlus=       12960  nbMinus=           0  nbZero=           0                                   
 SI nbPlus=        3944  nbMinus=           0  nbZero=           0                                   
 SI nbPlus=        3944  nbMinus=           0  nbZero=           0                                   
 SI nbPlus=       12960  nbMinus=           0  nbZero=           0                                   
 SI nbPlus=        3944  nbMinus=           0  nbZero=           0                                   
 Grid sweeps completed                                                                               
 Total number of links =         5580                                                                
 Finished Conservative Remapping                                                                     
[h24c51:155369:0:155369] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))  
[h24c51:155370:0:155370] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))  
[h24c51:155371:0:155371] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))  
[h24c51:155372:0:155372] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
...

@jcwarner-usgs
Author

I would be glad to look into this further, especially the test that looked like it completed the remapping and then seg faulted.
I could do that test with and without SCRIPMPI and compare the weights files.
Are these the default cases that are distributed with the code?

@MatthewMasarik-NOAA
Collaborator

> I would be glad to look into this further, especially the test that looked like it completed the remapping and then seg faulted. I could do that test with and without SCRIPMPI and compare the weights files. Are these the default cases that are distributed with the code?

Awesome. These are in the full "matrix" of regression tests that we run; at NCEP we use the 'matrix_cmake_ncep' script. Also, I actually have a good lead on the seg fault that I can pass along. Before I paused on this, I realized I hadn't re-run in Debug mode after some of the progress made on the switch manipulation. I'll grab the notes and post back momentarily.

@MatthewMasarik-NOAA
Collaborator

MatthewMasarik-NOAA commented Jan 28, 2025

I intended to post this much earlier, sorry for the delay. I attached the runtime record for the test that seg faulted, matrix04. It also has records for a few tests that ran before it, some of which use SCRIP and MPI but didn't have any run issues, so those might be interesting too. For any of those tests that already had SCRIP and MPI in their switch file, I manually added 'SCRIPMPI', so they should be testing your new code.
matrix04.out.txt

For testing you just need to download the input data

./model/bin/ww3_from_ftp.sh

from the root directory. Then, from the regtests directory, you can call the run_cmake_test command. For the test that seg faulted:

run_cmake_test -b slurm -o all -S -T -s PR3_UQ_MPI_SCRIP -w work_PR3_UQ_MPI_d2_c -m grdset_d2 -f -p srun -n 24 ../model mww3_test_03

We use srun for MPI on the machine I did the test on, so if you use something else on your HPC, you may need to edit the -p option where we pass srun -n 24. Happy to answer questions; let me know if any come up.

@jcwarner-usgs
Author

I see an issue in the log file:
ww3_multi 000000000167C585 scrip_remap_conse 1072 scrip_remap_conservative.F

Can I get the exact version of the scrip_remap_conservative.F that you are using? I need to see what is on that exact line number. Can you send me the full file so I can make sure I am running the same tests here?
Thanks, -j

@MatthewMasarik-NOAA
Collaborator

> I see an issue in the log file: ww3_multi 000000000167C585 scrip_remap_conse 1072 scrip_remap_conservative.F
>
> Can I get the exact version of the scrip_remap_conservative.F that you are using? I need to see what is on that exact line number. Can you send me the full file so I can make sure I am running the same tests here? Thanks, -j

Sure, that's helpful to have with the backtrace! Here's the testing branch I'm using: https://github.com/MatthewMasarik-NOAA/WW3/tree/check-mpi/scrip-mpi. Note that the relevant regtest switch files I was describing have been edited to include 'SCRIPMPI', and CMakeLists.txt is hard-coded to build in Debug mode.

@MatthewMasarik-NOAA
Collaborator

This is the script that calls those tests (the batch script resources here are specific to NCEP HPC systems).
matrix04.txt

@jcwarner-usgs
Author

Just letting you know I am making some progress here. The mww3_test_03 test case uses the "Define fraction of communicator (processes) used for this grid" setting in the ww3_multi.inp file. I usually had these as 0 and 1, but this test case has 0, .33, .67, etc. It has to do with how many procs are being assigned to each grid, so I am digging into it. Will give more of an update soon.
-j
