Change default IO type from NETCDF4C to PNETCDF #325
|
Testing:
|
|
Thanks very much, @amametjanov! This looks promising. |
|
@amametjanov, I've got 3 tests for this in the queue, one on Chrysalis and 2 on Frontier. But wait times seem to be a bit long both places. I'll keep you posted. |
|
To fix |
|
@amametjanov, maybe this will be fixed by E3SM-Project/scorpio#670, but what I'm seeing on Chrysalis with this branch is: The polaris output is available at: It seems like this won't be a short-term fix for Omega if a scorpio fix is needed, because that would mean:
It feels like we should look into whether there's some alternative way to address #323 in the next week or two. |
Update scorpio from v1.8.2 2025-Jul-14 to v1.9.0 2025-Nov-21. Also add fix for PnetCDF CDF5 types.
4d44424 to e68388f
|
Xylar, please check with the updated head of this branch to see if it fixes This branch updates the scorpio submodule ahead of E3SM/master's version, which is still on v1.8.2 2025-Jul-14. When scorpio gets updated in E3SM/master (to v1.9.0 or later), the E3SM/master merge to Omega/develop will subsume this branch's updates. |
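The "v1.9.0 or later" condition above can be sketched as a small version comparison. This is only an illustration: the helper names `parse_tag` and `subsumes` are hypothetical and not part of Omega, E3SM, or scorpio, and it assumes tags follow the plain `vMAJOR.MINOR.PATCH` pattern quoted in this thread.

```python
import re


def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v1.9.0' into a comparable tuple such as (1, 9, 0)."""
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def subsumes(master_tag: str, branch_tag: str = "v1.9.0") -> bool:
    """Return True if the scorpio tag on E3SM/master is at least the branch's tag.

    Hypothetical helper: the default 'v1.9.0' is the version this branch
    updates to, as stated in the comment above.
    """
    return parse_tag(master_tag) >= parse_tag(branch_tag)


if __name__ == "__main__":
    # E3SM/master is reported to still be on v1.8.2, so it does not yet subsume the branch.
    print(subsumes("v1.8.2"))
    print(subsumes("v1.9.0"))
```

Comparing integer tuples rather than raw strings keeps a tag like `v1.10.0` ordered after `v1.9.0`.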
|
Thanks, @amametjanov! I'll retest as soon as I can. |
xylar
left a comment
I tested the omega_pr suite using the fix in E3SM-Project/polaris#442, pointing to this branch for the Omega build.
I was able to run successfully with both Intel and Gnu on Chrysalis. I discovered that I can't log in to either Aurora or Frontier at the moment. I'm trying on Perlmutter (CPU and GPU) next.
In the meantime, two small questions/comments.
|
On Perlmutter-CPU (Both Intel and Gnu) and -GPU (Gnu-GPU), I'm seeing the same hanging behavior reported in E3SM-Project/polaris#396 as we had seen previously. It seems like maybe that behavior is unfortunately independent of this PIO problem. |
|
@amametjanov, I ran the tests for this PR on Frontier, but I got the same PIO error: Please see the Frontier test results on the Omega CDash dashboard at https://my.cdash.org/index.php?project=omega |
|
@amametjanov, could you please let me know when this is ready for me to re-test in Polaris? |
|
Yes, this is ready, please re-run Polaris tests. 🙏 |
Testing
I successfully ran the
(Unchecked items are still in the queue -- update soon...) I also verified that one @amametjanov and @grnydawn, thank you so much for figuring out these issues and fixing them! |
|
My Frontier tests are now running. All the CPU tests did okay. The GPU tests are very slow by comparison, and they are taking more than the 1 hour I had allocated. I don't know for sure but I presume the slowness is not from this PR. I also saw the I/O failure in one I will rerun both the tests that timed out and the one that failed. We will see what happens. Update: It seems like the file system on Frontier might be a problem. My resubmitted jobs are hanging just trying to load the environment. |
|
In the This is for the |
|
When I try to rerun the failed test ( The original error was the same as we have seen before: I presume these errors might indicate that Omega isn't overwriting the |
|
For Frontier, craygnu and craygnu-mphipcc are the only compilers E3SM cares about. Don't spend more than a token amount of E3SM time looking at the others. |
|
Okay, thanks @rljacob. That wasn't clear to me. |
|
I set up |
|
Nope, now I'm seeing the usual error message: but this time in the So I think this PR should go in but we can't consider this problem to be solved. |
|
Thank you for re-running Polaris tests (and merging).
I heard that the Frontier scratch filesystem was hanging and slow this week; maybe that's the culprit. |
This merge updates the e3sm_submodules/Omega submodule from [f2e951a](https://github.com/E3SM-Project/Omega/tree/f2e951a) to [fc53608](https://github.com/E3SM-Project/Omega/tree/fc53608).

This update includes the following MPAS-Ocean and MPAS-Frameworks PRs (check mark indicates bit-for-bit with previous PR in the list):
- [ ] (ocn) E3SM-Project/Omega#325
- [ ] (ocn) E3SM-Project/Omega#329
Change default IO type from NETCDF4C to PNETCDF
Checklist
Testing with the following: have been run on and indicate that are all passing.
has passed, using the Polaris
e3sm_submodules/Omega baseline -p for both the baseline (Polaris e3sm_submodules/Omega) and the PR build
Fixes #333
Closes #334