Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EAMxx: fix valgrind failures #6841

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

bartgol
Copy link
Contributor

@bartgol bartgol commented Dec 9, 2024

I was able to fix our valgrind failures by doing a few mods. Namely:

  • use pack size 1 in our valg build: avoids issues with pow(UNINITED,exponent). The altearnative is to ensure the base is always inited, but there may be quite a few ramifications.
  • set some defaults for se_XYZ namelist variables in HOMME before reading them, in case they are missing in the input namelist. Usually, they are missing if not used (e.g., no point in setting se_ne for DP runs, since se_ne_x and se_ne_y will be used), but valgrind doesn't know that. @oksanaguba I added you as a reviewer for the handful of lines in namelist mod, just in case.
  • the valg test that ensures failures are detected was somewhat broken, due to compiler optimizations (I think the compiler was able to detect that the condition i<10 was always going to be verified).

NOTE: this PR does touch stuff outside of eamxx, so it will have to undergo nightly testing through next.

@bartgol bartgol added EAMxx PRs focused on capabilities for EAMxx memory labels Dec 9, 2024
@bartgol bartgol requested review from jgfouca and oksanaguba December 9, 2024 21:41
@bartgol bartgol self-assigned this Dec 9, 2024
Copy link

github-actions bot commented Dec 9, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/E3SM/pr-preview/pr-6841/
on branch gh-pages at 2024-12-09 21:43 UTC

@jgfouca
Copy link
Member

jgfouca commented Dec 10, 2024

use pack size 1 in our valg build:

This would mean a loss of some coverage, right? What's the alternate solution?

@bartgol
Copy link
Contributor Author

bartgol commented Dec 10, 2024

use pack size 1 in our valg build:

This would mean a loss of some coverage, right? What's the alternate solution?

Technically, yes. But the only alternative is to stuff the code with lots of if/ifdef, to make sure padding in packs is always valid. This is not much different from what we do for FPE.

I think, as a double safety, we could make sure that packs and views in debug are always initialized with quiet_NaN() (or, more generally, with ekat::ScalarTraits<T>::invalid()). This would allow us to catch errors when pack size > 1 in regular debug builds (since if we ever access those garbage values to compute physical entries, we'd get nans). But I think that may require some time, and we should prob do it in a separate PR?

Copy link
Member

@jgfouca jgfouca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you plan a follow-on, this is fine.

@bartgol
Copy link
Contributor Author

bartgol commented Dec 11, 2024

In next.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx PRs focused on capabilities for EAMxx memory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants