Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EAMxx: Create IOPForcing ATM process #6787

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

tcclevenger
Copy link
Contributor

Moves IOP forcing from within HOMME interface to it's own physics process, called directly after dynamics process. Also, changes name of IntensiveObservationPeriod class to IOPDataManager.

I expect this to be non-BFB, at the very least since we are changing the order in which IOP forcing is applied to FM fields.

@tcclevenger tcclevenger self-assigned this Nov 29, 2024
@tcclevenger tcclevenger added non-BFB PR makes roundoff changes to answers. EAMxx PRs focused on capabilities for EAMxx labels Nov 29, 2024
@tcclevenger tcclevenger changed the title Create IOPForcing ATM process EAMxx: Create IOPForcing ATM process Dec 2, 2024
Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one comment regarding using the Field utility, but other than that, looks good.

const auto v_mean_h = Kokkos::create_mirror_view(v_mean);

const auto num_global_cols = m_grid->get_num_global_dofs();
for (int k=0; k<m_num_levs; ++k) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you thought about doing an outer || for? Yes, the access pattern on GPU is screwed up, but that's the same here. You may as well throw in another layer of ||ism, and do all calculations on device. It will at least save you from creating the mirrot view and do deep copies after the vertical for loop...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did not is that we do an MPI_AllReduce within the loop and I assumed that couldn't be done on device. Is that a correct assumption?

Anyways, I could save that computation for outside the nest ||for/reduce and all reduce on the entire view, but I'll still need the mirrors I think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could do the rescaling at the end of the nested for/reduce, no? Since the scaling is num_global_cols for all entries, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but wouldn't I still need the host mirrors to do the all reduce? And I guess the scaling as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you would, unless SCREAM_MPI_ON_DEVICE is defined. But anyways, I think the convo regarding the Field utility below may be overriding this one..

v_sum += horiz_winds(icol, 1, k/Pack::n)[k%Pack::n];
}, u_mean_k, v_mean_k);

m_comm.all_reduce(&u_mean_k, 1, MPI_SUM);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another advantage of doing the ||for: you would only fire off ONE all reduce, at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbf, you could already do that, by moving the all reduce (and subsequent rescaling) to after the end of the k loop...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget all that I said. You could replace the buffers views with Field structures, and use the Field utility "horiz_contraction" that @mahf708 implemented (and that we merged recently). That way, you can just do something like

auto horiz_winds_f = get_field_out("horiz_winds");
auto area = m_grid->get_geo_data("area");
horiz_contraction(horiz_winds_mean,horiz_winds_f, avg_horiz_wind, area,get_comm());
avg_horiz_wind.scale(earth_area); // earth_area can be precomputed at init);

This is assuming that horiz_winds_mean has layout (2,nlevs);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check that out!

// and observed quantities of T, Q, u, and v
if (iop_nudge_tq or iop_nudge_uv) {
// Compute domain mean of qv, T_mid, u, and v
const auto qv_mean = m_buffer.qv_mean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as @bartgol suggested, you should be good with something like

horiz_contraction(qv_mean, qv, wts, &comm)

where I would calculate the area/wts much earlier once as it will remain constant throughout (see #6788 for an example). This assumes you have allocated the field with the right layouts already.

Note the signature should be:

horiz_contraction(
  out, // allocated output field of rank same as input, but COL stripped out
  in,  // allocated input field of any rank up to 3, but must include COL
  wts, // allocated field of weights (must be COL only)
  &comm);

Copy link
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing comes to mind when looking at this PR ... is there a way we can extract more stuff out of this IOP process? For example, do we need to do nudging here or can we accommodate the IOP nudged cases in our already existing nudging process?

We can ofc defer/punt until later, just thinking out loud for code simplicity/readability/modularity

@bogensch
Copy link
Contributor

bogensch commented Dec 4, 2024

@mahf708 this is a good point and I think it's probably possible to eventually merge the IOP nudging routine to the EAMxx general one. However, I'd prefer to punt this to a different PR because I don't think it's trivial. For instance, the IOP nudging routine nudges the domain mean, not the individual column explicitly. Secondly, either the EAMxx nudging routine would need to be modified to support reading in the format of the IOP nudging OR the DPxx workflow and input files would need to be substantially revised. Just two things off the top of my head that come to mind.

@mahf708
Copy link
Contributor

mahf708 commented Dec 4, 2024

@mahf708 this is a good point and I think it's probably possible to eventually merge the IOP nudging routine to the EAMxx general one. However, I'd prefer to punt this to a different PR because I don't think it's trivial. For instance, the IOP nudging routine nudges the domain mean, not the individual column explicitly. Secondly, either the EAMxx nudging routine would need to be modified to support reading in the format of the IOP nudging OR the DPxx workflow and input files would need to be substantially revised. Just two things off the top of my head that come to mind.

My vote would be to edit the eamxx nudging routine so that it can accommodate the specific needs of iop as opposed to the other way around. But I agree, we could/should wait until later. No rush to do anything now

Copy link
Contributor

@mahf708 mahf708 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Two minor comments (though no action is strictly needed):

  • I noticed a lot of barrier/fence usage that wasn't clear to me; just noting it here in case these were added for debugging only and left there accidentally
  • One thing that can really help is adding some sort of standalone unit testing. For example, you can see the unit tests in rrtmgp or spa or nudging. These can be helpful in learning the code, debugging the code, and also profiling the code. So, while this is still hot in your mind and if and only if you have time, I think adding tests can be quite nice

Comment on lines +133 to +137
// Compute field for horizontal contraction weights (1/num_global_dofs)
const auto iop_nudge_tq = m_iop_data_manager->get_params().get<bool>("iop_nudge_tq");
const auto iop_nudge_uv = m_iop_data_manager->get_params().get<bool>("iop_nudge_uv");
const Real one_over_num_dofs = 1.0/m_grid->get_num_global_dofs();
if (iop_nudge_tq or iop_nudge_uv) m_helper_fields.at("horiz_mean_weights").deep_copy(one_over_num_dofs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume that each column in IOP grid always has the same area as every other column?

// Release WS views
ws.release_many_contiguous<3>({&ref_p_mid, &ref_p_int, &ref_p_del});
});
Kokkos::fence();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why this fence is needed? Could you elaborate?

Copy link
Contributor

@bartgol bartgol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you please ninja delete the IO cout line we discussed on slack, so it goes in with this?

@tcclevenger
Copy link
Contributor Author

@bogensch Can you test one more time before we merge? Particularly something that uses iop_nudging_uv/tq?

Moves IOP forcing from within HOMME interface to it's own physics
process, called directly after dynamics process. Also, changes name of
IntensiveObservationPeriod class to IOPDataManager.
if (iop_nudge_tq or iop_nudge_uv) {
  ...
  if (iop_nudge_tq or iop_nudge_uv) {
    ...
  }
}
@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/iop_as_atm branch from 9004f23 to efeca26 Compare December 9, 2024 13:42
@tcclevenger tcclevenger requested a review from bartgol December 9, 2024 13:42
@bogensch
Copy link
Contributor

bogensch commented Dec 9, 2024

@tcclevenger I ran a case that nudges U & V winds by default. The run crashed about six hours in. I ran the case with nudging turned off and the case ran fine.

For the case with nudging that crashed I noticed that the U and V being output were about four orders of magnitudes larger than expected.

@tcclevenger
Copy link
Contributor Author

@bogensch Thanks, definitely relate to my last round of changes. I'll investigate.

@tcclevenger tcclevenger force-pushed the tcclevenger/eamxx/iop_as_atm branch from efeca26 to 8b76d0f Compare December 9, 2024 20:55
@tcclevenger
Copy link
Contributor Author

@bogensch Were you testing on CPU? I think this was an issue with how I was using (or not using) Packs.

@bogensch
Copy link
Contributor

@tcclevenger yes, my testing was done on pm-cpu. I did not test on pm-gpu.

Fixes a bug where real views were being used as if they were packed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EAMxx PRs focused on capabilities for EAMxx non-BFB PR makes roundoff changes to answers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants