EAMxx: Create IOPForcing ATM process #6787
I have one comment regarding using the Field utility, but other than that, looks good.
const auto v_mean_h = Kokkos::create_mirror_view(v_mean);

const auto num_global_cols = m_grid->get_num_global_dofs();
for (int k=0; k<m_num_levs; ++k) {
Have you thought about doing an outer ||for? Yes, the access pattern on GPU is screwed up, but that's the same here. You may as well throw in another layer of ||ism and do all calculations on device. It will at least save you from creating the mirror view and doing deep copies after the vertical for loop...
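For illustration, a minimal sketch of this nested-parallelism idea using a Kokkos TeamPolicy. The names horiz_winds, u_mean, v_mean, and m_num_levs are taken from the snippets in this thread; num_local_cols and the exact policy setup are assumptions:

```cpp
// Sketch only: one team per vertical level, nested reduction over local
// columns, everything on device (no host mirror needed inside the loop).
using TeamPolicy = Kokkos::TeamPolicy<>;
Kokkos::parallel_for(TeamPolicy(m_num_levs, Kokkos::AUTO),
  KOKKOS_LAMBDA (const TeamPolicy::member_type& team) {
    const int k = team.league_rank();
    Real u_sum = 0;
    Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, num_local_cols),
      [&] (const int icol, Real& sum) {
        sum += horiz_winds(icol, 0, k/Pack::n)[k%Pack::n];
      }, u_sum);
    Real v_sum = 0;
    Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, num_local_cols),
      [&] (const int icol, Real& sum) {
        sum += horiz_winds(icol, 1, k/Pack::n)[k%Pack::n];
      }, v_sum);
    // One thread per team stores the local sums; the MPI all-reduce and
    // the rescaling by num_global_cols can then happen once, outside.
    Kokkos::single(Kokkos::PerTeam(team), [&] {
      u_mean(k) = u_sum;
      v_mean(k) = v_sum;
    });
  });
```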
The reason I did not is that we do an MPI_AllReduce within the loop, and I assumed that couldn't be done on device. Is that a correct assumption?
Anyway, I could move that computation outside the nested ||for/reduce and all-reduce on the entire view, but I'll still need the mirrors, I think?
You could do the rescaling at the end of the nested for/reduce, no? Since the scaling factor is num_global_cols for all entries?
Yes, but wouldn't I still need the host mirrors to do the all reduce? And I guess the scaling as well?
Yeah, you would, unless SCREAM_MPI_ON_DEVICE is defined. But anyway, I think the conversation regarding the Field utility below may supersede this one...
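For context, a sketch of what that guard might look like, reusing the in-place all_reduce wrapper shown in the diff below; the view names and the staging through a host mirror are assumptions:

```cpp
#ifdef SCREAM_MPI_ON_DEVICE
  // GPU-aware MPI: reduce directly on the device view's data.
  m_comm.all_reduce(u_mean.data(), m_num_levs, MPI_SUM);
#else
  // Otherwise stage through a host mirror, reduce, and copy back.
  auto u_mean_h = Kokkos::create_mirror_view(u_mean);
  Kokkos::deep_copy(u_mean_h, u_mean);
  m_comm.all_reduce(u_mean_h.data(), m_num_levs, MPI_SUM);
  Kokkos::deep_copy(u_mean, u_mean_h);
#endif
```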
  v_sum += horiz_winds(icol, 1, k/Pack::n)[k%Pack::n];
}, u_mean_k, v_mean_k);

m_comm.all_reduce(&u_mean_k, 1, MPI_SUM);
Another advantage of doing the ||for: you would only fire off ONE all reduce, at the end.
Tbf, you could already do that, by moving the all reduce (and subsequent rescaling) to after the end of the k loop...
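A sketch of that hoisting, assuming the per-level local sums have already been accumulated into u_mean on device (names taken from the snippets in this thread):

```cpp
// One MPI call for the whole profile instead of one per level,
// then rescale once after the reduction.
Kokkos::deep_copy(u_mean_h, u_mean);
m_comm.all_reduce(u_mean_h.data(), m_num_levs, MPI_SUM);
for (int k=0; k<m_num_levs; ++k) {
  u_mean_h(k) /= num_global_cols; // rescaling is uniform across entries
}
Kokkos::deep_copy(u_mean, u_mean_h);
```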
Forget all that I said. You could replace the buffer views with Field structures, and use the Field utility "horiz_contraction" that @mahf708 implemented (and that we merged recently). That way, you can just do something like
auto horiz_winds_f = get_field_out("horiz_winds");
auto area = m_grid->get_geo_data("area");
horiz_contraction(horiz_winds_mean, horiz_winds_f, area, &get_comm());
horiz_winds_mean.scale(1/earth_area); // earth_area can be precomputed at init
This is assuming that horiz_winds_mean has layout (2,nlevs).
Let me check that out!
// and observed quantities of T, Q, u, and v
if (iop_nudge_tq or iop_nudge_uv) {
  // Compute domain mean of qv, T_mid, u, and v
  const auto qv_mean = m_buffer.qv_mean;
Yeah, as @bartgol suggested, you should be good with something like
horiz_contraction(qv_mean, qv, wts, &comm);
where I would calculate the area/wts much earlier, once, since it will remain constant throughout (see #6788 for an example). This assumes you have already allocated the field with the right layout.
Note the signature should be:
horiz_contraction(
  out,  // allocated output field, same rank as input but with COL stripped out
  in,   // allocated input field of any rank up to 3; must include COL
  wts,  // allocated field of weights (must be COL only)
  &comm);
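Putting the pieces of this thread together, a sketch of how the calls could line up, assuming qv_mean and qv are Fields with matching layouts and reusing the weights field from the snippet further below:

```cpp
// Once, at initialization: uniform weights 1/num_global_dofs
// (valid if every column has the same area; see the question below).
const Real one_over_num_dofs = 1.0/m_grid->get_num_global_dofs();
m_helper_fields.at("horiz_mean_weights").deep_copy(one_over_num_dofs);

// Every step: weighted horizontal mean, MPI reduction included.
horiz_contraction(qv_mean, qv, m_helper_fields.at("horiz_mean_weights"), &m_comm);
```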
One thing comes to mind when looking at this PR ... is there a way we can extract more stuff out of this IOP process? For example, do we need to do nudging here or can we accommodate the IOP nudged cases in our already existing nudging process?
We can ofc defer/punt until later, just thinking out loud for code simplicity/readability/modularity
@mahf708 this is a good point, and I think it's probably possible to eventually merge the IOP nudging routine into the general EAMxx one. However, I'd prefer to punt this to a different PR because I don't think it's trivial. For instance, the IOP nudging routine nudges the domain mean, not each individual column explicitly. Secondly, either the EAMxx nudging routine would need to be modified to support reading the IOP nudging input format, OR the DPxx workflow and input files would need to be substantially revised. Just two things off the top of my head.
My vote would be to edit the eamxx nudging routine so that it can accommodate the specific needs of IOP, as opposed to the other way around. But I agree, we could/should wait until later. No rush to do anything now.
Force-pushed from f1ef82b to b470097.
Looks good!
Two minor comments (though no action is strictly needed):
- I noticed a lot of barrier/fence usage that wasn't clear to me; just noting it here in case these were added for debugging only and left in accidentally.
- One thing that can really help is adding some sort of standalone unit testing. For example, you can see the unit tests in rrtmgp, spa, or nudging. These can be helpful for learning the code, debugging it, and also profiling it. So, while this is still fresh in your mind, and only if you have time, I think adding some tests would be quite nice.
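To make the unit-testing suggestion concrete, a minimal standalone Catch2 sketch, entirely hypothetical and independent of the actual eamxx test harness, checking the uniform-weight domain-mean logic discussed above:

```cpp
#include <catch2/catch.hpp>
#include <vector>

// Hypothetical stand-in for the horizontal-mean logic under test:
// mean = sum(w_i * f_i) with uniform weights w_i = 1/N.
static double domain_mean (const std::vector<double>& f) {
  const double w = 1.0/f.size();
  double m = 0;
  for (double v : f) m += w*v;
  return m;
}

TEST_CASE ("iop_forcing_domain_mean") {
  // With uniform weights, the domain mean is the arithmetic mean.
  REQUIRE (domain_mean({1.0, 2.0, 3.0, 4.0}) == Approx(2.5));
}
```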
// Compute field for horizontal contraction weights (1/num_global_dofs)
const auto iop_nudge_tq = m_iop_data_manager->get_params().get<bool>("iop_nudge_tq");
const auto iop_nudge_uv = m_iop_data_manager->get_params().get<bool>("iop_nudge_uv");
const Real one_over_num_dofs = 1.0/m_grid->get_num_global_dofs();
if (iop_nudge_tq or iop_nudge_uv) m_helper_fields.at("horiz_mean_weights").deep_copy(one_over_num_dofs);
I assume that each column in the IOP grid always has the same area as every other column?
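If that assumption ever failed, a hypothetical fallback would be area-based weights, using the "area" geo data mentioned earlier in this thread; total_area here is an assumed precomputed global sum:

```cpp
// Non-uniform columns: weight by column area, normalized so weights sum to 1.
auto area = m_grid->get_geo_data("area");
auto wts  = m_helper_fields.at("horiz_mean_weights");
wts.deep_copy(area);
wts.scale(1.0/total_area); // total_area: all-reduced global sum of "area"
```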
  // Release WS views
  ws.release_many_contiguous<3>({&ref_p_mid, &ref_p_int, &ref_p_del});
});
Kokkos::fence();
It's not clear to me why this fence is needed. Could you elaborate?
Looks good. Can you please ninja-delete the IO cout line we discussed on Slack, so it goes in with this?
@bogensch Can you test one more time before we merge? Particularly something that uses …
Moves IOP forcing from within the HOMME interface to its own physics process, called directly after the dynamics process. Also, changes the name of the IntensiveObservationPeriod class to IOPDataManager.
if (iop_nudge_tq or iop_nudge_uv) { ... if (iop_nudge_tq or iop_nudge_uv) { ... } }
Force-pushed from 9004f23 to efeca26.
@tcclevenger I ran a case that nudges U & V winds by default. The run crashed about six hours in. I ran the case with nudging turned off and it ran fine. For the crashing case with nudging, I noticed that the U and V being output were about four orders of magnitude larger than expected.
@bogensch Thanks, this is definitely related to my last round of changes. I'll investigate.
Force-pushed from efeca26 to 8b76d0f.
@bogensch Were you testing on CPU? I think this was an issue with how I was using (or not using) Packs.
@tcclevenger Yes, my testing was done on pm-cpu. I did not test on pm-gpu.
Fixes a bug where real views were being used as if they were packed.
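For readers outside the thread, a sketch of the bug class being fixed here; horiz_winds follows the packed pattern from the diff above, while u_wind_real is a hypothetical plain Real view:

```cpp
// Packed view: each entry holds Pack::n scalars, so level k is addressed as
// view(icol, comp, k/Pack::n)[k%Pack::n] (the pattern in the diff above).
Real from_packed = horiz_winds(icol, 0, k/Pack::n)[k%Pack::n];

// Plain Real view: one scalar per entry, indexed directly.
// Applying the packed indexing to a Real view reads the wrong memory,
// consistent with the orders-of-magnitude-off U/V values reported above.
Real from_real = u_wind_real(icol, k);
```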
Moves IOP forcing from within the HOMME interface to its own physics process, called directly after the dynamics process. Also, changes the name of the IntensiveObservationPeriod class to IOPDataManager.
I expect this to be non-BFB, at the very least since we are changing the order in which IOP forcing is applied to FM fields.