Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor modification and bug fix in GFS cumulus convection schemes #232

Open
wants to merge 13 commits into
base: ufs/dev
Choose a base branch
from

Conversation

rhaesung
Copy link
Collaborator

@rhaesung rhaesung commented Oct 31, 2024

The code of this PR was provided by @JongilHan66.

  1. Modified prognostic updraft fraction (sigmab) calculation in 'progsigma_calc.f90' which is physically more sound:
    a) moisture convergence calculation: integrate from the convection source level rather than from the cloud base
    b) 2D advection of sigmab: sigmab advection averaged over the cloud layers rather than taking a maximum sigmab advection from k=2 to the model top
    c) To suppress unrealistically large reflectivity in the model first time step, minimum sigmab at the first time step is set to zero

  2. Fix in missing vertical transport of turbulent kinetic energy (TKE) when aerosol transport is turned on (samfdeepcnv.f & samfshalcnv.f)

  3. Introduce TKE at model layer interfaces (tkeh) for use in convection schemes (GFS_typedefs.F90, GFS_typedefs.meta, satmedmfvdifq.F, satmedmfvdifq.meta, samfdeepcnv.f, samfdeepcnv.meta, samfshalcnv.f, and samfshalcnv.meta)

  4. Vertical transports of hydrometeor variables are currently not allowed in the convection schemes. But vertical transports of number concentrations of only cloud water and ice are mistakenly allowed, which is fixed in this update (CCPP_typedefs.F90)

  5. All the modifications and bug fixes above had neutral impacts on the GFS forecasts

@JongilHan66
Copy link
Collaborator

@rhaesung Please add one more under item 1. in the PR description:
c) To suppress unrealistically large reflectivity in the model first time step, minimum sigmab at the first time step is set to zero

@rhaesung
Copy link
Collaborator Author

@rhaesung Please add one more under item 1. in the PR description: c) To suppress unrealistically large reflectivity in the model first time step, minimum sigmab at the first time step is set to zero

@JongilHan66 Done!

Copy link
Collaborator

@lisa-bengtsson lisa-bengtsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Collaborator

@dustinswales dustinswales left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhaesung These changes all look good to me.
I have a few tiny suggestions for the precision to make the code more portable.

physics/CONV/progsigma_calc.f90 Outdated Show resolved Hide resolved
physics/CONV/progsigma_calc.f90 Outdated Show resolved Hide resolved
Copy link
Collaborator

@grantfirl grantfirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, @rhaesung

@grantfirl grantfirl requested a review from climbfuji November 15, 2024 16:05
Copy link

@climbfuji climbfuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me.

@yangfanglin
Copy link
Collaborator

What is the schedule for merging this PR ?

@grantfirl
Copy link
Collaborator

What is the schedule for merging this PR ?

@yangfanglin I think that the plan was to try to merge the remaining PRs from the RRFS branch and then do this one. If this is a higher priority than the RRFS ones, we can switch up. FYI @jkbk2004

standard_name = vertical_turbulent_kinetic_energy_at_interface
long_name = vertical turbulent kinetic energy at model layer interfaces
units = m2 s-2
dimensions = (horizontal_loop_extent,vertical_layer_dimension)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://github.com/NOAA-EMC/fv3atm/pull/885/files#r1934724405. This should be vertical_interface_dimension. Since tkeh is currently being initialized in this scheme up to km1 on line 358, it's actually leaving the top value of the array uninitialized (gets filled with garbage every time this scheme is called). I don't know if this is causing the reproducibility issue because subsequent use seems to be limited to only accessing through km1, but I think it is still a bug worth fixing. Sorry for not catching this in my initial review.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever the vertical dimension of tkeh ends up being, since it is intent(out) it should at least be initialized over its entire domain in this scheme. Uninitialized values always seem to cause problems at some point.

@rhaesung
Copy link
Collaborator Author

@grantfirl Thanks for catching this! I'll coordinate with @JongilHan66 to update the scheme based on your feedback.

@@ -1286,6 +1285,22 @@ subroutine samfdeepcnv_run (im,km,first_time_step,restart, &
enddo
enddo
enddo
kk = ntk -2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when this scheme is called when ntk is initialized to -99 (i.e. when sgs_tke isn't available or being advected)? Are you sure that these code changes only get executed when ntk is > 3? I would think that it would result in a seg fault rather than random behavior if kk ends up being negative though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that these changes exist within a 'if (.not. hwrf_samdeep) and if (do_aerosols). Is this consistent with the tests that are non reproducible? If so, I'd think that the culprit is in this change (and the similar one in samfshalcnv.F).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantfirl Good finding!!!!!

@rhaesung Please modify them as:

  1. line 1288-1303 in samfdeepcnv.f:

      kk = ntk -2
      do k = 2, km1
        do i = 1, im
          if (cnvflg(i)) then
            if(k > kb(i) .and. k < kmax(i)) then
              dz = zi(i,k) - zi(i,k-1)
              tem  = 0.25 * (xlamue(i,k)+xlamue(i,k-1)) * dz
              tem  = cq * tem
              factor = 1. + tem
              ecko(i,k,kk) = ((1. - tem) * ecko(i,k-1,kk) + tem *
    

    & (ctro(i,k,kk) + ctro(i,k-1,kk))) / factor
    ercko(i,k,kk) = ecko(i,k,kk)
    endif
    endif
    enddo
    enddo

=>
if(ntk > 2) then
kk = ntk -2
do k = 2, km1
do i = 1, im
if (cnvflg(i)) then
if(k > kb(i) .and. k < kmax(i)) then
dz = zi(i,k) - zi(i,k-1)
tem = 0.25 * (xlamue(i,k)+xlamue(i,k-1)) * dz
tem = cq * tem
factor = 1. + tem
ecko(i,k,kk) = ((1. - tem) * ecko(i,k-1,kk) + tem *
& (ctro(i,k,kk) + ctro(i,k-1,kk))) / factor
ercko(i,k,kk) = ecko(i,k,kk)
endif
endif
enddo
enddo
endif

  1. line 1095-1110 in samfshalcnv.f:

      kk = ntk -2
      do k = 2, km1
        do i = 1, im
          if (cnvflg(i)) then
            if(k > kb(i) .and. k < kmax(i)) then
              dz = zi(i,k) - zi(i,k-1)
              tem  = 0.25 * (xlamue(i,k)+xlamue(i,k-1)) * dz
              tem  = cq * tem
              factor = 1. + tem
              ecko(i,k,kk) = ((1. - tem) * ecko(i,k-1,kk) + tem *
    

    & (ctro(i,k,kk) + ctro(i,k-1,kk))) / factor
    ercko(i,k,kk) = ecko(i,k,kk)
    endif
    endif
    enddo
    enddo

=>

     if(ntk > 2) then
     kk = ntk -2
     do k = 2, km1
       do i = 1, im
         if (cnvflg(i)) then
           if(k > kb(i) .and. k < kmax(i)) then
             dz = zi(i,k) - zi(i,k-1)
             tem  = 0.25 * (xlamue(i,k)+xlamue(i,k-1)) * dz
             tem  = cq * tem
             factor = 1. + tem
             ecko(i,k,kk) = ((1. - tem) * ecko(i,k-1,kk) + tem *
 &                   (ctro(i,k,kk) + ctro(i,k-1,kk))) / factor
             ercko(i,k,kk) = ecko(i,k,kk)
           endif
         endif
       enddo
     enddo
     endif

Thanks!!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantfirl Thanks! @JongilHan66 Sure, I'll test these changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grantfirl @JongilHan66 No luck with this change. Something else is happening. FYI, I tested both with this change and by removing tkeh. The baseline comparison case that failed is: cpld_restart_gfsv17 intel.

Comment on lines -171 to -174
if(flag_init .and. .not. flag_restart)then
do i = 1,im
if(cnvflg(i))then
sigmab(i)=0.03
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JongilHan66 It looks like this part is needed to reproduce the baseline for at least the cpld_restart_gfsv17 Intel test. Did you intend to remove this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhaesung Yes, it is intended. sigmab is always initialized as zero in the beginning of progsigma_calc.f90, so I don't think it causes the reproducibility problem. Although it may takes more time, could you test one by one for this PR change from the codes before this PR change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JongilHan66 Thank you for the clarification. I understand that sigmab is initialized to zero in progsigma_calc.f90, but given that the cpld_restart_gfsv17 test passed with this part included, it seems like it may still be playing a role in ensuring correct behavior. I will follow your suggestion and test the PR changes step-by-step, comparing with the previous version, to identify what specifically is affecting the reproducibility. I’ll keep you updated with the progress.

enddo
endif
endif!cnvflg
enddo

do k=1,km
do i=1,im
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JongilHan66 In line 214, shouldn't sigmab(i) =MAX(sigmind, sigmab(i)) be changed to sigmab(i) = MAX(sigmind_new, sigmab(i))?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhaesung You are right. Please change it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JongilHan66 does this mean you change the minimum sigma value only at the first time-step?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhaesung You are right. Please change it.

@JongilHan66 Okay, this correction has allowed most of the failed test cases to pass, but there are still issues with the reproducibility of the cpld_control_gfsv17 and cpld_restart_gfsv17 tests. I still suspect the changes in progsigma_calc.f90. I'll keep you updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need: if(flag_init .and. .not. flag_restart)then, because in the case of restart you don't want to use the initial value, but rather the value read in from the restart file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may need: if(flag_init .and. .not. flag_restart)then, because in the case of restart you don't want to use the initial value, but rather the value read in from the restart file.

@lisa-bengtsson Thank you for the suggestion! I agree with your point. I’ll incorporate the if(flag_init .and. .not. flag_restart) condition to ensure that we only use the initial value during the initialization phase and not during a restart.

@JongilHan66
Copy link
Collaborator

@lisa-bengtsson Yes. Initial value of sigmab=0.03 tended to produce unrealistically large reflectivity at the initial time especially in the RRFS.

@lisa-bengtsson
Copy link
Collaborator

@lisa-bengtsson Yes. Initial value of sigmab=0.03 tended to produce unrealistically large reflectivity at the initial time especially in the RRFS.

@JongilHan66 I recall now... it's taken some time for this PR to get through the queue.

@rhaesung
Copy link
Collaborator Author

rhaesung commented Feb 13, 2025

@JongilHan66 @grantfirl I confirmed that the changes in progsigma_calc.f90 in PR #252 prevent the completion of the cpld_restart_gfsv17 test when compared to the new baseline, with the following error:

120: FATAL from PE 120: NaN in input field of mpp_reproducing_sum(_2d), this indicates numerical instability

@JongilHan66 Could you please review the logic for sigmind_new to ensure it doesn’t result in invalid values during the simulation? I think we need something like the if(flag_init .and. .not. flag_restart) condition in line 66, since the restart run is failing?

@JongilHan66
Copy link
Collaborator

@rhaesung I don't think sigmind_new causes the instability as it is a local constant variable (minimum updraft fraction; sigmind=0.01). Have you run it with the original progsigma_calc.f90 before the modification?

@rhaesung
Copy link
Collaborator Author

@rhaesung I don't think sigmind_new causes the instability as it is a local constant variable (minimum updraft fraction; sigmind=0.01). Have you run it with the original progsigma_calc.f90 before the modification?

@JongilHan66 Yes, I recall that the develop branch worked fine. Please review the change in Jili’s PR 252. Since this change prevents the cpld_restart_gfsv17 test from completing with the above error when compared to the new baseline, I suspect the issue is related to the proper handling of sigmind_new and sigmind during the simulation.

@JongilHan66
Copy link
Collaborator

@rhaesung The changes associated with sigmind_new & sigmind follow Jili's changes. And I find the current changes are exactly the same as Jili's PR252. In the restart, sigmind_new will be sigmind=0.01. I don't get it why the constant sigind_new causes a problem in restart. How about setting sigmind_new=0.0 before line 66 (i.e., if (flag_init) then)?

@JongilHan66
Copy link
Collaborator

@rhaesung @lisa-bengtsson I thought that if flag_restart=.true., flag_init automatically becomes .false, so sigmind_new=sigmind=0.01. Am I wrong?

@lisa-bengtsson
Copy link
Collaborator

@rhaesung @lisa-bengtsson I thought that if flag_restart=.true., flag_init automatically becomes .false, so sigmind_new=sigmind=0.01. Am I wrong?

I thought so too, but it turned out flag_restart can be true and the flag_init can be true at the same time - indicating that it is the first time-step under restart. That is why I put the .not. flag_restart condition in there.

@JongilHan66
Copy link
Collaborator

@lisa-bengtsson Thanks for the clarification!! @rhaesung Please change the 66 line to if(flag_init .and. .not. flag_restart). But I don't think this change can avoid the numerical instability in restart.

@rhaesung
Copy link
Collaborator Author

@JongilHan66 @lisa-bengtsson @grantfirl Okay, I confirmed that if(flag_init .and. .not. flag_restart) is at least resolving the issue in PR 252. I’ll re-test the failed tests with this change for this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants