Check of finidat_interp_dest.status isn't reached #2596

Open
samsrabin opened this issue Jun 11, 2024 · 2 comments
Labels
type: bug something is working incorrectly


samsrabin commented Jun 11, 2024

Brief summary of bug

On 0ffbd07, I was having trouble figuring out why the model was crashing for 1x1_brazil. The first run failed with no useful error information; the second run failed differently, with "NetCDF: Attribute not found". It turns out the second failure occurred because the first run had failed during the write of finidat_interp_dest.nc. But the second run never reached the part of the code that would have noticed that finidat_interp_dest.status was missing and given a helpful error message:

if (trim(finidat) == trim(finidat_interp_dest)) then
   ! Check to see if status file for finidat exists
   klen = len_trim(finidat_interp_dest) - 3 ! remove the .nc
   locfn = finidat_interp_dest(1:klen)//'.status'
   inquire(file=trim(locfn), exist=lexists)
   if (.not. lexists) then
      if (masterproc) then
         write(iulog,'(a)')' failed to find file '//trim(locfn)
         write(iulog,'(a)')' this indicates a problem in creating '//trim(finidat_interp_dest)
         write(iulog,'(a)')' remove '//trim(finidat_interp_dest)//' and try again'
      end if
      call endrun()
   end if
end if

The second run somehow bypassed this check and instead failed while reading the first variable that was missing from finidat_interp_dest.nc.
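The state that trips up the retry can be inspected outside the model with a minimal standalone sketch. This is not CTSM code: the program name, the `describe_state` routine, and the assumption that a file called `finidat_interp_dest.nc` sits in the current directory are all hypothetical. It derives the `.status` name the same way as the snippet above and reports which of the two files exist.

```fortran
program interp_dest_state
  ! Standalone sketch (not CTSM code): report whether an interpolated
  ! initial-conditions file and its companion .status file exist.
  implicit none

  call describe_state('finidat_interp_dest.nc')   ! hypothetical file name

contains

  subroutine describe_state(dest)
    character(len=*), intent(in)  :: dest          ! assumed to end in ".nc"
    character(len=:), allocatable :: status_file
    logical :: nc_exists, status_exists
    integer :: klen

    ! Same derivation as in the issue: drop the trailing ".nc", append ".status"
    klen = len_trim(dest) - 3
    status_file = dest(1:klen)//'.status'

    inquire(file=trim(dest), exist=nc_exists)
    inquire(file=status_file, exist=status_exists)

    if (.not. nc_exists) then
       write(*,'(a)') trim(dest)//' not found; nothing to check'
    else if (.not. status_exists) then
       write(*,'(a)') trim(dest)//' exists but '//status_file//' is missing'
       write(*,'(a)') '  the earlier write probably failed; remove the file and retry'
    else
       write(*,'(a)') trim(dest)//' and '//status_file//' both exist'
    end if
  end subroutine describe_state

end program interp_dest_state
```

The middle branch is the case described in this report: the NetCDF file is present, so a retry tries to read it, but the missing `.status` file shows that the write never completed.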

General bug information

CTSM version you are using: ctsm5.2.005

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: Any configuration that requires interpolating initial conditions, where the first attempt fails and the run is then retried.

Important details of your setup / configuration so we can reproduce the bug

I noticed this with a 1x1_brazil case. It happened on Derecho (intel) and Izumi (intel and nag).

Note that, for the commit I mentioned above, reproducing it also requires setting stream_gdd20_seasons to true. Example case dir on Izumi at /home/samrabin/cases_ctsm/scale-mat-reqs-pr2-ggcmiseas.brazil/.

@samsrabin samsrabin added type: bug something is working incorrectly tag: next this should get some attention in the next week or two labels Jun 11, 2024

samsrabin commented Jun 11, 2024

The following patch fixes it (see this branch):

diff --git a/src/main/clm_initializeMod.F90 b/src/main/clm_initializeMod.F90
index e8f70bdef..18a13adf9 100644
--- a/src/main/clm_initializeMod.F90
+++ b/src/main/clm_initializeMod.F90
@@ -174,7 +174,7 @@ contains
     use SatellitePhenologyMod         , only : SatellitePhenologyInit, readAnnualVegetation, interpMonthlyVeg, SatellitePhenology
     use SnowSnicarMod                 , only : SnowAge_init, SnowOptics_init
     use lnd2atmMod                    , only : lnd2atm_minimal
-    use controlMod                    , only : NLFilename
+    use controlMod                    , only : NLFilename, check_missing_initdata_status
     use clm_instMod                   , only : clm_fates
     use BalanceCheckMod               , only : BalanceCheckInit
     use CNSharedParamsMod             , only : CNParamsSetSoilDepth
@@ -520,17 +520,7 @@ contains
        else
           if (trim(finidat) == trim(finidat_interp_dest)) then
              ! Check to see if status file for finidat exists
-             klen = len_trim(finidat_interp_dest) - 3 ! remove the .nc
-             locfn = finidat_interp_dest(1:klen)//'.status'
-             inquire(file=trim(locfn), exist=lexists)
-             if (.not. lexists) then
-                if (masterproc) then
-                   write(iulog,'(a)')' failed to find file '//trim(locfn)
-                   write(iulog,'(a)')' this indicates a problem in creating '//trim(finidat_interp_dest)
-                   write(iulog,'(a)')' remove '//trim(finidat_interp_dest)//' and try again'
-                end if
-                call endrun()
-             end if
+             call check_missing_initdata_status(finidat_interp_dest)
           end if
           if (masterproc) then
              write(iulog,'(a)')'Reading initial conditions from file '//trim(finidat)
diff --git a/src/main/controlMod.F90 b/src/main/controlMod.F90
index aabe75376..82f11b19b 100644
--- a/src/main/controlMod.F90
+++ b/src/main/controlMod.F90
@@ -58,6 +58,7 @@ module controlMod
   public :: control_setNL ! Set namelist filename
   public :: control_init  ! initial run control information
   public :: control_print ! print run control information
+  public :: check_missing_initdata_status  ! check for missing finidat_interp_dest .status file
   !
   !
   ! !PRIVATE MEMBER FUNCTIONS:
@@ -675,6 +676,7 @@ contains
        write(iulog,*) 'Successfully initialized run control settings'
        write(iulog,*)
     endif
+    write(iulog,*) 'Successfully initialized run control settings'

   end subroutine control_init

@@ -1197,6 +1199,38 @@ contains
   end subroutine control_print


+  !-----------------------------------------------------------------------
+  subroutine check_missing_initdata_status(finidat_interp_dest)
+   !
+   ! !DESCRIPTION:
+   ! Checks that the finidat_interp_dest .status file was written (i.e., that write of
+   ! finidat_interp_dest succeeded)
+   !
+   ! !ARGUMENTS:
+   character(len=*), intent(in)    :: finidat_interp_dest
+   !
+   ! !LOCAL VARIABLES:
+   logical                    :: lexists
+   integer                    :: klen
+   character(len=SHR_KIND_CL) :: status_file
+   character(len=*), parameter :: subname = 'check_missing_initdata_status'
+   !-----------------------------------------------------------------------
+
+    klen = len_trim(finidat_interp_dest) - 3 ! remove the .nc
+    status_file = finidat_interp_dest(1:klen)//'.status'
+    inquire(file=trim(status_file), exist=lexists)
+    if (.not. lexists) then
+       if (masterproc) then
+          write(iulog,'(a)')' failed to find file '//trim(status_file)
+          write(iulog,'(a)')' this indicates a problem in creating '//trim(finidat_interp_dest)
+          write(iulog,'(a)')' remove '//trim(finidat_interp_dest)//' and try again'
+       end if
+       call endrun(subname//': finidat_interp_dest file exists but is probably bad')
+    end if
+
+  end subroutine check_missing_initdata_status
+
+
   !-----------------------------------------------------------------------
   subroutine apply_use_init_interp(finidat_interp_dest, finidat, finidat_interp_source)
     !
@@ -1247,6 +1281,10 @@ contains

     inquire(file=trim(finidat_interp_dest), exist=lexists)
     if (lexists) then
+
+       ! Check that the status file also exists (i.e., that finidat_interp_dest was written successfully)
+       call check_missing_initdata_status(finidat_interp_dest)
+
        ! open the input file and check for the name of the input source file
        status = nf90_open(trim(finidat_interp_dest), 0, ncid)
        if (status /= nf90_noerr) call handle_err(status)

@samsrabin
Contributor Author

CTSM SE meeting today:

This issue has come up several times. It is hard to write a system test that ensures such an error is thrown.
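Short of a full system test, the file-existence logic alone might be exercised with a small standalone driver that fakes the failed state. The following is only a sketch under assumptions: `status_missing` is a hypothetical helper that mirrors the patch's check but returns a logical instead of calling `endrun`, `remove_if_present` is a made-up utility, and the scratch file names are invented. It does not cover the real code path through `apply_use_init_interp`.

```fortran
program test_status_missing
  ! Standalone sketch (not CTSM code): fake a failed write of the
  ! interpolated file and confirm that the missing .status is detected.
  implicit none
  character(len=*), parameter :: dest        = 'scratch_interp_dest.nc'      ! made-up name
  character(len=*), parameter :: status_name = 'scratch_interp_dest.status'  ! made-up name
  integer :: u

  ! Fake a failed write: the .nc placeholder exists, the .status file does not
  open(newunit=u, file=dest, status='replace', action='write')
  close(u)
  call remove_if_present(status_name)
  if (.not. status_missing(dest)) error stop 'FAIL: missing .status not detected'

  ! Fake a successful write: the .status file now exists as well
  open(newunit=u, file=status_name, status='replace', action='write')
  close(u)
  if (status_missing(dest)) error stop 'FAIL: existing .status reported missing'

  ! Clean up the scratch files
  call remove_if_present(dest)
  call remove_if_present(status_name)
  write(*,'(a)') 'PASS'

contains

  ! Hypothetical helper: true if the .status companion of the file is absent
  logical function status_missing(finidat_interp_dest)
    character(len=*), intent(in) :: finidat_interp_dest
    integer :: klen
    logical :: lexists
    klen = len_trim(finidat_interp_dest) - 3   ! remove the .nc
    inquire(file=finidat_interp_dest(1:klen)//'.status', exist=lexists)
    status_missing = .not. lexists
  end function status_missing

  ! Made-up utility: delete a file if it exists
  subroutine remove_if_present(fname)
    character(len=*), intent(in) :: fname
    integer :: u
    logical :: lexists
    inquire(file=fname, exist=lexists)
    if (lexists) then
       open(newunit=u, file=fname, status='old')
       close(u, status='delete')
    end if
  end subroutine remove_if_present

end program test_status_missing
```

Because the real routine aborts through `endrun`, a driver like this can only test the existence check itself. Confirming that the model actually stops with the helpful message would still need a test that runs the model against a deliberately incomplete finidat_interp_dest file.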

@samsrabin samsrabin removed the tag: next this should get some attention in the next week or two label Jun 13, 2024