Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing water isotope test on the ctsm5.2 branch #2444

Open
ekluzek opened this issue Mar 28, 2024 · 10 comments · May be fixed by #2457
Open

Failing water isotope test on the ctsm5.2 branch #2444

ekluzek opened this issue Mar 28, 2024 · 10 comments · May be fixed by #2457
Assignees
Labels
type: bug something is working incorrectly
Milestone

Comments

@ekluzek
Copy link
Contributor

ekluzek commented Mar 28, 2024

Brief summary of bug

As seen here... #2427 (comment)

The test SMS_D_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.izumi_intel.clm-tracer_consistency fails in what will be alpha-ctsm5.2.mksrf.25_ctsm5.1.dev175 as well as alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171 but passed in alpha-ctsm5.2.mksrf.22_ctsm5.1.dev168.

General bug information

CTSM version you are using: alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171

Does this bug cause significantly incorrect results in the model's science? [No

Configurations affected: with enable_water_tracer_consistency_checks = .true.

Important output or errors that show the problem

cesm.log:

This routine is depricated - use shr_log_setLogUnit instead        -134
shr_file_mod.F90         912 
This routine is depricated - use shr_log_setLogUnit instead        -132
shr_file_mod.F90         912 
This routine is depricated - use shr_log_setLogUnit instead        -134
ERROR in CompareBulkToTracer: tracer does not agree with bulk water
Called from: after first stage of hydrology
Variable: snocan_patch
First difference at index: 24
Bulk  :   0.40122337938811679-295
Tracer:   0.00000000000000000E+00
ratio:   0.10000000000000000E-09
Bulk*ratio:   0.40122337938811680-305
iam = 20: local  patch    index = 24
iam = 20: global patch    index = 2947
iam = 20: global column   index = 2211
iam = 20: global landunit index = 470
iam = 20: global gridcell index = 165
iam = 20: gridcell longitude    =   90.0000000
iam = 20: gridcell latitude     =   40.0000000
iam = 20: pft      type         = 8
iam = 20: column   type         = 1
iam = 20: landunit type         = 1
ENDRUN:
ERROR: CompareBulkToTracer: tracer does not agree with bulk water
Image              PC                Routine            Line        Source             
cesm.exe           0000000004AE6556  Unknown               Unknown  Unknown
cesm.exe           000000000421343A  shr_abort_mod_mp_         114  shr_abort_mod.F90
cesm.exe           00000000042132A0  shr_abort_mod_mp_          61  shr_abort_mod.F90
cesm.exe           0000000000A3C63F  abortutils_mp_end          98  abortutils.F90
cesm.exe           00000000035962A7  watertracerutils_         401  WaterTracerUtils.F90
cesm.exe           00000000035B7728  watertype_mp_trac         953  WaterType.F90
cesm.exe           0000000000A5AAE0  clm_driver_mp_clm         588  clm_driver.F90
cesm.exe           00000000009915B2  lnd_comp_nuopc_mp         904  lnd_comp_nuopc.F90
libesmf.so         00007F242D0B3A7E  _ZNK5ESMCI13Metho         377  ESMCI_MethodTable.C
libesmf.so         00007F242D0B4EE2  _ZN5ESMCI11Method         563  ESMCI_MethodTable.C
libesmf.so         00007F242D0B3443  c_esmc_methodtabl         317  ESMCI_MethodTable.C


@ekluzek ekluzek self-assigned this Mar 28, 2024
@ekluzek ekluzek added the type: bug something is working incorrectly label Mar 28, 2024
@ekluzek ekluzek added this to the ctsm5.2.0 milestone Mar 28, 2024
@wwieder
Copy link
Contributor

wwieder commented Mar 28, 2024

is this an important test if water isotopes don't work in CTSM?

@ekluzek
Copy link
Contributor Author

ekluzek commented Mar 28, 2024

@wwieder good question. I'm not sure.

Keeping these tests in place does keep the parts of water isotopes working that is in place. Which should help if someone can get back to water isotopes. And we haven't had these tests fail that often, so it hasn't been onerous.

I am going to leave it alone for a bit and move forward with other things. And I'm figuring out where it breaks which might lead to an easy solution. And a reasonable thing might be to bring in ctsm5.2.0 even with this one failing thing if there isn't an easy fix.

@ekluzek
Copy link
Contributor Author

ekluzek commented Apr 3, 2024

I tracked this back based on tags and it works in alpha-ctsm5.2.mksrf.23_ctsm5.1.dev168 bucould t fails in the next tag we made on ctsm5.2 of alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171. These means it fails somewhere in the update between ctsm5.1.dev168 and ctsm5.1.dev171. In tracking that I find that it works in ctsm5.1.dev170. So the cold-start change in ctsm5.1.dev170 in PR #2355 seems to be the culprit here.

This test is a cold-start test so that could be, but the change is just a small change in initial Temperatures which shouldn't matter for water isotopes.. And yet it somehow does...

@ekluzek
Copy link
Contributor Author

ekluzek commented Apr 3, 2024

I verified that doing the following simple change in #2417 gets this test to pass.

diff --git a/src/biogeophys/TemperatureType.F90 b/src/biogeophys/TemperatureType.F90
index ab310650c..6450f3f31 100644
--- a/src/biogeophys/TemperatureType.F90
+++ b/src/biogeophys/TemperatureType.F90
@@ -741,7 +741,7 @@ subroutine InitCold(this, bounds, &
                   end if
                end if
             else
-               this%t_soisno_col(c,1:nlevgrnd) = 272._r8
+               this%t_soisno_col(c,1:nlevgrnd) = 274._r8
                if (use_excess_ice .and. (lun%itype(l) == istsoil .or. lun%itype(l) == istcrop)) then
                   this%t_soisno_col(c,1:nlevgrnd) = SHR_CONST_TKFRZ - 5.0_r8 !needs to be below freezing to properly initiate excess ice
                end if

@ekluzek ekluzek added the tag: next this should get some attention in the next week or two label Apr 3, 2024
@wwieder wwieder removed the tag: next this should get some attention in the next week or two label Apr 4, 2024
@wwieder
Copy link
Contributor

wwieder commented Apr 4, 2024

@ekluzek is suggesting this an expected fail and will talk to @billsacks to see if it's something we need to modify the test case for? @wwieder assumes this is caused by fractionation caused by freezing water (somehow).

@slevis-lmwg
Copy link
Contributor

This issue reminds me of two issues that I worked on in the last few months; they may be related to this:
#2041
#2366

@billsacks
Copy link
Member

Thanks @ekluzek for the detailed notes here along with all of your investigations so far! Thanks also @slevis-lmwg for the pointers to similar issues, and @wwieder for the good questions. I'm happy to look into this a bit. I have a couple of ideas of things that might fix this that I want to try initially.

Regarding @wwieder 's question:

is this an important test if water isotopes don't work in CTSM?

I have two answers to that:

(1) The water isotope tests help ensure that the ~ 6 months of work that got water tracers / isotopes partially working don't get undone. From that perspective, these tests could be dropped if/when the feeling becomes that it is very unlikely that the water tracer / isotope work will ever be finished... though there's a part of me that would be tempted to explore re-shifting my priorities to work on this for a year and try to get it wrapped up if it came to that (if this had enough support to do so).

(2) Most – maybe all? – recent failures in isotope tests, including this one, are "canary in the coal mine" sorts of failures. That is, even though the failures are in water isotope tests, they're really indicative of bigger problems in the model that have nothing to do with water isotopes (in my view). And this is really tied in with why implementing water tracers in CTSM is hard in the first place: CTSM's hydrology is not careful / rigorous about its numerics. Implementing water tracers in a maintainable way requires more rigor with the hydrology numerics. Many (all?) of the problems we've been running into have been related to state variables getting updated in a way that should bring them to exactly zero but in fact they are ending up roundoff-level different from zero. That seems to be the situation here. Without the tracer/isotope-related checks, the model continues to run, but that doesn't necessarily mean that it's running correctly. Here, for example, you have a snocan value of 0.4 x 10^-295. That is effectively zero, and the model should have set that to exactly zero. If we have any code like if (snocan > 0) then X else Y then this code currently isn't working right under conditions like this. So under this perspective, my feeling is that we should be thankful that these tests are catching issues with CTSM's hydrology numerics that may be causing incorrect operation of the model, so we shouldn't be quick to throw them out.

This test is a cold-start test so that could be, but the change is just a small change in initial Temperatures which shouldn't matter for water isotopes.. And yet it somehow does...

That is indeed weird. My guess is that this is just triggering a different evolution of the system that makes this issue appear when (by chance) it didn't appear before, but maybe there's something weirder going on here.

@wwieder assumes this is caused by fractionation caused by freezing water (somehow).

Scientifically I see how that could make sense, but so far we don't have any fractionation implemented, so that wouldn't explain this.

billsacks added a commit to billsacks/ctsm that referenced this issue Apr 5, 2024
The default of 1e-13 was letting through some state updates that look
like they should have been truncated to 0: each time step, snocan was
decreasing by about 12 orders of magnitude.

Addresses ESCOMP#2444
@billsacks
Copy link
Member

I have resolved this in #2457 . The change is to adjust the tolerance for the truncation that @slevis-lmwg introduced in #2053. Note that this will probably change answers, maybe for many tests. (Note that ctsm5.1.dev141, which was the tag associated with #2053, had answer changes, so I expect this adjustment to have similar answer changes.)

@wwieder
Copy link
Contributor

wwieder commented Apr 5, 2024

Thanks for engaging here, Bill. Much appreciated

@billsacks
Copy link
Member

Thanks for engaging here, Bill. Much appreciated

Happy to help. As I said to Erik, I feel invested in keeping the water tracer stuff as working as possible so that one day in the uncertain future it could all come together....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug something is working incorrectly
Projects
No open projects
Updated datasets for CTSM5.2 (ctsm5.2...
  
Post CTSM5.2 Land work
Development

Successfully merging a pull request may close this issue.

4 participants