Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Original time encoding units not used in Zarr generated by beam-refactor branch #465

Closed
derekocallaghan opened this issue Jan 6, 2023 · 4 comments

Comments

@derekocallaghan
Copy link
Contributor

While porting the EOOffshore CCMP recipe to use the beam-refactor branch, I noticed that the output Zarr contains non-unique time values.

It looks like the original dataset time encoding units aren't retained in pangeo_forge_recipes.writers._store_data():

var_coded = var.copy() # copy needed for test suit to avoid modifying inputs in-place

At this point the original time encoding is as expected (i.e. hours...):

{'chunksizes': (524288,), 'fletcher32': False, 'shuffle': False, 'source': '<fsspec.implementations.local.LocalFileOpener object at 0x7f88c259aac0>', 'original_shape': (4,), 'dtype': dtype('float64'), 'units': 'hours since 1987-01-01 00:00:00'}

The units are subsequently replaced after this statement, where the use of days... is resulting in the non-unique values:

var_coded.encoding.update(zarr_array.attrs)

{'chunksizes': (524288,), 'fletcher32': False, 'shuffle': False, 'source': '<fsspec.implementations.local.LocalFileOpener object at 0x7f88c259aac0>', 'original_shape': (4,), 'dtype': dtype('float64'), 'units': 'days since 1970-01-01 00:00:00', '_ARRAY_DIMENSIONS': ['time'], '_CoordinateAxisType': 'Time', '_Fillvalue': -9999.0, 'axis': 'T', 'calendar': 'proleptic_gregorian', 'delta_t': '0000-00-00 06:00:00', 'long_name': 'Time of analysis', 'standard_name': 'time'}

I don't have a fix for this yet, but it looks like the issue is in pangeo_forge_recipes.aggregation._to_variable() based on the existing comments:

# WARNING: there are lots of edge cases aroudn time!
# Xarray will pick a time encoding for the dataset (e.g. "days since days since 1970-01-01")
# and this may not be compatible with the actual values in the time coordinate
# (which we don't know yet)
data = dsa.zeros(shape=shape, chunks=chunks, dtype=dtype)
# TODO: add more encoding

@rabernat
Copy link
Contributor

rabernat commented Jan 6, 2023

This is such a useful bug report. Thank you so much! 🙏

@derekocallaghan
Copy link
Contributor Author

I have a potential fix for this almost working locally, I need to resolve some test failures. I'm hoping to create a PR later this week.

@rabernat
Copy link
Contributor

Amazing! Yes, PR very welcome!

derekocallaghan added a commit to derekocallaghan/pangeo-forge-recipes that referenced this issue Jan 12, 2023
rabernat pushed a commit that referenced this issue Jan 18, 2023
* Original variable encodings are retained (#465)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated following @rabernat review comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@derekocallaghan
Copy link
Contributor Author

Confirmed that this is resolved by #471.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants