Adding recipe for GODAS #21

Closed
wants to merge 24 commits into from
Changes from 12 commits
50 changes: 50 additions & 0 deletions feedstock/godas.py
@@ -0,0 +1,50 @@
"""
NCEP Global Ocean Data Assimilation System (GODAS)
"""
import apache_beam as beam
from pangeo_forge_recipes.patterns import MergeDim, ConcatDim, FilePattern
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithXarray, StoreToZarr, Indexed, T
variables = ['sshg', 'thflx']
years = [1980, 1981, 1982]

def make_full_path(variable, time):
    return f"https://downloads.psl.noaa.gov/Datasets/godas/{variable}.{time}.nc"

variable_merge_dim = MergeDim("variable", variables)
time_concat_dim = ConcatDim("time", years)

## preprocessing transform

class Preprocess(beam.PTransform):
    """
    Set variables to be coordinates
    """

    @staticmethod
    def _set_bnds_as_coords(item: Indexed[T]) -> Indexed[T]:
        """
        The netcdf lists some of the coordinate variables as data variables.
        This is a fix which we want to apply to each dataset.
        """
        index, ds = item
        new_coords_vars = ['date', 'timePlot']
        ds = ds.set_coords(new_coords_vars)
        return index, ds

    def expand(self, pcoll: beam.PCollection) -> beam.PCollection:
        return pcoll | beam.Map(self._set_bnds_as_coords)


pattern = FilePattern(make_full_path, variable_merge_dim, time_concat_dim, file_type="netcdf4")

GODAS = (
    beam.Create(pattern.items())
    | OpenURLWithFSSpec()
    | OpenWithXarray(file_type=pattern.file_type)
    | Preprocess()  # New preprocessor
    | StoreToZarr(
        target_chunks={'time': 120},
        store_name="GODAS.zarr",
        combine_dims=pattern.combine_dim_keys,
    )
)
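
To see which source files the recipe above points at, the following sketch expands the same `variables` and `years` lists with `itertools.product`. It uses only the standard library and does not go through `FilePattern`; the helper name `expected_urls` is hypothetical, and the order in which the real pattern yields its items may differ.

```python
from itertools import product

variables = ["sshg", "thflx"]
years = [1980, 1981, 1982]


def make_full_path(variable, time):
    # Same URL template as in the recipe.
    return f"https://downloads.psl.noaa.gov/Datasets/godas/{variable}.{time}.nc"


def expected_urls(variables, years):
    # Hypothetical helper: one URL per (variable, year) combination.
    return [make_full_path(v, y) for v, y in product(variables, years)]


urls = expected_urls(variables, years)
for url in urls:
    print(url)
```

With two variables and three years this gives six URLs, one per input file that the pipeline merges along `variable` and concatenates along `time`.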
2 changes: 2 additions & 0 deletions feedstock/meta.yaml
@@ -34,6 +34,8 @@ recipes:
- producer
- licensor
license: "CC-BY-4.0"
- id: "GODAS"
object: "godas:GODAS"
- id: CASM
object: "casm:CASM"
description: A long-term Consistent Artificial-intelligence based Soil Moisture dataset based on machine learning and remote sensing
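
The `object` value pairs a module name with a name defined in that module: `godas:GODAS` refers to the `GODAS` pipeline defined in `feedstock/godas.py`. The sketch below only illustrates that naming convention. `split_object_ref` is a hypothetical helper, not part of pangeo-forge, and how the tooling actually loads the object is not shown in this PR.

```python
def split_object_ref(ref: str) -> tuple[str, str]:
    # Hypothetical helper: split "module:attribute" into its two parts.
    module_name, sep, attr_name = ref.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {ref!r}")
    return module_name, attr_name


module_name, attr_name = split_object_ref("godas:GODAS")
print(f"{module_name}.py should define {attr_name}")
```

A check like this can catch a mismatch between the `object` entry and the variable name in the recipe module before the feedstock is run.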
2 changes: 2 additions & 0 deletions feedstock/requirements.txt
@@ -1 +1,3 @@
git+https://github.com/pangeo-forge/pangeo-forge-recipes.git@beam-refactor
fsspec==2023.5.0
gcsfs==2023.5.0