Commit

Merge remote-tracking branch 'origin/main' into multidim-rechunk-fix
cisaacstern committed Mar 1, 2024
2 parents 50f8f62 + c76269a commit 2d0c119
Showing 38 changed files with 892 additions and 486 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/main.yaml
Original file line number Diff line number Diff line change
@@ -28,7 +28,7 @@ jobs:
fetch-depth: 0 # checkout tags (which is not done by default)
- name: 🔁 Setup Python
id: setup-python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip
@@ -58,7 +58,7 @@ jobs:
if: |
github.event_name == 'push' ||
github.event_name == 'pull_request'
uses: codecov/codecov-action@v3.1.4
uses: codecov/codecov-action@v4.0.2
with:
file: ./coverage.xml
env_vars: OS,PYTHON
2 changes: 1 addition & 1 deletion .github/workflows/release.yaml
@@ -11,7 +11,7 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: '3.x'
- name: Install dependencies
24 changes: 16 additions & 8 deletions .github/workflows/test-integration.yaml
@@ -31,19 +31,32 @@ jobs:
# integration testing for 3.10 and 3.11 (for runner versions that follow that PR).
python-version: ["3.9"] # , "3.10", "3.11"]
runner-version: [
"pangeo-forge-runner==0.8.0",
"pangeo-forge-runner==0.9.0",
"pangeo-forge-runner==0.9.1",
"pangeo-forge-runner==0.9.2",
"pangeo-forge-runner==0.9.3",
]
steps:
- uses: actions/checkout@v4
- name: 🔁 Setup Python
id: setup-python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: pip
cache-dependency-path: pyproject.toml


- name: Install pangeo-forge recipes and runner
shell: bash -l {0}
run: |
python -m pip install ${{ matrix.runner-version }}
python -m pip install -e ".[test,minio]"
- name: Install optional grib deps
shell: bash -l {0}
run: |
python -m pip install ecmwflibs eccodes cfgrib
- name: 'Setup minio'
run: |
wget --quiet https://dl.min.io/server/minio/release/linux-amd64/minio
@@ -54,11 +67,6 @@ jobs:
- name: 🎯 Check cache hit
run: echo '${{ steps.setup-python.outputs.cache-hit }}'
- name: 🌈 Install pangeo-forge-recipes & pangeo-forge-runner
shell: bash -l {0}
run: |
python -m pip install ${{ matrix.runner-version }}
python -m pip install -e ".[test,minio]"

# order reversed to fix https://github.com/pangeo-forge/pangeo-forge-recipes/pull/595#issuecomment-1811630921
# this should however be fixed in the runner itself
12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -1,7 +1,7 @@
repos:

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
@@ -13,13 +13,13 @@ repos:
exclude: "docs/"

- repo: https://github.com/psf/black
rev: 22.12.0
rev: 24.2.0
hooks:
- id: black
args: ["--line-length", "100"]

- repo: https://github.com/PyCQA/flake8
rev: 6.0.0
rev: 7.0.0
hooks:
- id: flake8
exclude: pangeo_forge_recipes/recipes
@@ -30,18 +30,18 @@ repos:
- id: seed-isort-config

- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v0.991'
rev: 'v1.8.0'
hooks:
- id: mypy
exclude: tests,pangeo_forge_recipes/recipes

- repo: https://github.com/pycqa/isort
rev: 5.12.0
rev: 5.13.2
hooks:
- id: isort
args: ["--profile", "black"]

- repo: https://github.com/rstcheck/rstcheck
rev: v6.1.1
rev: v6.2.0
hooks:
- id: rstcheck
4 changes: 2 additions & 2 deletions docs/composition/file_patterns.md
@@ -207,8 +207,8 @@ pattern[index]
## From file pattern to `PCollection`

As covered in {doc}`index`, a recipe is composed of a sequence of Apache Beam transforms.
The data collection that Apache Beam transforms operates on is a
[`PCollection`](https://beam.apache.org/documentation/programming-guide/#pcollections).
The data Apache Beam transforms operate on are
[`PCollections`](https://beam.apache.org/documentation/programming-guide/#pcollections).
Therefore, to bring the contents of a `FilePattern` into a recipe, we pass the index:url
pairs generated by the file pattern's ``items()`` method into Beam's `Create` constructor
as follows:
2 changes: 1 addition & 1 deletion docs/composition/index.md
@@ -4,7 +4,7 @@

A recipe describes the steps to transform archival source data in one
format / location into analysis-ready, cloud-optimized (ARCO) data in another format /
location. Technically, a recipe is as a set of composite
location. Technically, a recipe is a composite of
[Apache Beam transforms](https://beam.apache.org/documentation/programming-guide/#transforms)
applied to the data collection associated with a {doc}`file pattern <file_patterns>`.
To write a recipe:
8 changes: 8 additions & 0 deletions docs/composition/styles.md
@@ -24,6 +24,14 @@ the recipe pipeline will contain at a minimum the following transforms applied t
* {class}`pangeo_forge_recipes.transforms.OpenURLWithFSSpec`: retrieves each pattern file using the specified URLs.
* {class}`pangeo_forge_recipes.transforms.OpenWithXarray`: loads each pattern file into an [`xarray.Dataset`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).
* {class}`pangeo_forge_recipes.transforms.StoreToZarr`: generates a Zarr store by combining the datasets.
* {class}`pangeo_forge_recipes.transforms.ConsolidateDimensionCoordinates`: consolidates the dimension coordinates to improve dataset read performance.
* {class}`pangeo_forge_recipes.transforms.ConsolidateMetadata`: calls Zarr's convenience function to consolidate metadata.

```{tip}
If you use the {class}`pangeo_forge_recipes.transforms.ConsolidateDimensionCoordinates` transform, make sure to also chain the {class}`pangeo_forge_recipes.transforms.ConsolidateMetadata` transform onto your recipe.
```


## Open with Kerchunk, write to virtual Zarr

8 changes: 3 additions & 5 deletions docs/contributing.md
@@ -163,10 +163,8 @@ pytest tests -v

## Releasing

To make a new release, first add [](./release_notes.md) for the release to the docs.
Navigate to <https://github.com/pangeo-forge/pangeo-forge-recipes/releases> and click "Draft a new release".

Then just go to <https://github.com/pangeo-forge/pangeo-forge-recipes/releases>
and click "Draft a new release".
![How to release gif](https://github.com/pangeo-forge/pangeo-forge-recipes/assets/15016780/c6132967-4f6d-49d9-96eb-48a687130f97)

The [release.yaml](https://github.com/pangeo-forge/pangeo-forge-recipes/blob/main/.github/workflows/release.yaml)
workflow should take care of the rest.
The [release.yaml](https://github.com/pangeo-forge/pangeo-forge-recipes/blob/main/.github/workflows/release.yaml) workflow will then be triggered and will publish the new version of `pangeo-forge-recipes` to PyPI.
150 changes: 0 additions & 150 deletions docs/release_notes.md

This file was deleted.

6 changes: 3 additions & 3 deletions docs/requirements.txt
@@ -1,9 +1,9 @@
sphinx==6.2.1
sphinx==7.2.6
pangeo-sphinx-book-theme==0.2
myst-nb==1.0.0
sphinx-copybutton==0.5.2
sphinx-togglebutton==0.3.2
sphinx-autodoc-typehints==1.23.0
sphinxext-opengraph==0.9.0
sphinx-autodoc-typehints==2.0.0
sphinxext-opengraph==0.9.1
sphinx-design==0.5.0
-e .
10 changes: 9 additions & 1 deletion examples/feedstock/gpcp_from_gcs.py
@@ -3,7 +3,13 @@
import zarr

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithXarray, StoreToZarr
from pangeo_forge_recipes.transforms import (
ConsolidateDimensionCoordinates,
ConsolidateMetadata,
OpenURLWithFSSpec,
OpenWithXarray,
StoreToZarr,
)

dates = [
d.to_pydatetime().strftime("%Y%m%d")
@@ -43,5 +49,7 @@ def test_ds(store: zarr.storage.FSStore) -> zarr.storage.FSStore:
store_name="gpcp.zarr",
combine_dims=pattern.combine_dim_keys,
)
| ConsolidateDimensionCoordinates()
| ConsolidateMetadata()
| "Test dataset" >> beam.Map(test_ds)
)
6 changes: 3 additions & 3 deletions examples/feedstock/hrrr_kerchunk_concat_step.py
@@ -30,8 +30,9 @@ def test_ds(store: zarr.storage.FSStore) -> zarr.storage.FSStore:

ds = xr.open_dataset(store, engine="zarr", chunks={})
ds = ds.set_coords(("latitude", "longitude"))
assert ds.attrs["centre"] == "kwbc"
assert len(ds["step"]) == 4
ds = ds.expand_dims(dim="time")
assert ds.attrs["GRIB_centre"] == "kwbc"
assert len(ds["step"]) == 2
assert len(ds["time"]) == 1
assert "t" in ds.data_vars
for coord in ["time", "surface", "latitude", "longitude"]:
@@ -51,7 +52,6 @@ def test_ds(store: zarr.storage.FSStore) -> zarr.storage.FSStore:
store_name="hrrr-concat-step",
concat_dims=pattern.concat_dims,
identical_dims=identical_dims,
precombine_inputs=True,
)
| "Test dataset" >> beam.Map(test_ds)
)
4 changes: 3 additions & 1 deletion examples/feedstock/hrrr_kerchunk_concat_valid_time.py
@@ -2,6 +2,7 @@
https://projectpythia.org/kerchunk-cookbook/notebooks/case_studies/HRRR.html
"""

from typing import Any

import apache_beam as beam
@@ -64,8 +65,9 @@ def test_ds(store: zarr.storage.FSStore) -> zarr.storage.FSStore:
store_name="hrrr-concat-valid-time",
concat_dims=concat_dims,
identical_dims=identical_dims,
# fails due to: _pickle.PicklingError: Can't pickle <function drop_unknown
# at 0x290e46a70>: attribute lookup drop_unknown on __main__ failed
mzz_kwargs=dict(preprocess=drop_unknown),
precombine_inputs=True,
)
| "Test dataset" >> beam.Map(test_ds)
)
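
The `PicklingError` quoted in the comment on `mzz_kwargs` is the error the standard `pickle` module raises when it cannot look a function up by name on its module, presumably hit here because Beam needs to serialize `drop_unknown`. The snippet below is a minimal, standard-library-only sketch of that failure mode, not a reproduction of this recipe: `make_preprocess` and `try_pickle` are hypothetical helpers, and a nested function is used only because it reliably makes the name lookup fail.

```python
import pickle


def make_preprocess():
    # Hypothetical factory: the inner function exists only inside this call,
    # so pickle cannot find it again by name on its module.
    def drop_unknown(refs):
        return refs

    return drop_unknown


def try_pickle(func):
    """Return None if pickling succeeds, else the name of the raised exception type."""
    try:
        pickle.dumps(func)
    except (pickle.PicklingError, AttributeError) as exc:
        # Which of the two is raised depends on the Python version.
        return type(exc).__name__
    return None


error_name = try_pickle(make_preprocess())
print("pickling the nested function failed with:", error_name)
```

Functions are pickled by reference (module plus qualified name), so a callable that is importable by name, such as a builtin, passes the same check without error.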