From ae71437b5517b94be93dc1707479e173b67c7861 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 11:15:35 -0400 Subject: [PATCH 01/18] remove too-long underline --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 450daf3f06d..d8c4cd63d3d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -1,7 +1,7 @@ .. _hierarchical-data: Hierarchical data -============================== +================= .. ipython:: python :suppress: From 928767a4f78137bfe65fbf7819bcc14089f6df61 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 13:25:13 -0400 Subject: [PATCH 02/18] draft section on data alignment --- doc/user-guide/hierarchical-data.rst | 67 ++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d8c4cd63d3d..0491ed85477 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -644,3 +644,70 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power + +Alignment and Coordinate Inheritance +------------------------------------ + +Data Alignment +~~~~~~~~~~~~~~ + +The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) "vertically", i.e. aligned with those in their parent nodes. + +.. note:: + If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! + In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. + This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). 
+ +To demonstrate, let's first generate some example datasets which are not aligned with one another + +.. ipython:: python + + da_daily = ds.resample(time="d").mean("time") + da_weekly = ds.resample(time="w").mean("time") + da_monthly = ds.resample(time="ME").mean("time") + +These datasets have different lengths along the time dimension, and are therefore not aligned along that dimension. + +.. ipython:: python + + da.daily.sizes + da.weekly.sizes + da.monthly.sizes + +Whilst we cannot store these non-alignable variables on a single `Dataset` object (DEMONSTRATE THIS?), this could be a good use for `DataTree`! + +However, if we try to create a `DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. + +.. ipython:: python + :okexcept: + + DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) + +This is because DataTree checks alignment up through the tree, all the way to the root. + +.. note:: + This is similar to netCDF's concept of inherited dimensions. + +To represent this unalignable data in a single `DataTree`, we must place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. + +.. ipython:: python + + dt = DataTree.from_dict( + {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} + ) + dt + +Now we have a valid `DataTree` structure which contains the data at different time frequencies. + +We is a useful way to organise our data because we can still operate on all the groups at once. +For example we can extract all three timeseries at a specific lat-lon location + +.. ipython:: python + + dt.sel(lat=75, lon=300) + +or find how the standard deviation of these timeseries has been affected by the averaging we did + +.. 
ipython:: python + + dt.std(dim="time") From 1adb94523f70181efe3dafa75c150cd78a2c4dc9 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 14:24:14 -0400 Subject: [PATCH 03/18] fixes --- doc/user-guide/hierarchical-data.rst | 39 +++++++++++++++------------- 1 file changed, 21 insertions(+), 18 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 0491ed85477..173324fb543 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -651,62 +651,65 @@ Alignment and Coordinate Inheritance Data Alignment ~~~~~~~~~~~~~~ -The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) "vertically", i.e. aligned with those in their parent nodes. +The data in different datatree nodes are not totally independent. In particular dimensions (and indexes) in child nodes must be aligned (LINK HERE) with those in their parent nodes. .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). -To demonstrate, let's first generate some example datasets which are not aligned with one another +To demonstrate, let's first generate some example datasets which are not aligned with one another: .. 
ipython:: python - da_daily = ds.resample(time="d").mean("time") - da_weekly = ds.resample(time="w").mean("time") - da_monthly = ds.resample(time="ME").mean("time") + # (drop the attributes just to make the printed representation shorter) + ds = xr.tutorial.open_dataset("air_temperature").drop_attrs() -These datasets have different lengths along the time dimension, and are therefore not aligned along that dimension. + ds_daily = ds.resample(time="D").mean("time") + ds_weekly = ds.resample(time="W").mean("time") + ds_monthly = ds.resample(time="ME").mean("time") + +These datasets have different lengths along the ``time`` dimension, and are therefore not aligned along that dimension. .. ipython:: python - da.daily.sizes - da.weekly.sizes - da.monthly.sizes + ds_daily.sizes + ds_weekly.sizes + ds_monthly.sizes -Whilst we cannot store these non-alignable variables on a single `Dataset` object (DEMONSTRATE THIS?), this could be a good use for `DataTree`! +Whilst we cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object (DEMONSTRATE THIS?), this could be a good use for :py:class:`~xarray.DataTree`! -However, if we try to create a `DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. +However, if we try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. .. ipython:: python :okexcept: - DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) + xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) This is because DataTree checks alignment up through the tree, all the way to the root. .. note:: This is similar to netCDF's concept of inherited dimensions. 
-To represent this unalignable data in a single `DataTree`, we must place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent this unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. .. ipython:: python - dt = DataTree.from_dict( + dt = xr.DataTree.from_dict( {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} ) dt -Now we have a valid `DataTree` structure which contains the data at different time frequencies. +Now we have a valid :py:class:`~xarray.DataTree` structure which contains the data at different time frequencies. -We is a useful way to organise our data because we can still operate on all the groups at once. -For example we can extract all three timeseries at a specific lat-lon location +This is a useful way to organise our data because we can still operate on all the groups at once. +For example we can extract all three timeseries at a specific lat-lon location: .. ipython:: python dt.sel(lat=75, lon=300) -or find how the standard deviation of these timeseries has been affected by the averaging we did +or compute the standard deviation of each timeseries to find out how it varies with sampling frequency: .. 
ipython:: python From ae1bcfd8044bf693dc3f94b4badc612a84e8ce81 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 14:24:33 -0400 Subject: [PATCH 04/18] draft section on coordinate inheritance --- doc/user-guide/hierarchical-data.rst | 51 ++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 173324fb543..d87618c667a 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -714,3 +714,54 @@ or compute the standard deviation of each timeseries to find out how it varies w .. ipython:: python dt.std(dim="time") + +Coordinate Inheritance +~~~~~~~~~~~~~~~~~~~~~~ + +Notice that in the tree we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical in each group. +We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. + +.. note:: + This is also a new feature relative to the prototype `xarray-contrib/datatree `_ package. + +.. ipython:: python + + dt = xr.DataTree.from_dict( + { + "/": ds.drop_dims("time"), + "daily": ds_daily.drop_vars(["lat", "lon"]), + "weekly": ds_weekly.drop_vars(["lat", "lon"]), + "monthly": ds_monthly.drop_vars(["lat", "lon"]), + } + ) + dt + +We say that the `lat` and `lon` coordinates in the child groups have been "inherited" from their common parent group. + +This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. +Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. 
+ +We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: + +.. ipython:: python + + dt.daily.coords + dt["daily/lat"] + +(TODO: the repr of ``dt.coords`` should display which coordinates are inherited) + +If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: + +.. ipython:: python + + dt["/daily"] + +We can also still perform all the same operations on the whole tree: + +.. ipython:: python + + dt.sel(lat=75, lon=300) + + dt.std(dim="time") + +EXPLAIN DEDUPLICATION? From f025371d0924b5648b489153fae9e0605d112535 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:10:00 -0400 Subject: [PATCH 05/18] various improvements --- doc/user-guide/hierarchical-data.rst | 52 +++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 8 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d87618c667a..1127fb008aa 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -645,9 +645,13 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power +.. _alignment and coordinate inheritance: + Alignment and Coordinate Inheritance ------------------------------------ +.. _data alignment: + Data Alignment ~~~~~~~~~~~~~~ @@ -656,7 +660,7 @@ The data in different datatree nodes are not totally independent. In particular .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. - This allows us to provide features like Coordinate Inheritance (LINK BELOW). See the migration guide for more details on the differences (LINK). 
+ This allows us to provide features like :ref:`coordinate inheritance`. See the migration guide for more details on the differences (LINK). To demonstrate, let's first generate some example datasets which are not aligned with one another: @@ -677,21 +681,35 @@ These datasets have different lengths along the ``time`` dimension, and are ther ds_weekly.sizes ds_monthly.sizes -Whilst we cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object (DEMONSTRATE THIS?), this could be a good use for :py:class:`~xarray.DataTree`! +We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align: + +.. ipython:: python + :okexcept: + + xr.align(ds_daily, ds_weekly, join="exact") -However, if we try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error. +But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? +If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: .. ipython:: python :okexcept: xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) -This is because DataTree checks alignment up through the tree, all the way to the root. +(TODO: Looks like this error message could be improved by including information about which sizes are not equal.) + +This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: - This is similar to netCDF's concept of inherited dimensions. + This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). 
+ +This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`xr.align` command succeeds: + +.. code:: + + xr.align(child, *child.parents, join="exact") -To represent this unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. .. ipython:: python @@ -715,6 +733,8 @@ or compute the standard deviation of each timeseries to find out how it varies w dt.std(dim="time") +.. _coordinate inheritance: + Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ @@ -736,7 +756,7 @@ We can use "Coordinate Inheritance" to define them only once in a parent group a ) dt -We say that the `lat` and `lon` coordinates in the child groups have been "inherited" from their common parent group. +(TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. @@ -750,18 +770,34 @@ We can still access the coordinates defined in the parent groups from any of the (TODO: the repr of ``dt.coords`` should display which coordinates are inherited) +As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group. 
+ If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: .. ipython:: python - dt["/daily"] + print(dt["/daily"]) + +This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it. We can also still perform all the same operations on the whole tree: .. ipython:: python + :okexcept: dt.sel(lat=75, lon=300) dt.std(dim="time") +(TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) + +Overriding Inherited Coordinates +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +You can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. + +EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? + +EXAMPLE OF INHERITING FROM A GRANDPARENT? + EXPLAIN DEDUPLICATION? From 7549ee917e38cad86f07cb94e1582928fa4f6f18 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:47:32 -0400 Subject: [PATCH 06/18] more improvements --- doc/user-guide/hierarchical-data.rst | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 1127fb008aa..72e74e5f9bd 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -645,12 +645,12 @@ We could use this feature to quickly calculate the electrical power in our signa power = currents * voltages power -.. _alignment and coordinate inheritance: +.. _alignment-and-coordinate-inheritance: Alignment and Coordinate Inheritance ------------------------------------ -.. _data alignment: +.. _data-alignment: Data Alignment ~~~~~~~~~~~~~~ @@ -660,7 +660,7 @@ The data in different datatree nodes are not totally independent. In particular .. 
note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that nodes actually were completely unrelated. The data model is now slightly stricter. - This allows us to provide features like :ref:`coordinate inheritance`. See the migration guide for more details on the differences (LINK). + This allows us to provide features like :ref:`coordinate-inheritance`. See the migration guide for more details on the differences (LINK). To demonstrate, let's first generate some example datasets which are not aligned with one another: @@ -703,7 +703,7 @@ This is because DataTree checks that data in child nodes align exactly with thei .. note:: This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). -This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`xr.align` command succeeds: +This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: .. code:: @@ -733,17 +733,19 @@ or compute the standard deviation of each timeseries to find out how it varies w dt.std(dim="time") -.. _coordinate inheritance: +.. _coordinate-inheritance: Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ -Notice that in the tree we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical in each group. +Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. 
We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: This is also a new feature relative to the prototype `xarray-contrib/datatree `_ package. +Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group: + .. ipython:: python dt = xr.DataTree.from_dict( @@ -758,7 +760,7 @@ We can use "Coordinate Inheritance" to define them only once in a parent group a (TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) -This is preferred to the previous representation because it makes it clear that all of these datasets share common spatial grid coordinates. +This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: @@ -791,6 +793,8 @@ We can also still perform all the same operations on the whole tree: (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) +.. 
_overriding-inherited-coordinates: + Overriding Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ From b6316971e136d19376878e8e5c09a1927cbb4688 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:47:46 -0400 Subject: [PATCH 07/18] link from other page --- doc/user-guide/data-structures.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/user-guide/data-structures.rst b/doc/user-guide/data-structures.rst index b5e83789806..eb04500f22d 100644 --- a/doc/user-guide/data-structures.rst +++ b/doc/user-guide/data-structures.rst @@ -800,6 +800,7 @@ included by default unless you exclude them with the ``inherited`` flag: dt2["/weather/temperature"].to_dataset(inherited=False) +For more examples and further discussion see LINK .. _coordinates: From 02bf96b2108def56613dc84b5fb7057a519b8810 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:52:05 -0400 Subject: [PATCH 08/18] align call include all 3 datasets --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 72e74e5f9bd..ef90fe96b2c 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -686,7 +686,7 @@ We cannot store these non-alignable variables on a single :py:class:`~xarray.Dat .. ipython:: python :okexcept: - xr.align(ds_daily, ds_weekly, join="exact") + xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? 
If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: From 152d74a8e0f7651a9419be551b98109d187f3610 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:55:46 -0400 Subject: [PATCH 09/18] link back to use cases --- doc/user-guide/hierarchical-data.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index ef90fe96b2c..ce96b99c8bf 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -15,6 +15,8 @@ Hierarchical data %xmode minimal +.. _why: + Why Hierarchical Data? ---------------------- @@ -688,7 +690,7 @@ We cannot store these non-alignable variables on a single :py:class:`~xarray.Dat xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") -But we previously said that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? +But we :ref:`previously said ` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: .. 
ipython:: python From 57b7f062c7febe42a2c96abf962d1a86127bee7f Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 15:56:26 -0400 Subject: [PATCH 10/18] clarification --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index ce96b99c8bf..3eda99f6669 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -711,7 +711,7 @@ This alignment check is performed up through the tree, all the way to the root, xr.align(child, *child.parents, join="exact") -To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not parents of one another, i.e. organize them as siblings. +To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings. .. ipython:: python From d3ac1a7b8a30e9c269383904b7e7d838397fe79b Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 15 Sep 2024 16:07:02 -0400 Subject: [PATCH 11/18] small improvements --- doc/user-guide/hierarchical-data.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 3eda99f6669..90cb286ddca 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -720,7 +720,7 @@ To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we m ) dt -Now we have a valid :py:class:`~xarray.DataTree` structure which contains the data at different time frequencies. 
+Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group. This is a useful way to organise our data because we can still operate on all the groups at once. For example we can extract all three timeseries at a specific lat-lon location: @@ -741,6 +741,7 @@ Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ Notice that in the trees we constructed above (LINK OR DISPLAY AGAIN?) there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. + We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: From d73dd8ab2f605618ca1d6e6c7923d9a86a6cd647 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 10:50:09 -0400 Subject: [PATCH 12/18] remove TODO after #9532 --- doc/user-guide/hierarchical-data.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 90cb286ddca..d6e3c30fff6 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -761,8 +761,6 @@ Let's instead place only the time-dependent variables in the child groups, and p ) dt -(TODO: They are being displayed in child groups still, see https://github.com/pydata/xarray/issues/9499) - This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. 
From d779e22d75d207470f727fa631e1298369960c44 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 10:52:24 -0400 Subject: [PATCH 13/18] add todo about #9475 --- doc/user-guide/hierarchical-data.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index d6e3c30fff6..079f3381431 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -792,6 +792,8 @@ We can also still perform all the same operations on the whole tree: dt.std(dim="time") +(TODO: The first one repeats coordinates in the result due to https://github.com/pydata/xarray/issues/9475) + (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) .. _overriding-inherited-coordinates: From 3c9ad5519877365132d5b87361395027b3c183d5 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 11:04:12 -0400 Subject: [PATCH 14/18] correct xr.align example call --- doc/user-guide/hierarchical-data.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 079f3381431..86717fe965d 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -707,9 +707,9 @@ This is because DataTree checks that data in child nodes align exactly with thei This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: -.. code:: +.. code:: python - xr.align(child, *child.parents, join="exact") + xr.align(child.dataset, parent.dataset for parent in child.parents, join="exact") To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings. 
From 4cee745be81645304215b4958ce2959ebceb51a3 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Mon, 23 Sep 2024 11:20:26 -0400 Subject: [PATCH 15/18] add links to netCDF4 documentation --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 48d2bb1edae..a1f0f578381 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -703,7 +703,7 @@ If we first try to create a :py:class:`~xarray.DataTree` with these different-le This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: - This requirement of aligned dimensions is similar to netCDF's concept of inherited dimensions (LINK TO NETCDF DOCUMENTATION?). + This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions `_, as in netCDF-4 files dimensions are `visible to all child groups `_. This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: From 4c030d84c84d70bfa309355d6e76d1d2c2c98a65 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Mon, 23 Sep 2024 09:22:14 -0600 Subject: [PATCH 16/18] Consistent voice Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index a1f0f578381..f042fa00e75 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -801,7 +801,7 @@ We can also still perform all the same operations on the whole tree: Overriding Inherited Coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -You can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align 
with the parent nodes. +We can override inherited coordinates with newly-defined ones, as long as those newly-defined coordinates also align with the parent nodes. EXAMPLE OF THIS? WOULD IT MAKE MORE SENSE TO USE DIFFERENT DATA TO DEMONSTRATE THIS? From 6db4a0b74a54f468557d58e456842c3914d28c18 Mon Sep 17 00:00:00 2001 From: TomNicholas Date: Sun, 6 Oct 2024 12:32:37 -0400 Subject: [PATCH 17/18] keep indexes in lat lon selection to dodge #9475 --- doc/user-guide/hierarchical-data.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index a1f0f578381..5f11e1e762e 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -788,12 +788,10 @@ We can also still perform all the same operations on the whole tree: .. ipython:: python :okexcept: - dt.sel(lat=75, lon=300) + dt.sel(lat=[75], lon=[300]) dt.std(dim="time") -(TODO: The first one repeats coordinates in the result due to https://github.com/pydata/xarray/issues/9475) - (TODO: The second one fails due to https://github.com/pydata/xarray/issues/8949) .. _overriding-inherited-coordinates: From e879dbb9d5522d2a7e46ca9d925bd1c3a3271096 Mon Sep 17 00:00:00 2001 From: Tom Nicholas Date: Sun, 6 Oct 2024 10:36:54 -0600 Subject: [PATCH 18/18] unpack generator properly Co-authored-by: Stephan Hoyer --- doc/user-guide/hierarchical-data.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/user-guide/hierarchical-data.rst b/doc/user-guide/hierarchical-data.rst index 0716f3cd941..f130bc06397 100644 --- a/doc/user-guide/hierarchical-data.rst +++ b/doc/user-guide/hierarchical-data.rst @@ -709,7 +709,7 @@ This alignment check is performed up through the tree, all the way to the root, .. 
code:: python - xr.align(child.dataset, parent.dataset for parent in child.parents, join="exact") + xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact") To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendents of one another, e.g. organize them as siblings.
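The alignment rule these patches document (a child node must align exactly with every ancestor, all the way up to the root) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyNode` and `is_aligned` are hypothetical names invented for illustration and are not part of xarray, the dimension sizes are made up, and the check compares dimension sizes only, whereas `xr.align(..., join="exact")` also compares index values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node (not an xarray class)."""

    name: str
    sizes: Dict[str, int]
    parent: Optional["ToyNode"] = None

    @property
    def parents(self) -> Tuple["ToyNode", ...]:
        """All ancestors, ordered from the immediate parent up to the root."""
        ancestors = []
        node = self.parent
        while node is not None:
            ancestors.append(node)
            node = node.parent
        return tuple(ancestors)


def is_aligned(child: ToyNode) -> bool:
    """Return True if every dimension shared with an ancestor has the same size."""
    for ancestor in child.parents:
        for dim, size in child.sizes.items():
            if dim in ancestor.sizes and ancestor.sizes[dim] != size:
                return False
    return True


# Illustrative sizes only; they are not read from the tutorial dataset.
grid = {"lat": 25, "lon": 53}
root = ToyNode("/", dict(grid))
daily = ToyNode("daily", {"time": 730, **grid}, parent=root)

# Nesting weekly data under daily data: the two "time" sizes conflict.
weekly_nested = ToyNode("weekly", {"time": 105, **grid}, parent=daily)

# Placing weekly data next to daily data: no ancestor defines "time".
weekly_sibling = ToyNode("weekly", {"time": 105, **grid}, parent=root)

print(is_aligned(daily))           # True
print(is_aligned(weekly_nested))   # False
print(is_aligned(weekly_sibling))  # True
```

The nested layout fails because `weekly` would have to agree with `daily` on the length of `time`, while the sibling layout passes because the only ancestor is the root, which defines just `lat` and `lon`. That is the same reason the documented example stores the three resampled datasets as siblings.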