Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shallow copy parent and children in DataTree constructor #9297

Merged

Conversation

TomNicholas
Copy link
Member

This seems to fix the examples in #9196, but causes other tests to fail, which I think just reveals that we have existing tests that assume this in-place modification is desired. I will change those tests to match the new intended behaviour.

@TomNicholas TomNicholas added the topic-DataTree Related to the implementation of a DataTree class label Jul 31, 2024
@TomNicholas TomNicholas marked this pull request as ready for review July 31, 2024 06:07
@TomNicholas TomNicholas marked this pull request as draft July 31, 2024 06:08
self.children = children

# shallow copy to avoid modifying arguments in-place (see GH issue #9196)
self.parent = parent.copy() if parent is not None else None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove parent from the constructor instead?

It's a little weird to only be able to set the direct parent (and corresponding sibling), but not more distant relatives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was going to do that in a follow-up PR, but I can do it in this one instead.

@TomNicholas
Copy link
Member Author

TomNicholas commented Aug 2, 2024

The main delay here is just how some of the tests are written - a lot of them rely upon in-place modification to work. e.g.

def test_path_property(self):

This change makes the constructor a somewhat unhelpful way to create trees from scratch. I could rewrite these tests either

  1. The ugly way I have started in 5db25bd, which doesn't refer to copied objects,

  2. Using .from_dict, which is much clearer (and the recommended user API), but relies on more complex code paths internally so doesn't isolate the behaviour of the constructor / relationship properties so much.

@flamingbear
Copy link
Contributor

  1. The ugly way I have started in 5db25bd, which doesn't refer to copied objects,

When you say it's ugly, it's what I was thinking. It seems closer to what you're testing.
This looks like a not super useful test.

john = DataTree.from_dict(
    {
        "/": DataTree(name="John"),
        "/Mary": DataTree(name="Mary"),
        "/Mary/Sue": DataTree(name="Sue"),
    }
)
assert john["/Mary/Sue"].path == "/Mary/Sue"
 

@shoyer
Copy link
Member

shoyer commented Aug 2, 2024

I think using from_dict for tests is fine, as long as you thoroughly test the constructor as well.

You could also avoid all the intermediate variables by writing something like:

john = DataTree(children={"Mary": DataTree(children={"Sue": DataTree()})})

Incidentally, it would sure be nice if we could eventually support automatic unpacking of nested dictionaries in the DataTree constructor, e.g., something like:

john = DataTree({"Mary": {"Sue": {}}})

Here the nature of the node (child DataTree vs DataArray) could be inferred from the argument type.

@shoyer
Copy link
Member

shoyer commented Aug 2, 2024

john = DataTree.from_dict(
    {
        "/": DataTree(name="John"),
        "/Mary": DataTree(name="Mary"),
        "/Mary/Sue": DataTree(name="Sue"),
    }
)
assert john["/Mary/Sue"].path == "/Mary/Sue"
 ``

Incidentally, I would try to avoid redundant name arguments, given how name is (or should be) overriden by the assigned key in the parent object. So I would wrtie this as:

john = DataTree.from_dict(
    {
        "/": DataTree(),
        "/Mary": DataTree(),
        "/Mary/Sue": DataTree(),
    }
)

or even just

john = DataTree.from_dict(
    {
        "/Mary/Sue": DataTree(),
    }
)

@TomNicholas
Copy link
Member Author

I've now re-written the tests to use approach (2), i.e. to just use DataTree.from_dict everywhere that a nested tree is required.

@TomNicholas TomNicholas marked this pull request as ready for review August 27, 2024 15:34
flamingbear and others added 7 commits September 2, 2024 14:38
I will happily take something better.

But this was the error I was getting

xarray/tests/test_datatree.py:127: error: Argument 1 to "relative_to" of
"NamedNode" has incompatible type "DataTree[Any] | DataArray"; expected
"NamedNode[Any]"  [arg-type]
But it doesn't fix all the tests There's three tests that I don't fully know
what should be tested or if they still make sense.
@flamingbear
Copy link
Contributor

flamingbear commented Sep 3, 2024

@TomNicholas I didn't want to push up unfinished work to this PR again. But 7c530f5?diff=split&w=0
is my stab at removing the parent parameter from the DataTree Constructor. I got all but three tests to pass but it seems like the three might not make sense anymore (or I just don't fully understand them)

FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_create_two_children - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'
FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_dont_modify_parent_inplace - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'
FAILED xarray/tests/test_datatree.py::TestFamilyTree::test_setparent_unnamed_child_node_fails - TypeError: DataTree.__init__() got an unexpected keyword argument 'parent'

Update edited:
test_dont_modify_parent_inplace looks like might have a bug. And probably something I didn't fix when I removed the parent keyword from the constructor?
fe5ae9c

I think this is a test for test_setparent_unnamed_child_node_fails
9a004d4

still have no idea what test_create_two_children is doing

these are just 3 commits on https://github.com/flamingbear/xarray/tree/datatree_init_dont_modify_inplace

@TomNicholas
Copy link
Member Author

I think these are all resolved by either:

  1. Changing outdated tests
  2. Also making .parent read-only.

@TomNicholas
Copy link
Member Author

This PR now does a few related things:

  1. Shallow copy children in the DataTree constructor (6c56b12),
  2. Removes the parent argument from the DataTree constructor (7c530f5)
  3. Makes the .parent attribute of DataTree read-only (897b589)
  4. Changes any tests that these effect by making them pass or deleting them.

@TomNicholas
Copy link
Member Author

I believe this is ready to merge. It might imply more changes to @flamingbear 's docs PR though.

@shoyer
Copy link
Member

shoyer commented Sep 7, 2024

Looks great, thanks Tom!

@TomNicholas TomNicholas merged commit 68b040a into pydata:main Sep 7, 2024
28 checks passed
@TomNicholas TomNicholas deleted the datatree_init_dont_modify_inplace branch September 7, 2024 18:52
dcherian added a commit to dcherian/xarray that referenced this pull request Sep 17, 2024
* main: (29 commits)
  Release notes for v2024.09.0 (pydata#9480)
  Fix `DataTree.coords.__setitem__` by adding `DataTreeCoordinates` class (pydata#9451)
  Rename DataTree's "ds" and "data" to "dataset" (pydata#9476)
  Update DataTree repr to indicate inheritance (pydata#9470)
  Bump pypa/gh-action-pypi-publish in the actions group (pydata#9460)
  Repo checker (pydata#9450)
  Add days_in_year and decimal_year to dt accessor (pydata#9105)
  remove parent argument from DataTree.__init__ (pydata#9465)
  Fix inheritance in DataTree.copy() (pydata#9457)
  Implement `DataTree.__delitem__` (pydata#9453)
  Add ASV for datatree.from_dict (pydata#9459)
  Make the first argument in DataTree.from_dict positional only (pydata#9446)
  Fix typos across the code, doc and comments (pydata#9443)
  DataTree should not be "Generic" (pydata#9445)
  Disallow passing a DataArray as data into the DataTree constructor (pydata#9444)
  Support additional dtypes in `resample` (pydata#9413)
  Shallow copy parent and children in DataTree constructor (pydata#9297)
  Bump minimum versions for dependencies (pydata#9434)
  Always include at least one category in random test data (pydata#9436)
  Avoid deep-copy when constructing groupby codes (pydata#9429)
  ...
hollymandel pushed a commit to hollymandel/xarray that referenced this pull request Sep 23, 2024
* add tests

* fix by shallow copying

* correct first few tests

* replace constructors in tests with DataTree.from_dict

* rewrite simple_datatree fixture to use DataTree.from_dict

* fix incorrect creation of nested tree in formatting test

* Update doctests for from_dict constructor

* swap i and h in doctest example for clarity.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix a few mypy errors.

* Bonkers way to set type checking

I will happily take something better.

But this was the error I was getting

xarray/tests/test_datatree.py:127: error: Argument 1 to "relative_to" of
"NamedNode" has incompatible type "DataTree[Any] | DataArray"; expected
"NamedNode[Any]"  [arg-type]

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removes parent keyword from DataTree constructor

But it doesn't fix all the tests There's three tests that I don't fully know
what should be tested or if they still make sense.

* fix test_setparent_unnamed_child_node_fails

* fix test_dont_modify_parent_inplace -> bug?

* fix test_create_two_children

* make .parent read-only, and remove tests which test the parent setter

* update error message to reflect fact that .children is Frozen

* fix another test

* add test that parent setter tells you to set children instead

* fix mypy error due to overriding settable property with read-only property

* fix test by not trying to set parent via kwarg

---------

Co-authored-by: Matt Savoie <matthew.savoie@colorado.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.

3 participants