Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multidim rechunking _invert_meshgrid error #658

Closed

Conversation

ta-hill
Copy link
Contributor

@ta-hill ta-hill commented Nov 30, 2023

Hello, I'd like to suggest adding the following test to test_rechunking.py. The purpose of the the additional test cases is to test when different amounts of dimensions are passed as "target_chunks" to split_fragment and subsequently combined. Particularly when 3 or more chunking dimensions are used as target_chunks, several cases will fail with the same error: "ValueError: Cannot combine fragments because they do not form a regular hypercube.". @cisaacstern confirmed that these cases aren't covered elsewhere. I've attached the report from running the tests.
I tried testing these cases with the failing portion in combine_fragments commented out,

    starts_cube = [np.array(item[1]).reshape(shape) for item in dims_starts_sizes]
    sizes_cube = [np.array(item[2]).reshape(shape) for item in dims_starts_sizes]
    try:
        # reversing order is necessary here because _sort_by_speed_of_varying puts the
        # arrays into the opposite order as wanted by np.meshgrid
        starts = _invert_meshgrid(*starts_cube[::-1])[::-1]
        sizes = _invert_meshgrid(*sizes_cube[::-1])[::-1]
    except AssertionError:
        raise ValueError("Cannot combine fragments because they do not form a regular hypercube.")


    expected_sizes = [np.diff(s) for s in starts]
    if not all(np.equal(s[:-1], es).all() for s, es in zip(sizes, expected_sizes)):
        raise ValueError(f"Dataset {sizes} and index starts {starts} are not consistent.")

and all of the tests passed. I also tested in a full end-to-end case and the output was identical to the input datasets.

Test report output:

test_rechunking.py ..............F.FF                                                                                                                                                                  [100%]

================================================================================================== FAILURES ==================================================================================================
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-2-nt_dayparam0] ________________________________________________________________
../pangeo_forge_recipes/rechunking.py:222: in combine_fragments
    starts = _invert_meshgrid(*starts_cube[::-1])[::-1]
../pangeo_forge_recipes/rechunking.py:147: in _invert_meshgrid
    assert all(
E   AssertionError

During handling of the above exception, another exception occurred:
test_rechunking.py:72: in test_split_and_combine_fragments_with_merge_dim
    _, ds_combined = combine_fragments(
../pangeo_forge_recipes/rechunking.py:225: in combine_fragments
    raise ValueError("Cannot combine fragments because they do not form a regular hypercube.")
E   ValueError: Cannot combine fragments because they do not form a regular hypercube.
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 2 other_chunks ={'lat': 5, 'lon': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-5-nt_dayparam0] ________________________________________________________________
../pangeo_forge_recipes/rechunking.py:222: in combine_fragments
    starts = _invert_meshgrid(*starts_cube[::-1])[::-1]
../pangeo_forge_recipes/rechunking.py:147: in _invert_meshgrid
    assert all(
E   AssertionError

During handling of the above exception, another exception occurred:
test_rechunking.py:72: in test_split_and_combine_fragments_with_merge_dim
    _, ds_combined = combine_fragments(
../pangeo_forge_recipes/rechunking.py:225: in combine_fragments
    raise ValueError("Cannot combine fragments because they do not form a regular hypercube.")
E   ValueError: Cannot combine fragments because they do not form a regular hypercube.
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 5 other_chunks ={'lat': 5, 'lon': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-5-nt_dayparam1] ________________________________________________________________
../pangeo_forge_recipes/rechunking.py:222: in combine_fragments
    starts = _invert_meshgrid(*starts_cube[::-1])[::-1]
../pangeo_forge_recipes/rechunking.py:147: in _invert_meshgrid
    assert all(
E   AssertionError

During handling of the above exception, another exception occurred:
test_rechunking.py:72: in test_split_and_combine_fragments_with_merge_dim
    _, ds_combined = combine_fragments(
../pangeo_forge_recipes/rechunking.py:225: in combine_fragments
    raise ValueError("Cannot combine fragments because they do not form a regular hypercube.")
E   ValueError: Cannot combine fragments because they do not form a regular hypercube.
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 5 other_chunks ={'lat': 5, 'lon': 5}
=================================================================================================== PASSES ===================================================================================================
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-1-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 1 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-1-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 1 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-2-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 2 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-2-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 2 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-5-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 5 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks0-5-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 5 other_chunks ={}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-1-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 1 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-1-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 1 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-2-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 2 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-2-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 2 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-5-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 5 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks1-5-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 5 other_chunks ={'lat': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-1-nt_dayparam0] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (5, '1D') time_chunks = 1 other_chunks ={'lat': 5, 'lon': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-1-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 1 other_chunks ={'lat': 5, 'lon': 5}
_______________________________________________________________ test_split_and_combine_fragments_with_merge_dim[other_chunks2-2-nt_dayparam1] ________________________________________________________________
-------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------
running with args: nt_dayparam = (10, '2D') time_chunks = 2 other_chunks ={'lat': 5, 'lon': 5}
========================================================================================== short test summary info ===========================================================================================
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-1-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-1-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-2-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-2-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-5-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks0-5-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-1-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-1-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-2-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-2-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-5-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks1-5-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-1-nt_dayparam0]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-1-nt_dayparam1]
PASSED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-2-nt_dayparam1]
FAILED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-2-nt_dayparam0] - ValueError: Cannot combine fragments because they do not form a regular hypercube.
FAILED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-5-nt_dayparam0] - ValueError: Cannot combine fragments because they do not form a regular hypercube.
FAILED test_rechunking.py::test_split_and_combine_fragments_with_merge_dim[other_chunks2-5-nt_dayparam1] - ValueError: Cannot combine fragments because they do not form a regular hypercube.
========================================================================== 3 failed, 15 passed, 37 deselected, 2 warnings in 5.40s ===========================================================================

@cisaacstern
Copy link
Member

Thanks so much for opening this @tom-the-hill, this is a great start towards closing #520.

I tried testing these cases with the failing portion in combine_fragments commented out, ... and all of the tests passed.

This seems to suggest that (at least in certain cases) the checks made by _invert_meshgrid are not needed.

Do you feel like you have a sense of what _invert_meshgrid is checking for? I'm asking because:

  • I don't understand it! 😅
  • Maybe finding a way to express what this check is doing in plain language can help us understand if/why it may be unneeded in this case.

@ta-hill
Copy link
Contributor Author

ta-hill commented Dec 1, 2023

I can't say I fully grasp what _invert_meshgrid is checking for, but based on the statement fragments do not form a regular hypercube and the test cases where it does actaully pass the check I think I have a general sense. The tests seem to pass for any cases where a maximum of 2 dimensions are being chunked. It only passes for 3 and 4 dimensions when the concatenating dimension ( usually time ) of all fragments in the group are evenly divided by the target chunk size of time. It doesn't seem that the chunk sizes of the other dimensions have any effect, only that they are there and not equal to 1.

My best guess for what this portion of _invert_meshgrid is actually checking is possibly related to https://math.stackexchange.com/questions/3755509/inverse-of-block-anti-diagonal-matrix.
and checking for symmetry across rotation/reflection. Maybe someone a little more mathematically inclined can give more insight.

def _invert_meshgrid(*arrays):
    """Inverts the numpy.meshgrid function."""

    ndim = len(arrays)
    shape = arrays[0].shape
    assert all(a.shape == shape for a in arrays)
    selectors = [ndim * [0] for n in range(ndim)]
    for n in range(ndim):
        selectors[n][ndim - n - 1] = slice(None)
        selectors[n] = tuple(selectors[n])
    xi = [a[s] for a, s in zip(arrays, selectors)]
    assert all(
        np.equal(actual, expected).all() for actual, expected in zip(arrays, np.meshgrid(*xi))
    )
    return xi

With just brute force testing heres a few examples of the parameters where it passes and fails:
Pass for 2D Chunking:

Columns: 
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
[((4, [1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [1, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2, 1, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 1, 2]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 3, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2}, {'ny': 5, 'nx': 5, 'nz': 3})]

Pass for <=3D chunking:

Columns:
(full size of time, [time size of each split input]), {chunking dimensions} , {size of other non-time dimensions of input dataset}
[((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((4, [2]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2}, {'ny': 5, 'nx': 5, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1},{'ny': 4, 'nx': 4, 'nz': 3}),
 ((5, [2, 2, 1]), {'time': 2, 'lat': 2, 'lon': 2, 'height': 1},{'ny': 5, 'nx': 5, 'nz': 3})]

@cisaacstern
Copy link
Member

Thanks so much @tom-the-hill. I talked through this bug with @rabernat today. I've got a bit more context now, so I'll have a go at pushing some contributions to your PR here, if that's alright.

@ta-hill
Copy link
Contributor Author

ta-hill commented Dec 1, 2023

Sounds good to me, thanks for the help on this.

@cisaacstern cisaacstern changed the title Test for multidim rechunking Fix multidim rechunking _invert_meshgrid error Feb 10, 2024
jbusecke added a commit to leap-stc/cmip6-leap-feedstock that referenced this pull request Feb 10, 2024
@cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.
@cisaacstern
Copy link
Member

cisaacstern commented Feb 10, 2024

@jbusecke and I took a very deep dive on this today! (With thanks to @norlandrhagen for participating in the preamble earlier this week.) We have tentatively concluded that adding .squeeze() here is "all" that is needed to resolve this issue.

Julius is going to run this branch against some of his production cases that were triggering this error as a further gut check.

The proliferation of comments committed here so far reflect our notes as we were going along, and trying to understand how these functions actually work. We've left these in here for now, because as part of this fix, we would also like to do some renaming/refactoring as part of this work to make this area of the code more easily understandable and maintainable.

Remaining TODO:

  • Continue refactoring and unit testing of combine_fragments sub-routines (including _invert_meshgrid itself!). The funny thing is, that despite having found the (probable) fix for this through some informed trial-and-error, Julius and I don't actually fully understand yet how this section of the code is meant to behave. Cleaner factoring, naming, and unit testing, should help make this more readily apparent to all (including ourselves). This can extend to more informative/descriptive typing.
  • Remove extraneous comments once readability is enhanced by refactor/ unit testing/ better docstrings/names/types.

jbusecke added a commit to leap-stc/cmip6-leap-feedstock that referenced this pull request Feb 10, 2024
* Test fix for multidim rechunking _invert_meshgrid error

@cisaacstern and I just came up with a fix for pangeo-forge/pangeo-forge-recipes#658 and I want to see if this alleviates some of the failures I have seen.

* Update iids_pr.yaml

* Try out MPI models that failed before.

* Update iids_pr.yaml

* Update iids.yaml

* Update recipe.py
@cisaacstern
Copy link
Member

Superseded by #708. Thanks all for your help on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants