Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plot arrays across faces without cutout #427

Merged
merged 11 commits into from
Jun 13, 2024
Merged

Conversation

Mikejmnez
Copy link
Collaborator

@Mikejmnez Mikejmnez commented Apr 1, 2024

This issue closes #410 and #409

The following are tasks that need to be completed before merging this PR. Until then, consider this as a work in progress

  • Add plot function to code base that works.
  • Include some tests.
  • Update API / documentation that clearly demonstrate how it works.
  • Include a link to an example notebook

I will work on completing the rest of the tasks sometime this week, or the weekend at the latest.

Copy link

codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 68.36158% with 56 lines in your changes are missing coverage. Please review.

Project coverage is 95.62%. Comparing base (9e566fc) to head (ed9f249).
Report is 5 commits behind head on main.

Files Patch % Lines
oceanspy/plot.py 68.36% 36 Missing and 20 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #427      +/-   ##
==========================================
- Coverage   96.59%   95.62%   -0.97%     
==========================================
  Files           9        9              
  Lines        4987     5163     +176     
  Branches     1190     1246      +56     
==========================================
+ Hits         4817     4937     +120     
- Misses         97      133      +36     
- Partials       73       93      +20     
Flag Coverage Δ
unittests 95.62% <68.36%> (-0.97%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@ThomasHaine ThomasHaine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to run your example notebook. I need catalog_ECCO.yaml. Please can you provide it? Thanks!

@Mikejmnez
Copy link
Collaborator Author

Hey Tom. catalog_ECCO.yml it is just a catalog on my local computer that I use for development since it points towards the test data from oceanspy. I think it might be better to run the notebook on sciserver. I will make the proper modifications to do that (3 cells)

@Mikejmnez
Copy link
Collaborator Author

@ThomasHaine I changed a bit the notebook in the hyper link above to run in Sciserver, it also has description on how to install the correct oceanspy version (this branch) in a sciserver environment, and also added improved description of the notebook. There are about 20 examples.

The notebook successfully ran in Sciserver with ECCO. I'd expect the same with LLC4320.

@ThomasHaine
Copy link
Collaborator

@Mikejmnez thanks. I'll check...

@ThomasHaine
Copy link
Collaborator

With LLC4320 it errors at the first plot with:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File <timed exec>:8

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs)
   1472 @_functools.wraps(faces_array)
   1473 def faces_array(self, **kwargs):
-> 1474     return faces_array(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs)
   1121 if xoak_index not in _xoak.IndexRegistry():
   1122     raise ValueError(
   1123         "`xoak_index` [{}] is not supported."
   1124         "\nAvailable options: {}"
   1125         "".format(xoak_index, _xoak.IndexRegistry())
   1126     )
-> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
   1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
   1129 ds_data = _xr.Dataset(cdata)  # mooring data

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key)
    287 def __getitem__(self, key: Hashable) -> DataArray:
    288     if key in self._data.data_vars:
--> 289         raise KeyError(key)
    290     return self._data[key]

KeyError: 'XC'

@ThomasHaine
Copy link
Collaborator

@Mikejmnez any ideas about how to fix?

With LLC4320 it errors at the first plot with:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File <timed exec>:8

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs)
   1472 @_functools.wraps(faces_array)
   1473 def faces_array(self, **kwargs):
-> 1474     return faces_array(self._od, **kwargs)

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs)
   1121 if xoak_index not in _xoak.IndexRegistry():
   1122     raise ValueError(
   1123         "`xoak_index` [{}] is not supported."
   1124         "\nAvailable options: {}"
   1125         "".format(xoak_index, _xoak.IndexRegistry())
   1126     )
-> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
   1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
   1129 ds_data = _xr.Dataset(cdata)  # mooring data

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0)
     98 self._index_type = index_type
     99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
    103 if len(set([c.dims for c in coord_objs])) > 1:
    104     raise ValueError(
    105         'Coordinates {coords} must all have the same dimensions in the same order'
    106     )

File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key)
    287 def __getitem__(self, key: Hashable) -> DataArray:
    288     if key in self._data.data_vars:
--> 289         raise KeyError(key)
    290     return self._data[key]

KeyError: 'XC'

@Mikejmnez
Copy link
Collaborator Author

Hey @ThomasHaine! Apologies, I thought I wrote but my message stayed as draft.
The issue with LLC4320 is a non-issue really. The Keyerror means that the variable XC (and YC) are not defined as coordinates for the xarray.dataset. This is fixed by setting them as coordinates, for example:

ds = ds.set_coords(['XC', 'YC'])

that does it. Otherwise xoak can't find them.

However, last week as I ran this code with LLC4320 on SciServer (Oceanography (Integrated Viewer)) and found that:

  1. it remains pretty slow. Most of the time is on building the tree with xoak. I tried loading the variables XC, YC into memory (since these are 2D) but it didn't work. It has to do with building the tree everytime there is a query for coordinates. I need to work on perhaps reusing the build of the tree (via xoak) so that we don't continuously re-build it, which seems unnecessary.
    Perhaps once Grendel is ready things will be faster and it will be a non-issue.

  2. The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.

I need to spend some time on it. I am not too concern about 2), but 1) is a bummer (that is slow with LLC4320). Again, perhaps with Grendel things will run much much faster. Right now I am using the basic Oceanography (Integrated Viewer) volume on Sciserver (so pretty basic).

@ThomasHaine
Copy link
Collaborator

Thanks @Mikejmnez. I find the same issues (1) and (2). Grendel is still available, so it may be possible to test now. Otherwise, the data transfer to the new ceph cluster will happen soon (I hope).

Is it possible to build the xoak tree once and store it?

@Mikejmnez
Copy link
Collaborator Author

Is it possible to build the xoak tree once and store it?

Not sure, but if it is that would significantly speed up the plotting in this PR, subsample.mooring_array and subsample.stations!! I need to spend some time exploring these options, and finding the bug on the plotting func in this PR.

@Mikejmnez
Copy link
Collaborator Author

Mikejmnez commented Apr 26, 2024

2. The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.

Fixed this problem with 2) above. It was minor issue and now plotting works as expected with LLC4320, DYAMOND and all other LLC-geometries. You can find some examples with LLC4320 (and how long it took to run) on this notebook.

note

  • I had to download LLC4320 and DYAMOND data to my local computer in zarr format, to speed things up. The 2D Grid and single snapshot of ETA. And so the notebooks run locally to the data, which is as fast as it will get on a single (8core) computer.

  • I did some performance test of xoak on my local computer. Not bad: xoak builds in about 20 secs with LLC430 (4secs with DYAMOND) and so things are pretty fast when data is local to the compute. After than, selecting range of coordinates is pretty fast. I didn't test if storing the grid after xoak was built bypasses the need to build xoak within the grid dataset everytime. Will check that soon.

  • I need to do some more benchmark: preliminary data on this notebook.

  • Another thing that slows down things on LLC4320 is the amount of points to be plotted. 4320^2 per face, and when there are multiple faces involved, many more grid points. And so a lot of the time spent by the function was simply plotting (up to ~1 minute). And that is when data is local to compute notebook. On Sciserver (at least the interactive node) I would expect things to be much slower. But perhaps with Grendel (optimize transfer of data) things will be much faster.

TODO: Still need to write tests and documentation (docstring).

@Mikejmnez
Copy link
Collaborator Author

This PR is ready for merger!

@ThomasHaine
Copy link
Collaborator

Thanks @Mikejmnez . I'll take a look and review, then you can merge.

@Mikejmnez
Copy link
Collaborator Author

Thanks @Mikejmnez . I'll take a look and review, then you can merge.

Great. I updated the example notebook at the top of this PR conversation so you can see examples with LLC4320. You can see the timings there when the grid dataset is stored in zarr in my machine, next to the notebook.

The ds_grid + xoak combination is being build everytime there is a plot, and so those timings can probably be improved a bit. The notebook can be considered the baseline when data is stored locally next to compute environment.

@Mikejmnez
Copy link
Collaborator Author

@ThomasHaine BTW you should re-install oceanspy from the branch, since the code has changed since last time you installed it. The instructions to do that are in the example notebook that is referenced at the beginning of the PR conversation.

NOTE: The notebook uses a yaml file that is also available from the GH repository, but that yaml file defines the dataset locally to the notebook. To run that notebook from within Sciserver, you should instead use oceanspy's default catalog:

cell [4]:  ECCOod = ospy.open_oceandataset.from_catalog("ECCO")

and

cell[6]: LLC4320od = ospy.open_oceandataset.from_catalog("LLC4320")

@ThomasHaine
Copy link
Collaborator

@Mikejmnez I checked the notebook with ECCO and LLC4320 on SciServer, and all works fine. Please proceed with the merge!

@Mikejmnez Mikejmnez merged commit 93332dd into hainegroup:main Jun 13, 2024
5 checks passed
@Mikejmnez Mikejmnez deleted the iss410 branch June 15, 2024 19:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add plot function for mooring arrays across faces
2 participants