Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable resampling on PeriodIndex #2687

Closed
wants to merge 2 commits into from

Conversation

max-sixty
Copy link
Collaborator

This allows resampling with PeriodIndex objects by keeping the group as an index rather than coercing to a DataArray (which coerces any non-native types to objects)

I'm still getting one failure around the name of the IndexVariable still being __resample_dim__ after resample, but wanted to socialize the approach of allowing a name argument to IndexVariable - is this reasonable?

@max-sixty max-sixty changed the title add name arg to IndexVariable, adjust resample logic Enable resampling on PeriodIndex Jan 17, 2019
@max-sixty
Copy link
Collaborator Author

Let me know any thoughts / alternative approaches here.

To emphasize - the issue with coercing the labels to a DataArray is that non-native dtypes, such as PeriodIndex get converted to objects, and pandas' resampling machinery (which xarray uses) requires the standard indexes.

group = DataArray(dim_coord, coords=dim_coord.coords,
dims=dim_coord.dims, name=RESAMPLE_DIM)
group = IndexVariable(
data=self.indexes[dim], dims=[dim], name=RESAMPLE_DIM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could alternatively fix this by replace dim_coord in the data argument with dim_coord.to_index(), e.g., compare:

ipdb> DataArray(dim_coord, coords=dim_coord.coords, dims=dim_coord.dims, name=RESAMPLE_DIM).to_index()
Index([2000-01-01 00:00, 2000-01-01 06:00, 2000-01-01 12:00, 2000-01-01 18:00,
       2000-01-02 00:00, 2000-01-02 06:00, 2000-01-02 12:00, 2000-01-02 18:00,
       2000-01-03 00:00, 2000-01-03 06:00],
      dtype='object', name='time')
ipdb> DataArray(dim_coord.to_index(), coords=dim_coord.coords, dims=dim_coord.dims, name=RESAMPLE_DIM).to_index()
PeriodIndex(['2000-01-01 00:00', '2000-01-01 06:00', '2000-01-01 12:00',
             '2000-01-01 18:00', '2000-01-02 00:00', '2000-01-02 06:00',
             '2000-01-02 12:00', '2000-01-02 18:00', '2000-01-03 00:00',
             '2000-01-03 06:00'],
            dtype='period[6H]', name='time', freq='6H')

We go to some effort to preserve pandas.Index classes when passed into DataArray/Variable but apparently don't do it consistently everywhere. So there's also probably a lower level fix somewhere (perhaps in xarray.core.variable.as_compatible_data?) to ensure that this happens more consistently.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'll have a look

@@ -1789,12 +1789,17 @@ class IndexVariable(Variable):
unless another name is given.
"""

def __init__(self, dims, data, attrs=None, encoding=None, fastpath=False):
def __init__(self, dims, data, attrs=None, encoding=None, fastpath=False, name=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not add an explicit name parameter to IndexVariable.

If anything, I would think about removing the name property. Long term, I'd like to rename IndexVariable -> ImmutableVariable and move the pandas adapter logic into class that we can pass into the data argument of Variable.

@max-sixty
Copy link
Collaborator Author

This is so old that probably it wouldn't even be a good place to start from. Sorry I never pushed it through. I'll close.

@max-sixty max-sixty closed this Nov 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants