BUG: Resample on PeriodIndex not working? #1270

Closed
max-sixty opened this issue Feb 15, 2017 · 10 comments
@max-sixty
Collaborator

import xarray as xr
import pandas as pd
da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))

da.resample('B', 'date', 'ffill')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-eb64a66a8d1f> in <module>()
      3 da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))
      4
----> 5 da.resample('B', 'date', 'ffill')

/Users/maximilian/drive/workspace/xarray/xarray/core/common.py in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs)
    577         time_grouper = pd.TimeGrouper(freq=freq, how=how, closed=closed,
    578                                       label=label, base=base)
--> 579         gb = self.groupby_cls(self, group, grouper=time_grouper)
    580         if isinstance(how, basestring):
    581             f = getattr(gb, how)

/Users/maximilian/drive/workspace/xarray/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    242                 raise ValueError('index must be monotonic for resampling')
    243             s = pd.Series(np.arange(index.size), index)
--> 244             first_items = s.groupby(grouper).first()
    245             if first_items.isnull().any():
    246                 full_index = first_items.index

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3989         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3990                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3991                        **kwargs)
   3992
   3993     def asfreq(self, freq, method=None, how=None, normalize=False):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1509         raise TypeError('invalid type: %s' % type(obj))
   1510
-> 1511     return klass(obj, by, **kwds)
   1512
   1513

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    368                                                     level=level,
    369                                                     sort=sort,
--> 370                                                     mutated=self.mutated)
    371
    372         self.obj = obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2390     # a passed-in Grouper, directly convert
   2391     if isinstance(key, Grouper):
-> 2392         binner, grouper, obj = key._get_grouper(obj)
   2393         if key.key is None:
   2394             return grouper, [], obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tseries/resample.py in _get_grouper(self, obj)
   1059     def _get_grouper(self, obj):
   1060         # create the resampler and return our binner
-> 1061         r = self._get_resampler(obj)
   1062         r._set_binner()
   1063         return r.binner, r.grouper, r.obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tseries/resample.py in _get_resampler(self, obj, kind)
   1055         raise TypeError("Only valid with DatetimeIndex, "
   1056                         "TimedeltaIndex or PeriodIndex, "
-> 1057                         "but got an instance of %r" % type(ax).__name__)
   1058
   1059     def _get_grouper(self, obj):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
@fmaussion
Member

I thought this was one of the motivations behind NetCDFTimeIndex : #1252

@shoyer
Member

shoyer commented Feb 15, 2017

I see no reason why this shouldn't work, so my guess is that this could be fixed pretty easily. We certainly don't have any tests for this right now, though.

My first step would be to drop into a debugger and figure out why the PeriodIndex is getting converted into a base Index when put into a Series in xarray's GroupBy.__init__.
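
One way to probe this hypothesis outside xarray is to build an object-dtype Index of Period values by hand and group it with a time-based grouper. This is a minimal pandas-only sketch, not xarray's actual code path: the forced `dtype=object` conversion is an assumption standing in for whatever happens inside xarray, and `pd.Grouper` is used here in place of the older `pd.TimeGrouper` from the traceback.

```python
import pandas as pd

periods = pd.period_range("2000-01", "2000-12", freq="W")

# Assumption: emulate an index that has lost its PeriodIndex type by
# forcing the Period values into a plain object-dtype Index.
as_object = pd.Index(periods.astype(object), dtype=object)

print(type(periods).__name__)    # PeriodIndex
print(type(as_object).__name__)  # Index

s = pd.Series(range(len(as_object)), index=as_object)

# Grouping by a frequency requires a datetime-like index, so the
# object-dtype Index is expected to be rejected with a TypeError.
caught = None
try:
    s.groupby(pd.Grouper(freq="D")).first()
except TypeError as err:
    caught = err

print(caught)
```

If the message printed at the end matches the one in the traceback above, that would be consistent with the index type being lost before the grouper sees it, though it does not show where in xarray the conversion happens.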

@max-sixty
Collaborator Author

I thought this was one of the motivations behind NetCDFTimeIndex : #1252

Why?

@spencerkclark
Member

spencerkclark commented Feb 15, 2017

@fmaussion just to clarify, #1252 is meant as an analogue to pandas' DatetimeIndex for non-standard calendars, and does not address resample (it would be nice to have at some point though). It is not intended to be used in place of (or provide similar functionality to) a PeriodIndex.

@MaximilianR perhaps it's also worth noting (as I understand it) xarray does not yet support upsampling with filling (see #563, and the docs). That being said, independent of that, there's definitely something odd going on, since attempting to downsample via a mean produces the same error:

In [21]: da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))

In [22]: da.resample('M', 'date', how='mean')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-9c464b8e736c> in <module>()
----> 1 da.resample('M', 'date', how='mean')

/Users/spencerclark/xarray-dev/xarray/xarray/core/common.pyc in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs)
    577         time_grouper = pd.TimeGrouper(freq=freq, how=how, closed=closed,
    578                                       label=label, base=base)
--> 579         gb = self.groupby_cls(self, group, grouper=time_grouper)
    580         if isinstance(how, basestring):
    581             f = getattr(gb, how)

/Users/spencerclark/xarray-dev/xarray/xarray/core/groupby.pyc in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    242                 raise ValueError('index must be monotonic for resampling')
    243             s = pd.Series(np.arange(index.size), index)
--> 244             first_items = s.groupby(grouper).first()
    245             if first_items.isnull().any():
    246                 full_index = first_items.index

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3989         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3990                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3991                        **kwargs)
   3992
   3993     def asfreq(self, freq, method=None, how=None, normalize=False):

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in groupby(obj, by, **kwds)
   1509         raise TypeError('invalid type: %s' % type(obj))
   1510
-> 1511     return klass(obj, by, **kwds)
   1512
   1513

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    368                                                     level=level,
    369                                                     sort=sort,
--> 370                                                     mutated=self.mutated)
    371
    372         self.obj = obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort, mutated)
   2390     # a passed-in Grouper, directly convert
   2391     if isinstance(key, Grouper):
-> 2392         binner, grouper, obj = key._get_grouper(obj)
   2393         if key.key is None:
   2394             return grouper, [], obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _get_grouper(self, obj)
   1059     def _get_grouper(self, obj):
   1060         # create the resampler and return our binner
-> 1061         r = self._get_resampler(obj)
   1062         r._set_binner()
   1063         return r.binner, r.grouper, r.obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _get_resampler(self, obj, kind)
   1055         raise TypeError("Only valid with DatetimeIndex, "
   1056                         "TimedeltaIndex or PeriodIndex, "
-> 1057                         "but got an instance of %r" % type(ax).__name__)
   1058
   1059     def _get_grouper(self, obj):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

@fmaussion
Member

Maybe related? Selection with slices also doesn't work:

da = xr.DataArray(pd.Series(1, pd.period_range('1990-1', '2000-12', freq='M')))
da.sel(dim_0='1991-07') # works fine
da.sel(dim_0='1992-02') # works fine
da.sel(dim_0=slice('1991-07', '1992-02'))

Throws:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-35-2e962ee8a791> in <module>()
      2 da.sel(dim_0='1991-07') # works fine
      3 da.sel(dim_0='1992-02') # works fine
----> 4 da.sel(dim_0=slice('1991-07', '1992-02'))

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataarray.py in sel(self, method, tolerance, drop, **indexers)
    668         """
    669         pos_indexers, new_indexes = indexing.remap_label_indexers(
--> 670             self, indexers, method=method, tolerance=tolerance
    671         )
    672         result = self.isel(drop=drop, **pos_indexers)

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    286         else:
    287             idxr, new_idx = convert_label_indexer(index, label,
--> 288                                                   dim, method, tolerance)
    289             pos_indexers[dim] = idxr
    290             if new_idx is not None:

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
    183         indexer = index.slice_indexer(_try_get_item(label.start),
    184                                       _try_get_item(label.stop),
--> 185                                       _try_get_item(label.step))
    186         if not isinstance(indexer, slice):
    187             # unlike pandas, in xarray we never want to silently convert a slice

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_indexer(self, start, end, step, kind)
   2995         """
   2996         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 2997                                                  kind=kind)
   2998 
   2999         # return a slice

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   3174         start_slice = None
   3175         if start is not None:
-> 3176             start_slice = self.get_slice_bound(start, 'left', kind)
   3177         if start_slice is None:
   3178             start_slice = 0

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in get_slice_bound(self, label, side, kind)
   3113         # For datetime indices label may be a string that has to be converted
   3114         # to datetime boundary according to its resolution.
-> 3115         label = self._maybe_cast_slice_bound(label, side, kind)
   3116 
   3117         # we need to look up the label

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/tseries/period.py in _maybe_cast_slice_bound(self, label, side, kind)
    838 
    839         """
--> 840         assert kind in ['ix', 'loc', 'getitem']
    841 
    842         if isinstance(label, datetime):

AssertionError: 

@lvankampenhout

+1 to this issue. I'm struggling big time with an 1800-year climate model dataset that I need to resample in order to make different annual means (June-May).

@spencerkclark
Member

+1 to this issue. I'm struggling big time with an 1800-year climate model dataset that I need to resample in order to make different annual means (June-May).

@lvankampenhout I agree that it would be nice if xarray had better support for PeriodIndexes.

Do you happen to be using a PeriodIndex because of pandas Timestamp limitations? Although generalized resample has not been implemented yet, I recommend you try the new CFTimeIndex. As it turns out, for some one-off cases (like this one) resample is not too difficult to mimic using groupby. See the following example for your case. I'm assuming you're looking for resampling with the 'AS-JUN' anchored offset?

from itertools import product
from cftime import DatetimeProlepticGregorian as datetime
import numpy as np
import xarray as xr

xr.set_options(enable_cftimeindex=True)

# Set up some example data indexed by cftime.DatetimeProlepticGregorian objects
dates = [datetime(year, month, 1) for year, month in product(range(2, 5), range(1, 13))]
da = xr.DataArray(np.arange(len(dates)), coords=[dates], dims=['time'])

# Mimic resampling with the AS-JUN anchored offset
years = da.time.dt.year - (da.time.dt.month < 6)
da['AS-JUN'] = xr.DataArray([datetime(year, 6, 1) for year in years], coords=da.time.coords)
resampled = da.groupby('AS-JUN').mean('time').rename({'AS-JUN': 'time'})

This gives the following for resampled:

<xarray.DataArray (time: 4)>
array([  2. ,  10.5,  22.5,  32. ])
Coordinates:
  * time     (time) object 0001-06-01 00:00:00 0002-06-01 00:00:00 ...

This is analogous to using resample(time='AS-JUN') with a DataArray indexed by a DatetimeIndex:

import pandas as pd
dates = pd.date_range('2002-01-01', freq='M', periods=36)
da = xr.DataArray(np.arange(len(dates)), coords=[dates], dims='time')
resampled = da.resample(time='AS-JUN').mean('time')

which gives:

<xarray.DataArray (time: 4)>
array([  2. ,  10.5,  22.5,  32. ])
Coordinates:
  * time     (time) datetime64[ns] 2001-06-01 2002-06-01 2003-06-01 2004-06-01
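
The grouping key used in the cftime example can also be checked by hand, without xarray. The sketch below uses only the standard library; the helper name `june_start_year` is hypothetical and introduced purely for illustration. It labels each (year, month) pair with the year in which its June-to-May window starts, then averages the same 0..35 values used above.

```python
from collections import defaultdict
from itertools import product


def june_start_year(year, month):
    """Return the year in which the June-May window containing this month starts."""
    # January-May belong to the window that started in June of the previous year.
    return year - (month < 6)


# Same layout as the example data: years 2-4, twelve months each, values 0..35.
months = list(product(range(2, 5), range(1, 13)))
bins = defaultdict(list)
for value, (year, month) in enumerate(months):
    bins[june_start_year(year, month)].append(value)

means = {label: sum(vals) / len(vals) for label, vals in sorted(bins.items())}
print(means)  # {1: 2.0, 2: 10.5, 3: 22.5, 4: 32.0}
```

The first and last bins hold only 5 and 7 months respectively, because the data starts in January and ends in December, so their means are not full June-May annual means.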

@lvankampenhout

Thanks for your elaborate response, @spencerkclark.

Do you happen to be using a PeriodIndex because of pandas Timestamp limitations?

Yes, the main limitation being the limited range of years (~584) whereas my dataset spans 1800 years. Note that in glaciology, which deals with ice sheet responses over multiple millennia, this is considered a short period.
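
The figure of about 584 years follows from storing timestamps as a 64-bit count of nanoseconds. A rough back-of-the-envelope check, assuming a 365.25-day year and a range centred on the 1970 epoch (both simplifications):

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, an approximation

# Total span representable by a 64-bit nanosecond counter, in years.
span_years = 2**64 / NS_PER_SECOND / SECONDS_PER_YEAR

# A signed counter splits that span roughly evenly around the epoch (1970).
earliest = 1970 - span_years / 2
latest = 1970 + span_years / 2

print(span_years)        # roughly 584.5
print(earliest, latest)  # roughly 1677.7 and 2262.3
```

This lines up with the bounds pandas reports for `pd.Timestamp.min` and `pd.Timestamp.max` (in 1677 and 2262), so an 1800-year dataset cannot fit into a nanosecond-resolution DatetimeIndex.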

I elaborated a bit more on my problem in this issue, which is in an unofficial repo, as I realized too late.

Anyway, your code using cftime solves my problem 😄 Indeed, resampling to 'AS-JUN' is what I was looking for. Still, it would be nice to have better support for PeriodIndex in the future. It has cost me a lot of time to figure out what's going on and to learn the details of all the different date and time implementations, which is a waste in the end.

@spencerahill
Contributor

@lvankampenhout just FYI #2191 has been opened for further discussion of adding resample to CFTimeIndex. So keep an eye on that for those developments...as well as consider taking a stab at implementing it yourself! I'm sure @spencerkclark and others will be keen to help out once you (or somebody) gets started.

@stale

stale bot commented Apr 28, 2020

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Apr 28, 2020
@stale stale bot closed this as completed May 30, 2020