BUG: Resample on PeriodIndex not working? #1270

max-sixty · 2017-02-15T16:56:21Z

import xarray as xr
import pandas as pd
da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))

da.resample('B', 'date', 'ffill')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-eb64a66a8d1f> in <module>()
      3 da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))
      4
----> 5 da.resample('B', 'date', 'ffill')

/Users/maximilian/drive/workspace/xarray/xarray/core/common.py in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs)
    577         time_grouper = pd.TimeGrouper(freq=freq, how=how, closed=closed,
    578                                       label=label, base=base)
--> 579         gb = self.groupby_cls(self, group, grouper=time_grouper)
    580         if isinstance(how, basestring):
    581             f = getattr(gb, how)

/Users/maximilian/drive/workspace/xarray/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    242                 raise ValueError('index must be monotonic for resampling')
    243             s = pd.Series(np.arange(index.size), index)
--> 244             first_items = s.groupby(grouper).first()
    245             if first_items.isnull().any():
    246                 full_index = first_items.index

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3989         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3990                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3991                        **kwargs)
   3992
   3993     def asfreq(self, freq, method=None, how=None, normalize=False):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in groupby(obj, by, **kwds)
   1509         raise TypeError('invalid type: %s' % type(obj))
   1510
-> 1511     return klass(obj, by, **kwds)
   1512
   1513

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    368                                                     level=level,
    369                                                     sort=sort,
--> 370                                                     mutated=self.mutated)
    371
    372         self.obj = obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/groupby.py in _get_grouper(obj, key, axis, level, sort, mutated)
   2390     # a passed-in Grouper, directly convert
   2391     if isinstance(key, Grouper):
-> 2392         binner, grouper, obj = key._get_grouper(obj)
   2393         if key.key is None:
   2394             return grouper, [], obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tseries/resample.py in _get_grouper(self, obj)
   1059     def _get_grouper(self, obj):
   1060         # create the resampler and return our binner
-> 1061         r = self._get_resampler(obj)
   1062         r._set_binner()
   1063         return r.binner, r.grouper, r.obj

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/tseries/resample.py in _get_resampler(self, obj, kind)
   1055         raise TypeError("Only valid with DatetimeIndex, "
   1056                         "TimedeltaIndex or PeriodIndex, "
-> 1057                         "but got an instance of %r" % type(ax).__name__)
   1058
   1059     def _get_grouper(self, obj):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

fmaussion · 2017-02-15T17:07:22Z

I thought this was one of the motivations behind NetCDFTimeIndex: #1252

shoyer · 2017-02-15T17:43:31Z

I see no reason why this shouldn't work, so my guess is that this could be fixed pretty easily. We certainly don't have any tests for this right now, though.

My first step would be to drop into a debugger and figure out why the PeriodIndex is getting converted into a base Index when put into a Series in xarray's GroupBy.__init__.
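As a rough illustration of the symptom (not a diagnosis of where xarray loses the index type), the sketch below shows that pandas raises the same TypeError once the periods sit in a plain object-dtype Index. Forcing `dtype=object` here is an assumption made only to reproduce the condition; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

periods = pd.period_range("2000-01", "2000-12", freq="W")

# Assumption for illustration: force an object-dtype Index so the
# PeriodIndex subclass is discarded, mimicking what the traceback reports.
plain = pd.Index(np.asarray(periods), dtype=object)

series = pd.Series(np.arange(plain.size), index=plain)

try:
    series.groupby(pd.Grouper(freq="W")).first()
    message = None
except TypeError as err:
    # Time-based grouping needs a datetime-like index, which `plain` is not
    message = str(err)

print(type(periods).__name__, "->", type(plain).__name__)
print(message)
```

If the index arrives at the grouper as a PeriodIndex rather than a base Index, this particular TypeError should not be raised.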

max-sixty · 2017-02-15T18:46:33Z

> I thought this was one of the motivations behind NetCDFTimeIndex: #1252

Why?

spencerkclark · 2017-02-15T18:47:00Z

@fmaussion just to clarify, #1252 is meant as an analogue to pandas' DatetimeIndex for non-standard calendars, and does not address resample (it would be nice to have at some point though). It is not intended to be used in place of (or provide similar functionality to) a PeriodIndex.

@MaximilianR perhaps it's also worth noting that, as I understand it, xarray does not yet support upsampling with filling (see #563 and the docs). Independent of that, there's definitely something odd going on, since attempting to downsample via a mean produces the same error:

In [21]: da = xr.DataArray(pd.Series(1, pd.period_range('2000-1', '2000-12', freq='W')).rename_axis('date'))

In [22]: da.resample('M', 'date', how='mean')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-22-9c464b8e736c> in <module>()
----> 1 da.resample('M', 'date', how='mean')

/Users/spencerclark/xarray-dev/xarray/xarray/core/common.pyc in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs)
    577         time_grouper = pd.TimeGrouper(freq=freq, how=how, closed=closed,
    578                                       label=label, base=base)
--> 579         gb = self.groupby_cls(self, group, grouper=time_grouper)
    580         if isinstance(how, basestring):
    581             f = getattr(gb, how)

/Users/spencerclark/xarray-dev/xarray/xarray/core/groupby.pyc in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    242                 raise ValueError('index must be monotonic for resampling')
    243             s = pd.Series(np.arange(index.size), index)
--> 244             first_items = s.groupby(grouper).first()
    245             if first_items.isnull().any():
    246                 full_index = first_items.index

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   3989         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   3990                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 3991                        **kwargs)
   3992
   3993     def asfreq(self, freq, method=None, how=None, normalize=False):

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in groupby(obj, by, **kwds)
   1509         raise TypeError('invalid type: %s' % type(obj))
   1510
-> 1511     return klass(obj, by, **kwds)
   1512
   1513

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    368                                                     level=level,
    369                                                     sort=sort,
--> 370                                                     mutated=self.mutated)
    371
    372         self.obj = obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort, mutated)
   2390     # a passed-in Grouper, directly convert
   2391     if isinstance(key, Grouper):
-> 2392         binner, grouper, obj = key._get_grouper(obj)
   2393         if key.key is None:
   2394             return grouper, [], obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _get_grouper(self, obj)
   1059     def _get_grouper(self, obj):
   1060         # create the resampler and return our binner
-> 1061         r = self._get_resampler(obj)
   1062         r._set_binner()
   1063         return r.binner, r.grouper, r.obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _get_resampler(self, obj, kind)
   1055         raise TypeError("Only valid with DatetimeIndex, "
   1056                         "TimedeltaIndex or PeriodIndex, "
-> 1057                         "but got an instance of %r" % type(ax).__name__)
   1058
   1059     def _get_grouper(self, obj):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

fmaussion · 2017-02-17T15:29:30Z

Maybe related? Selection with slices also doesn't work:

da = xr.DataArray(pd.Series(1, pd.period_range('1990-1', '2000-12', freq='M')))
da.sel(dim_0='1991-07') # works fine
da.sel(dim_0='1992-02') # works fine
da.sel(dim_0=slice('1991-07', '1992-02'))

Throws:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-35-2e962ee8a791> in <module>()
      2 da.sel(dim_0='1991-07') # works fine
      3 da.sel(dim_0='1992-02') # works fine
----> 4 da.sel(dim_0=slice('1991-07', '1992-02'))

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/dataarray.py in sel(self, method, tolerance, drop, **indexers)
    668         """
    669         pos_indexers, new_indexes = indexing.remap_label_indexers(
--> 670             self, indexers, method=method, tolerance=tolerance
    671         )
    672         result = self.isel(drop=drop, **pos_indexers)

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    286         else:
    287             idxr, new_idx = convert_label_indexer(index, label,
--> 288                                                   dim, method, tolerance)
    289             pos_indexers[dim] = idxr
    290             if new_idx is not None:

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
    183         indexer = index.slice_indexer(_try_get_item(label.start),
    184                                       _try_get_item(label.stop),
--> 185                                       _try_get_item(label.step))
    186         if not isinstance(indexer, slice):
    187             # unlike pandas, in xarray we never want to silently convert a slice

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_indexer(self, start, end, step, kind)
   2995         """
   2996         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 2997                                                  kind=kind)
   2998 
   2999         # return a slice

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in slice_locs(self, start, end, step, kind)
   3174         start_slice = None
   3175         if start is not None:
-> 3176             start_slice = self.get_slice_bound(start, 'left', kind)
   3177         if start_slice is None:
   3178             start_slice = 0

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/indexes/base.py in get_slice_bound(self, label, side, kind)
   3113         # For datetime indices label may be a string that has to be converted
   3114         # to datetime boundary according to its resolution.
-> 3115         label = self._maybe_cast_slice_bound(label, side, kind)
   3116 
   3117         # we need to look up the label

/home/mowglie/.pyvirtualenvs/py3/lib/python3.5/site-packages/pandas/tseries/period.py in _maybe_cast_slice_bound(self, label, side, kind)
    838 
    839         """
--> 840         assert kind in ['ix', 'loc', 'getitem']
    841 
    842         if isinstance(label, datetime):

AssertionError:
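For comparison, here is a minimal sketch of the same label slice done directly in pandas, assuming a recent pandas version in which PeriodIndex supports partial-string slicing. It only shows the result the xarray call would be expected to match and does not touch xarray's indexing code; the variable names are illustrative.

```python
import pandas as pd

index = pd.period_range("1990-01", "2000-12", freq="M")
series = pd.Series(1, index=index)

# Positional bounds that the label slice resolves to (the end label is inclusive)
bounds = index.slice_indexer("1991-07", "1992-02")

# Label-based slicing on the Series itself
subset = series.loc["1991-07":"1992-02"]

print(bounds)
print(subset.index[0], subset.index[-1], len(subset))
```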

lvankampenhout · 2018-05-22T07:36:40Z

+1 to this issue. I'm struggling big time with an 1800-year climate model dataset that I need to resample in order to make different annual means (June-May).

spencerkclark · 2018-05-22T12:36:35Z

> +1 to this issue. I'm struggling big time with an 1800-year climate model dataset that I need to resample in order to make different annual means (June-May).

@lvankampenhout I agree that it would be nice if xarray had better support for PeriodIndexes.

Do you happen to be using a PeriodIndex because of pandas Timestamp limitations? Despite the fact that generalized resample has not been implemented yet, I recommend you try using the new CFTimeIndex. As it turns out, for some one-off cases (like this one) resample is not too difficult to mimic using groupby. See the following example for your case. I'm assuming you're looking for resampling with the 'AS-JUN' anchored offset?

from itertools import product
from cftime import DatetimeProlepticGregorian as datetime
import numpy as np
import xarray as xr

xr.set_options(enable_cftimeindex=True)

# Set up some example data indexed by cftime.DatetimeProlepticGregorian objects
dates = [datetime(year, month, 1) for year, month in product(range(2, 5), range(1, 13))]
da = xr.DataArray(np.arange(len(dates)), coords=[dates], dims=['time'])

# Mimic resampling with the AS-JUN anchored offset
years = da.time.dt.year - (da.time.dt.month < 6)
da['AS-JUN'] = xr.DataArray([datetime(year, 6, 1) for year in years], coords=da.time.coords)
resampled = da.groupby('AS-JUN').mean('time').rename({'AS-JUN': 'time'})

This gives the following for resampled:

<xarray.DataArray (time: 4)>
array([  2. ,  10.5,  22.5,  32. ])
Coordinates:
  * time     (time) object 0001-06-01 00:00:00 0002-06-01 00:00:00 ...

This is analogous to using resample(time='AS-JUN') with a DataArray indexed by a DatetimeIndex:

import pandas as pd
dates = pd.date_range('2002-01-01', freq='M', periods=36)
da = xr.DataArray(np.arange(len(dates)), coords=[dates], dims='time')
resampled = da.resample(time='AS-JUN').mean('time')

which gives:

<xarray.DataArray (time: 4)>
array([  2. ,  10.5,  22.5,  32. ])
Coordinates:
  * time     (time) datetime64[ns] 2001-06-01 2002-06-01 2003-06-01 2004-06-01
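The labelling rule behind the groupby workaround above (months before June are assigned to the year that began the previous June) can be checked with the standard library alone. This is a minimal sketch on the same synthetic layout of 36 monthly values for years 2 to 4; the names `groups` and `annual_means` are illustrative and not part of xarray.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Same synthetic layout as the example above: monthly values for years 2-4
dates = list(product(range(2, 5), range(1, 13)))
values = range(len(dates))

groups = defaultdict(list)
for (year, month), value in zip(dates, values):
    # Months before June belong to the year that started the previous June
    label_year = year - (month < 6)
    groups[label_year].append(value)

# Mean of each June-to-May group, keyed by the year of its starting June
annual_means = {year: mean(vals) for year, vals in sorted(groups.items())}
print(annual_means)
```

The four group means agree with the values 2, 10.5, 22.5 and 32 reported for both approaches above.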

lvankampenhout · 2018-05-29T07:41:53Z

Thanks for your elaborate response, @spencerkclark.

> Do you happen to be using a PeriodIndex because of pandas Timestamp limitations?

Yes, the main limitation being the limited range of years (~584) whereas my dataset spans 1800 years. Note that in glaciology, which deals with ice sheet responses over multiple millennia, this is considered a short period.
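The figure of roughly 584 years follows from the nanosecond resolution of datetime64[ns], which stores time as a signed 64-bit count of nanoseconds. A back-of-the-envelope sketch, assuming an average year of 365.25 days (the constant names are illustrative):

```python
# datetime64[ns] stores time as a signed 64-bit count of nanoseconds
NS_PER_SECOND = 10**9
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25  # average Julian year, an approximation

# Total number of distinct nanosecond values a 64-bit integer can hold
representable_ns = 2**64

span_years = representable_ns / (NS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_YEAR)
print(f"{span_years:.1f} years")
```

An 1800-year dataset is about three times longer than this span, so it cannot be indexed by nanosecond-resolution timestamps.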

I elaborated a bit more on my problem in this issue, which, as I realized too late, is in an unofficial repo.

Anyway, your code using cftime solves my problem 😄 Indeed, resampling to 'AS-JUN' is what I was looking for. Still, it would be nice to have better support for PeriodIndex in the future. It has cost me a lot of time to figure out what's going on and to learn the details of all the different date and time implementations, which is a waste in the end.

spencerahill · 2018-05-29T15:42:55Z

@lvankampenhout just FYI, #2191 has been opened for further discussion of adding resample to CFTimeIndex. Keep an eye on it for developments, and consider taking a stab at implementing it yourself! I'm sure @spencerkclark and others will be keen to help out once you (or somebody) get started.

stale · 2020-04-28T16:04:19Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

lvankampenhout mentioned this issue May 19, 2018

xarray datetime limitations kuchaale/X-regression#6

Closed

spencerahill mentioned this issue May 28, 2018

Adding resample functionality to CFTimeIndex #2191

Closed

spencerkclark mentioned this issue Oct 13, 2018

Implement CFPeriodIndex #2481

Closed

max-sixty mentioned this issue Jan 17, 2019

Enable resampling on PeriodIndex #2687

Closed

3 tasks

stale bot added the stale label Apr 28, 2020

stale bot closed this as completed May 30, 2020
