CFTimeIndex #1252

Merged: 75 commits into pydata:master on May 13, 2018

Conversation

@spencerkclark (Member) commented Feb 6, 2017

This work-in-progress PR is a start on implementing a NetCDFTimeIndex, a subclass of pandas.Index that closely mimics pandas.DatetimeIndex but uses netcdftime._netcdftime.datetime objects. Currently implemented in the new index are:

This index is meant as a step towards improving the handling of non-standard calendars and dates outside the range Timestamp('1677-09-21 00:12:43.145225') to Timestamp('2262-04-11 23:47:16.854775807').


For now I have pushed only the code and some tests for the new index; I want to make sure the index is solid and well-tested before we consider integrating it into any of xarray's existing logic or writing any documentation.

Regarding the index, there are a couple of outstanding issues (at least that I'm aware of):

  1. Currently one can create nonsensical datetimes using netcdftime._netcdftime.datetime objects. This means one can attempt to index with an out-of-bounds string or datetime without raising an error. Could this possibly be addressed upstream? For example:
In [1]: from netcdftime import DatetimeNoLeap

In [2]: DatetimeNoLeap(2000, 45, 45)
Out[2]: netcdftime._netcdftime.DatetimeNoLeap(2000, 45, 45, 0, 0, 0, 0, -1, 1)
  2. I am looking to enable this index to be used in pandas.Series and pandas.DataFrame objects as well; this requires implementing a get_value method. I have taken @shoyer's suggested simplified approach from Towards a (temporary?) workaround for datetime issues at the xarray-level #1084 (comment) and tweaked it to also allow slice indexing, so I think it is most of the way there. A remaining to-do for me, however, is to support integer indexing outside of iloc, e.g. for a pandas.Series named series, indexing with the syntax series[1] or series[1:3], as sketched below.
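To make the desired behavior concrete, here is a minimal sketch (the index values and variable names are purely illustrative; the NetCDFTimeIndex import path matches this PR):

import pandas as pd
from netcdftime import DatetimeNoLeap
from xarray.core.netcdftimeindex import NetCDFTimeIndex

index = NetCDFTimeIndex([DatetimeNoLeap(1, 1, 1), DatetimeNoLeap(1, 2, 1),
                         DatetimeNoLeap(2, 1, 1)])
series = pd.Series([1, 2, 3], index=index)

series.iloc[1]   # positional indexing already works through iloc
series[1]        # desired: positional indexing without iloc
series[1:3]      # desired: positional slicing without iloc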

Hopefully this is a decent start; in particular, I'm not an expert in writing tests, so please let me know if there are improvements I can make to the structure and/or style I've used so far. I'm happy to make changes. I appreciate your help.

self.date_type(1, 2, 1)]),
self.da.sel(time=[True, True, False, False])
]:
self.assertDataArrayIdentical(result, expected)
Collaborator

These sorts of tests would be much more natural if you use pytest fixtures



@requires_netCDF4
class NetCDFTimeIndexTests(object):
@max-sixty (Collaborator) commented Feb 6, 2017

Are you sure pytest runs these tests? I think it requires the class name to start with Test

Member Author

Yes, the tests do run, though I agree I was a bit careless with the class names:
https://travis-ci.org/pydata/xarray/jobs/198709337#L1877

I'll address that as I refactor things to take more advantage of pytest.

class DatetimeNoLeap(NetCDFTimeIndexTests, TestCase):
    def set_date_type(self):
        from netcdftime import DatetimeNoLeap
        self.date_type = DatetimeNoLeap
Collaborator

These would also be really easy as parameterized fixtures
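For example, a parametrized fixture could look roughly like this (a sketch only; the calendar types are the ones used elsewhere in this PR):

import pytest
from netcdftime import DatetimeNoLeap, DatetimeAllLeap, Datetime360Day


@pytest.fixture(params=[DatetimeNoLeap, DatetimeAllLeap, Datetime360Day])
def date_type(request):
    # Each test requesting this fixture runs once per calendar type
    return request.param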

@spencerkclark (Member Author) commented Feb 6, 2017

Thanks for the quick feedback on the tests @MaximilianR. Is this on the right track for doing things a little more idiomatically with pytest?

import pytest

from xarray.tests import assert_array_equal


def netcdftime_date_types():
    pytest.importorskip('netCDF4')

    from netcdftime import (
        DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
        DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day)
    return [DatetimeNoLeap, DatetimeJulian, DatetimeAllLeap,
            DatetimeGregorian, DatetimeProlepticGregorian, Datetime360Day]


@pytest.fixture(params=[])
def index(request):
    from xarray.core.netcdftimeindex import NetCDFTimeIndex

    date_type = request.param
    dates = [date_type(1, 1, 1), date_type(1, 2, 1),
             date_type(2, 1, 1), date_type(2, 2, 1)]
    return NetCDFTimeIndex(dates)


@pytest.mark.parametrize('index', netcdftime_date_types(), indirect=True)
@pytest.mark.parametrize(('field', 'expected'), [
    ('year', [1, 1, 2, 2]),
    ('month', [1, 2, 1, 2]),
    ('day', [1, 1, 1, 1]),
    ('hour', [0, 0, 0, 0]),
    ('minute', [0, 0, 0, 0]),
    ('second', [0, 0, 0, 0]),
    ('microsecond', [0, 0, 0, 0])
], ids=['year', 'month', 'day', 'hour', 'minute', 'second', 'microsecond'])
def test_netcdftimeindex_field_accessors(index, field, expected):
    result = getattr(index, field)
    assert_array_equal(result, expected)

@spencerkclark (Member Author)

@MaximilianR I think I'm getting the hang of it; ignore the above. I'll push a new update to the PR in a bit.

@max-sixty (Collaborator)

Not far off, although there's no need to use things like ids (or even indirect, though you can if you want for that).

More than happy to offer any guidance on tests if helpful - post an example. pytest is really nice, even if it takes a bit of time to get used to

@spencerkclark (Member Author)

@MaximilianR 6496458 contains an updated version of the test file, rewritten to use pytest. I ended up using the ids keyword in places to clean up the test names that are printed when running the tests in verbose mode, but I agree it's not strictly necessary.

Overall pytest seems to clean things up pretty nicely. One problem that I wish I had a better solution for happens when I am testing indexing operations in DataArrays, Series, and DataFrames. There are multiple ways of getting the same answer, which makes these tests a good candidate for using pytest.mark.parametrize; however, one of those ways, using netcdftime._netcdftime.datetime objects directly, depends on the date_type used.

For instance, it would be great if I could write something like:

@pytest.mark.parametrize('sel_arg', [
    '0001',
    slice('0001-01-01', '0001-12-30'),
    [True, True, False, False],
    slice(date_type(1, 1, 1), date_type(1, 12, 30)),
    [date_type(1, 1, 1), date_type(1, 2, 1)]
], ids=['string', 'string-slice', 'bool-list', 'date-slice', 'date-list'])
def test_sel(da, index, sel_arg):
    expected = xr.DataArray([1, 2], coords=[index[:2]], dims=['time'])
    result = da.sel(time=sel_arg)
    assert_identical(result, expected)

But I can't use date_type, which is a fixture in my current setup, as an argument to parametrize. Right now I've worked around this by falling back to manually iterating over the cases in separate methods, but that's pretty verbose; do you happen to know of a cleaner way to set things up in this case?
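For instance, would it be reasonable to parametrize with callables that build the selection argument from date_type inside the test? A rough sketch (assuming the da, index, and date_type fixtures and the assert_identical helper from the existing tests):

import pytest
import xarray as xr


@pytest.mark.parametrize('build_sel_arg', [
    lambda date_type: '0001',
    lambda date_type: slice('0001-01-01', '0001-12-30'),
    lambda date_type: [True, True, False, False],
    lambda date_type: slice(date_type(1, 1, 1), date_type(1, 12, 30)),
    lambda date_type: [date_type(1, 1, 1), date_type(1, 2, 1)]
], ids=['string', 'string-slice', 'bool-list', 'date-slice', 'date-list'])
def test_sel(da, index, date_type, build_sel_arg):
    expected = xr.DataArray([1, 2], coords=[index[:2]], dims=['time'])
    result = da.sel(time=build_sel_arg(date_type))
    assert_identical(result, expected)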

In any event, when you get a chance, please let me know if you have any comments / suggestions on my latest push. Thanks again for your help.

import numpy as np
import pandas as pd

from pandas.lib import isscalar
Collaborator

Very minor, but there is an xarray version of this

Member

And the pandas version isn't public API :)



def named(name, pattern):
    return '(?P<' + name + '>' + pattern + ')'
Collaborator

I think .format is faster (as well as more idiomatic), because building the string with + like this creates several intermediate strings
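For reference, the .format version would look roughly like:

def named(name, pattern):
    return '(?P<{}>{})'.format(name, pattern)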

Member

This should only be called once, probably at module import time, so it should not matter for performance. I would just go with whatever is most readable.

@max-sixty (Collaborator)

That looks awesome! Quite a quick turnaround there.

Yes, good point on the date_type. Let me think for a bit on the best way

@shoyer (Member) left a comment

Looks like a very nice start!

Two limitations of the current design that are worth noting:

  1. It doesn't do resample
  2. It doesn't handle missing values

I don't think either of these is a deal breaker

@@ -0,0 +1,180 @@
import re
Member

This shouldn't go in core, since there's nothing tying it to core xarray internals. Instead, it should probably go in a new top level module, maybe a new directory alongside the contents of the existing conventions module (rename it to xarray.conventions.coding?).




def parse_iso8601(datetime_string):
    basic_pattern = build_pattern(date_sep='', time_sep='')
    extended_pattern = build_pattern()
    patterns = [basic_pattern, extended_pattern]
Member

Save this as a global variable.

for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']:
    value = result.get(attr, None)
    if value is not None:
        replace[attr] = int(value)
Member

Note that seconds can be fractional

return default.replace(**replace), resolution


def _parsed_string_to_bounds(date_type, resolution, parsed):
Member

Note this is based on a pandas function

@pytest.fixture
def feb_days(date_type):
    from netcdftime import DatetimeAllLeap, Datetime360Day
    if date_type == DatetimeAllLeap:
Member

Use is for type identity checks.
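i.e., roughly:

if date_type is DatetimeAllLeap:
    ...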


expected = pd.Series([1, 2], index=index[:2])
for arg in range_args:
    pd.util.testing.assert_series_equal(series[arg], expected)
Member

Be careful. I don't think this is public API. Better to stick with the .equals.




def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
    pieces = [(None, 'year', '\d{4}'),
Member

Do you need negative or five digit years?

Member Author

Personally, I don't see myself needing it in the near future, but I'm not necessarily opposed to adding that support if others would find it useful.

It would make writing simple positive four-digit year dates more complicated though right? Would you always need the leading zero and the sign?

Member

Then let's not bother until someone asks. Per Wikipedia's ISO 8601 article, you can optionally use an expanded year representation with + and -. I don't think they would always be necessary, but I haven't read the original standard (which unfortunately I think is not freely available online).

Contributor

FYI NCAR's TraCE simulation project is a 21k yr paleoclimate simulation. Not sure how they handle calendars/times. I know somebody who has analyzed data from this simulation; will ask what it looks like.

'minute', 'minute-dash', 'second', 'second-dash', 'second-dec',
'second-dec-dash'])
def test_parse_iso8601(string, expected):
from xarray.core.netcdftimeindex import parse_iso8601
Member

Do your imports at the top level if at all possible.

@shoyer (Member) commented Feb 7, 2017

Currently one can create non-sensical datetimes using netcdftime._netcdftime.datetime objects. This means one can attempt to index with an out-of-bounds string or datetime without raising an error. Could this possibly be addressed upstream?

Yes, this would be best addressed upstream in netcdftime.

@spencerkclark (Member Author)

@shoyer thanks for your initial review comments. I'll try and push an update in the next few days.

@spencerkclark (Member Author) left a comment

@shoyer I have just pushed an update. Whenever you get a chance please have another look.

I have a few comments in-line, but more broadly please let me know if I handled moving netcdftimeindex.py out of core and into a new conventions directory the way you wanted. Thanks!

'The microseconds of the datetime')
date_type = property(get_date_type)

def _partial_date_slice(self, resolution, parsed):
Member Author

For now I tried to go as simple as possible here and in _get_string_slice. I think trying to exactly mimic DatetimeIndex's behavior could get messy.

Member

Could you add a few examples (either here or in the docstring) that describe what behavior is not covered in this implementation?

try:
    result = self.get_loc(key)
    return (is_scalar(result) or type(result) == slice or
            (isinstance(result, np.ndarray) and result.size))
Member Author

Essentially all I want to do here is, if result is a numpy array, check if it is not empty. Is there a cleaner way to do this?

Member

I think this is about the best you can do

return dict(year=year, month=month, day=day, hour=hour,
            minute=minute, second=second)

ISO8601_STRING_TESTS = [
Member Author

Is this along the lines of what you were looking for here? I couldn't find a name for the second argument to pytest.mark.parametrize, so I wasn't sure how to combine these two list arguments into a dict (but maybe I misunderstood what you were asking for).

@@ -35,10 +36,12 @@ def build_pattern(date_sep='\-', datetime_sep='T', time_sep='\:'):
return '^' + trailing_optional(pattern_list) + '$'


basic_pattern = build_pattern(date_sep='', time_sep='')
extended_pattern = build_pattern()
patterns = [basic_pattern, extended_pattern]
Member

Use all caps for global constants, and prefix them with an underscore to indicate that they are private variables, e.g., _BASIC_PATTERN.
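Applied to the snippet above, that would look roughly like (the exact names are just illustrative):

_BASIC_PATTERN = build_pattern(date_sep='', time_sep='')
_EXTENDED_PATTERN = build_pattern()
_PATTERNS = [_BASIC_PATTERN, _EXTENDED_PATTERN]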

@@ -54,13 +57,22 @@ def _parse_iso8601_with_reso(date_type, timestr):
for attr in ['year', 'month', 'day', 'hour', 'minute', 'second']:
    value = result.get(attr, None)
    if value is not None:
        # Note ISO8601 conventions allow for fractional seconds; casting
        # to an int means all seconds values get rounded down to the
        # nearest integer. TODO: Consider adding support for sub-second
Member

You should update the regex above to exclude fractional seconds if supporting them doesn't work.


if not isinstance(data[0], datetime):
    raise TypeError(
        'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
Member

Use the public API name netcdftime.datetime.

Also, print the invalid object in the error message (using .format)
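i.e., something roughly like (a sketch of the suggestion, not final code):

if not isinstance(data[0], datetime):
    raise TypeError(
        'NetCDFTimeIndex requires netcdftime.datetime objects; '
        'got {!r}'.format(data[0]))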

Member Author

Unfortunately the public API name actually represents a DatetimeProlepticGregorian type, so for now to stick with public API imports, I've resorted to importing all six of the netcdftime datetime types.

In [1]: from netcdftime import datetime

In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)

    raise TypeError(
        'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
        ' objects.')
if not all(isinstance(value, type(data[0])) for value in data):
Member

Create a variable for type(data[0]) outside the loop.
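e.g., roughly:

date_type = type(data[0])
if not all(isinstance(value, date_type) for value in data):
    ...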

if not all(isinstance(value, type(data[0])) for value in data):
    raise TypeError(
        'NetCDFTimeIndex requires using netcdftime._netcdftime.datetime'
        ' objects of all the same type.')
Member

Same concerns as above on the error message

if not isinstance(data[0], datetime):
    raise TypeError(
        'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
        ' objects.')
Member

nit: I usually prefer to leave spaces at the end of lines instead of at the start of lines -- I think it looks slightly nicer.

def test_parse_iso8601(string, expected):
from xarray.core.netcdftimeindex import parse_iso8601
]
ISO8601_STRING_TEST_IDS = [
Member

My thinking here was to use something like a dict, just to logically join together test names and parameters in the same place, e.g.,

ISO8601_STRING_TESTS = {
    'year': ('1999', date_dict(year='1999')),
    'month': ('199901', date_dict(year='1999', month='01')),
    ...
}

@pytest.mark.parametrize(('string', 'expected'),
                         list(ISO8601_STRING_TESTS.values()),
                         ids=list(ISO8601_STRING_TESTS.keys()))


@shoyer (Member) left a comment

Yes, the conventions module is exactly what I was thinking

return result
"""Adapted from pandas.tseries.index.DatetimeIndex.get_loc"""
if isinstance(key, pycompat.basestring):
return self._get_string_slice(key)
Member

+1 for fewer hard to predict special cases. Pandas is really inscrutable here.

expected = xr.DataArray(1).assign_coords(time=index[0])
result = da.sel(time=date_type(1, 1, 1))
assert_identical(result, expected)

Member

Add some tests for sel (both scalar and list) with both method='pad' and method='nearest' and optionally setting tolerance
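A rough sketch of what I have in mind (illustrative only; it assumes the existing da, index, and date_type fixtures, import xarray as xr, and the assert_identical helper, with the index starting at date_type(1, 1, 1) and date_type(1, 2, 1)):

def test_sel_method_pad(da, index, date_type):
    # date_type(1, 1, 5) is not in the index; 'pad' should fall back to
    # the previous index value, date_type(1, 1, 1)
    expected = xr.DataArray(1).assign_coords(time=index[0])
    result = da.sel(time=date_type(1, 1, 5), method='pad')
    assert_identical(result, expected)


def test_sel_method_nearest(da, index, date_type):
    # date_type(1, 1, 30) is closest to date_type(1, 2, 1)
    expected = xr.DataArray(2).assign_coords(time=index[1])
    result = da.sel(time=date_type(1, 1, 30), method='nearest')
    assert_identical(result, expected)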

@spencerkclark (Member Author)

@shoyer when you get a chance, things are ready for another review. I think the AppVeyor issues may be due to the version of netCDF4 used. Should we switch to the conda-forge channel to set up the environment there?

def assert_all_same_netcdftime_datetimes(data):
    from netcdftime._netcdftime import datetime
def assert_all_valid_date_type(data):
    from netcdftime import (
Member

You can just use datetime here -- these are all subclasses, so isinstance checks on the super class work fine.

Member Author

Sorry, I buried this in a comment (#1252 (comment)) above. Confusingly, netcdftime.datetime does not refer to the super class:

In [1]: from netcdftime import datetime, DatetimeAllLeap

In [2]: datetime(1, 1, 1)
Out[2]: netcdftime._netcdftime.DatetimeProlepticGregorian(1, 1, 1, 0, 0, 0, 0, -1, 1)

In [3]: test = DatetimeAllLeap(1, 1, 1)

In [4]: isinstance(test, datetime)
Out[4]: False

In [5]: from netcdftime._netcdftime import datetime as super_datetime

In [6]: isinstance(test, super_datetime)
Out[6]: True

'NetCDFTimeIndex requires netcdftime._netcdftime.datetime'
' objects.')
if not all(isinstance(value, type(data[0])) for value in data):
'NetCDFTimeIndex requires netcdftime._netcdftime.datetime '
Member

I still prefer to use netcdftime.datetime, since it's equivalent and doesn't refer to a private submodule (the same reason we don't reference xarray.core.dataset.Dataset). The repr for netcdftime datetime should probably be fixed to refer to the shorter path.

@shoyer (Member) commented Feb 12, 2017 via email

I must have inadvertently removed it during a merge.
@spencerkclark (Member Author)

In that case, we should probably add a temporary "pip" clause to the requirements file for windows, to install cftime from pypi instead for now.

Thanks @shoyer, it looks like that did the trick. I addressed your recent comments; let me know if you have any further feedback.

@shoyer (Member) commented May 2, 2018

See also #2098, which should fix failing builds on master

@jhamman (Member) commented May 11, 2018

Tests are green here now. @shoyer and @spencerkclark - are we waiting on anything else before merging?

@spencerkclark (Member Author)

This could be ready. I'm happy to address any further concerns if anyone has them.

else:
    sample = var.data.ravel()[0]
    if isinstance(sample, dask_array_type):
        sample = sample.compute()
Member

Evaluating dask arrays makes me cringe, but I think this is about the best we can do with NumPy's current dtype system. Fortunately this should not be common anyway.

except ImportError:
    return False
else:
    sample = var.data.ravel()[0]
Member

Since this could potentially be called on many arrays, let's be a little more careful before computing sample:

  1. Let's verify that the array has dtype=object (otherwise it can't contain cftime.datetime objects)
  2. Let's verify that the array has size > 0 before trying any elements.
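A rough sketch of what I mean (the contains_cftime_datetimes and dask_array_type names follow the surrounding diff; the exact structure is just illustrative):

import numpy as np

def contains_cftime_datetimes(var):
    # Only object-dtype, non-empty arrays can hold cftime.datetime objects
    try:
        from cftime import datetime as cftime_datetime
    except ImportError:
        return False
    if var.dtype == np.dtype('O') and var.size > 0:
        sample = var.data.ravel()[0]
        if isinstance(sample, dask_array_type):
            sample = sample.compute()
        return isinstance(sample, cftime_datetime)
    return False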

def assert_all_valid_date_type(data):
    import cftime

    valid_types = (cftime.DatetimeJulian, cftime.DatetimeNoLeap,
Member

This can probably be simplified to just the base cftime.datetime?

@shoyer (Member) commented May 12, 2018

A couple other things to think about from a usability perspective:

  • What happens when you try to resample along CFTimeIndex?
  • What happens when you try to plot a DataArray with a CFTimeIndex?

These should at least raise informative errors (use NotImplementedError).

@spencerkclark (Member Author)

Thanks @shoyer, those are good questions. I addressed your inline comments. Let me know if you have anything else.

What happens when you try to resample along CFTimeIndex?

Through pandas, this raises an informative TypeError:

In [1]: from cftime import num2date

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: xr.set_options(enable_cftimeindex=True)
Out[4]: <xarray.core.options.set_options at 0x10c3a99d0>

In [5]: time = num2date(np.arange(5), units='days since 0001-01-01', calendar='noleap')

In [6]: data = xr.DataArray(np.arange(5), coords=[time], dims=['time'])

In [7]: data.resample(time='2D').mean()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-577095ede89a> in <module>()
----> 1 data.resample(time='2D').mean()

/Users/spencerclark/xarray-dev/xarray/xarray/core/common.pyc in resample(self, freq, dim, how, skipna, closed, label, base, keep_attrs, **indexer)
    616         resampler = self._resample_cls(self, group=group, dim=dim_name,
    617                                        grouper=grouper,
--> 618                                        resample_dim=RESAMPLE_DIM)
    619
    620         return resampler

/Users/spencerclark/xarray-dev/xarray/xarray/core/resample.pyc in __init__(self, *args, **kwargs)
    128                              "cannot have the same name as actual dimension "
    129                              "('{}')! ".format(self._resample_dim, self._dim))
--> 130         super(DataArrayResample, self).__init__(*args, **kwargs)
    131
    132     def apply(self, func, shortcut=False, **kwargs):

/Users/spencerclark/xarray-dev/xarray/xarray/core/groupby.pyc in __init__(self, obj, group, squeeze, grouper, bins, cut_kwargs)
    233                 raise ValueError('index must be monotonic for resampling')
    234             s = pd.Series(np.arange(index.size), index)
--> 235             first_items = s.groupby(grouper).first()
    236             full_index = first_items.index
    237             if first_items.isnull().any():

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)
   5160         return groupby(self, by=by, axis=axis, level=level, as_index=as_index,
   5161                        sort=sort, group_keys=group_keys, squeeze=squeeze,
-> 5162                        **kwargs)
   5163
   5164     def asfreq(self, freq, method=None, how=None, normalize=False,

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in groupby(obj, by, **kwds)
   1846         raise TypeError('invalid type: %s' % type(obj))
   1847
-> 1848     return klass(obj, by, **kwds)
   1849
   1850

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)
    514                                                     level=level,
    515                                                     sort=sort,
--> 516                                                     mutated=self.mutated)
    517
    518         self.obj = obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort, mutated, validate)
   2848     # a passed-in Grouper, directly convert
   2849     if isinstance(key, Grouper):
-> 2850         binner, grouper, obj = key._get_grouper(obj, validate=False)
   2851         if key.key is None:
   2852             return grouper, [], obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/resample.pyc in _get_grouper(self, obj, validate)
   1118     def _get_grouper(self, obj, validate=True):
   1119         # create the resampler and return our binner
-> 1120         r = self._get_resampler(obj)
   1121         r._set_binner()
   1122         return r.binner, r.grouper, r.obj

//anaconda/envs/xarray-dev/lib/python2.7/site-packages/pandas/core/resample.pyc in _get_resampler(self, obj, kind)
   1114         raise TypeError("Only valid with DatetimeIndex, "
   1115                         "TimedeltaIndex or PeriodIndex, "
-> 1116                         "but got an instance of %r" % type(ax).__name__)
   1117
   1118     def _get_grouper(self, obj, validate=True):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'CFTimeIndex'

What happens when you try to plot a DataArray with a CFTimeIndex?

I updated things such that if cftime.datetime objects are used as a coordinate when plotting, the error message looks like:

In [1]: from cftime import num2date

In [2]: import numpy as np

In [3]: import xarray as xr

In [4]: xr.set_options(enable_cftimeindex=True)
Out[4]: <xarray.core.options.set_options at 0x10af58850>

In [5]: time = num2date(np.arange(5), units='days since 0001-01-01', calendar='noleap')

In [6]: data = xr.DataArray(np.arange(5), coords=[time], dims=['time'])

In [7]: data.plot()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-118f55e0b3d0> in <module>()
----> 1 data.plot()

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in __call__(self, **kwargs)
    357
    358     def __call__(self, **kwargs):
--> 359         return plot(self._da, **kwargs)
    360
    361     @functools.wraps(hist)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in plot(darray, row, col, col_wrap, ax, rtol, subplot_kws, **kwargs)
    155     kwargs['ax'] = ax
    156
--> 157     return plotfunc(darray, **kwargs)
    158
    159

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in line(darray, *args, **kwargs)
    258             yplt = darray.coords[ylabel]
    259
--> 260     _ensure_plottable(xplt)
    261
    262     primitive = ax.plot(xplt, yplt, *args, **kwargs)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in _ensure_plottable(*args)
     54         if not (_valid_numpy_subdtype(np.array(x), numpy_types) or
     55                 _valid_other_type(np.array(x), other_types)):
---> 56             raise TypeError('Plotting requires coordinates to be numeric '
     57                             'or dates of type np.datetime64 or '
     58                             'datetime.datetime.')

TypeError: Plotting requires coordinates to be numeric or dates of type np.datetime64 or datetime.datetime.

If cftime.datetime objects are the data requested to be plotted, the following error message results:

In [8]: data = xr.DataArray(time, coords=[np.arange(5)], dims=['x'])

In [9]: data.plot()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-9-118f55e0b3d0> in <module>()
----> 1 data.plot()

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in __call__(self, **kwargs)
    357
    358     def __call__(self, **kwargs):
--> 359         return plot(self._da, **kwargs)
    360
    361     @functools.wraps(hist)

/Users/spencerclark/xarray-dev/xarray/xarray/plot/plot.pyc in plot(darray, row, col, col_wrap, ax, rtol, subplot_kws, **kwargs)
    124
    125     if contains_cftime_datetimes(darray):
--> 126         raise NotImplementedError('Plotting arrays of cftime.datetime objects '
    127                                   'is currently not possible.')
    128

NotImplementedError: Plotting arrays of cftime.datetime objects is currently not possible.

@shoyer (Member) commented May 12, 2018

OK, I'm happy with this. Time to merge I guess?

@fmaussion (Member)

Congrats! This is a great piece of work and will be very useful to the climate community.

@jhamman (Member) commented May 13, 2018

Okay, I'm going to merge now. Hopefully a few of us can stress test this a bit more prior to the next release. Thanks @spencerkclark for all the work here over the past 15 months!!!

@jhamman jhamman merged commit ebe0dd0 into pydata:master May 13, 2018
@spencerkclark (Member Author)

@shoyer, @jhamman, @maxim-lian, @spencerahill many thanks for the substantial feedback, help, and encouragement here. You guys are great!

@spencerkclark spencerkclark deleted the NetCDFTimeIndex branch May 13, 2018 11:32
@rabernat (Contributor)

Congrats to everyone who made this happen, especially @spencerkclark. This feature is going to make so many people happy!

@spencerahill (Contributor)

Credit also due to @rabernat for organizing the workshop in late 2016 where this effort got off the ground, and to @shoyer who sketched out an initial roadmap for the implementation at that meeting.

So excited to have this in! In aospy alone, we'll be able to get rid of hundreds (1000+?) of lines of code now that CFTime is in place.

@spencerkclark (Member Author)

Indeed that meeting played an important role here. Thank you @rabernat!

Successfully merging this pull request may close these issues:

Towards a (temporary?) workaround for datetime issues at the xarray-level