Stephan's sprintbattical #12

Merged
merged 45 commits into from
Feb 21, 2014

Conversation

shoyer
Member

@shoyer shoyer commented Feb 14, 2014

No description provided.

Now O(n) instead of O(n^2), even though we do have to iterate through every
element of an array in Python (gasp!).
Refactored the `broadcast_variables` function and fixed a bug in
`variable.transpose` (see the new test case).
Added better test that uses string labels instead of just integers.
I think these names are much more straightforward. The only annoying
aspect is that "array" is the name of a built-in module, which conflicts
with using "array" as the name of the module that contains "Array".
We should probably remove Array.aggregate to reduce confusion, but for now
I'll keep aggregate for error checks.
@ebrevdo
Contributor

ebrevdo commented Feb 14, 2014

Looks like a great change! I'm seeing some failing tests:

~/dev/scidata (DataView)$ python setup.py test
running test
running egg_info
writing requirements to src/xray.egg-info/requires.txt
writing src/xray.egg-info/PKG-INFO
writing top-level names to src/xray.egg-info/top_level.txt
writing dependency_links to src/xray.egg-info/dependency_links.txt
reading manifest file 'src/xray.egg-info/SOURCES.txt'
writing manifest file 'src/xray.egg-info/SOURCES.txt'
running build_ext
test_1d_math (test.test_array.TestArray) ... ok
test_aggregate (test.test_array.TestArray) ... ERROR
test_array_interface (test.test_array.TestArray) ... ok
test_broadcasting_failures (test.test_array.TestArray) ... ok
test_broadcasting_math (test.test_array.TestArray) ... ok
test_collapse (test.test_array.TestArray) ... ok
test_data (test.test_array.TestArray) ... ok
test_from_stack (test.test_array.TestArray) ... ok
test_groupby (test.test_array.TestArray) ... ERROR
test_indexed_by (test.test_array.TestArray) ... ok
test_inplace_math (test.test_array.TestArray) ... ok
test_items (test.test_array.TestArray) ... ok
test_properties (test.test_array.TestArray) ... ok
test_repr (test.test_array.TestArray) ... ok
test_transpose (test.test_array.TestArray) ... ok
test_attributes (test.test_dataset.DataTest) ... SKIP: attribute checks are not yet backend specific
test_coordinate (test.test_dataset.DataTest) ... ok
test_copy (test.test_dataset.DataTest) ... ok
test_dimension (test.test_dataset.DataTest) ... ok
test_getitem (test.test_dataset.DataTest) ... ERROR
test_indexed_by (test.test_dataset.DataTest) ... ok
test_init (test.test_dataset.DataTest) ... ok
test_iterator (test.test_dataset.DataTest) ... ok
test_labeled_by (test.test_dataset.DataTest) ... ERROR
test_merge (test.test_dataset.DataTest) ... ok
test_rename (test.test_dataset.DataTest) ... ok
test_repr (test.test_dataset.DataTest) ... ok
test_select (test.test_dataset.DataTest) ... ok
test_setitem (test.test_dataset.DataTest) ... ok
test_to_dataframe (test.test_dataset.DataTest) ... ok
test_unselect (test.test_dataset.DataTest) ... SKIP: need to write this test
test_variable (test.test_dataset.DataTest) ... ok
test_variable_indexing (test.test_dataset.DataTest) ... ok
test_write_store (test.test_dataset.DataTest) ... ok
test_attributes (test.test_dataset.NetCDF4DataTest) ... SKIP: attribute checks are not yet backend specific
test_coordinate (test.test_dataset.NetCDF4DataTest) ... ok
test_copy (test.test_dataset.NetCDF4DataTest) ... ok
test_dimension (test.test_dataset.NetCDF4DataTest) ... ok
test_dump_and_open_dataset (test.test_dataset.NetCDF4DataTest) ... ok
test_getitem (test.test_dataset.NetCDF4DataTest) ... ERROR
test_indexed_by (test.test_dataset.NetCDF4DataTest) ... ok
test_init (test.test_dataset.NetCDF4DataTest) ... ok
test_iterator (test.test_dataset.NetCDF4DataTest) ... ok
test_labeled_by (test.test_dataset.NetCDF4DataTest) ... ERROR
test_merge (test.test_dataset.NetCDF4DataTest) ... ok
test_rename (test.test_dataset.NetCDF4DataTest) ... ok
test_repr (test.test_dataset.NetCDF4DataTest) ... ok
test_select (test.test_dataset.NetCDF4DataTest) ... ok
test_setitem (test.test_dataset.NetCDF4DataTest) ... ok
test_to_dataframe (test.test_dataset.NetCDF4DataTest) ... ok
test_unselect (test.test_dataset.NetCDF4DataTest) ... SKIP: need to write this test
test_variable (test.test_dataset.NetCDF4DataTest) ... ok
test_variable_indexing (test.test_dataset.NetCDF4DataTest) ... ok
test_write_store (test.test_dataset.NetCDF4DataTest) ... ok
test_attributes (test.test_dataset.ScipyDataTest) ... SKIP: attribute checks are not yet backend specific
test_coordinate (test.test_dataset.ScipyDataTest) ... ok
test_copy (test.test_dataset.ScipyDataTest) ... ok
test_dimension (test.test_dataset.ScipyDataTest) ... ok
test_dump_and_open_dataset (test.test_dataset.ScipyDataTest) ... FAIL
test_getitem (test.test_dataset.ScipyDataTest) ... ERROR
test_indexed_by (test.test_dataset.ScipyDataTest) ... ok
test_init (test.test_dataset.ScipyDataTest) ... ok
test_iterator (test.test_dataset.ScipyDataTest) ... ok
test_labeled_by (test.test_dataset.ScipyDataTest) ... ERROR
test_merge (test.test_dataset.ScipyDataTest) ... ok
test_rename (test.test_dataset.ScipyDataTest) ... ok
test_repr (test.test_dataset.ScipyDataTest) ... ok
test_select (test.test_dataset.ScipyDataTest) ... ok
test_setitem (test.test_dataset.ScipyDataTest) ... ok
test_to_dataframe (test.test_dataset.ScipyDataTest) ... ok
test_unselect (test.test_dataset.ScipyDataTest) ... SKIP: need to write this test
test_variable (test.test_dataset.ScipyDataTest) ... ok
test_variable_indexing (test.test_dataset.ScipyDataTest) ... ok
test_write_store (test.test_dataset.ScipyDataTest) ... ok
test.test_dataset.create_test_data ... ok
test_aggregate (test.test_dataset_array.TestDatasetArray) ... FAIL
test_array_interface (test.test_dataset_array.TestDatasetArray) ... ok
test_collapse (test.test_dataset_array.TestDatasetArray) ... ok
test_dataset_getitem (test.test_dataset_array.TestDatasetArray) ... ok
test_from_stack (test.test_dataset_array.TestDatasetArray) ... ok
test_groupby (test.test_dataset_array.TestDatasetArray) ... FAIL
test_indexed_by (test.test_dataset_array.TestDatasetArray) ... ok
test_inplace_math (test.test_dataset_array.TestDatasetArray) ... ok
test_intersection (test.test_dataset_array.TestDatasetArray) ... FAIL
test_item_math (test.test_dataset_array.TestDatasetArray) ... ok
test_items (test.test_dataset_array.TestDatasetArray) ... ok
test_iteration (test.test_dataset_array.TestDatasetArray) ... ok
test_labeled_by (test.test_dataset_array.TestDatasetArray) ... FAIL
test_loc (test.test_dataset_array.TestDatasetArray) ... FAIL
test_math (test.test_dataset_array.TestDatasetArray) ... ok
test_properties (test.test_dataset_array.TestDatasetArray) ... ok
test_refocus (test.test_dataset_array.TestDatasetArray) ... ok
test_renamed (test.test_dataset_array.TestDatasetArray) ... ok
test_frozen (test.test_utils.TestDictionaries) ... ok
test_ordered_dict_intersection (test.test_utils.TestDictionaries) ... ok
test_safe (test.test_utils.TestDictionaries) ... ok
test_unsafe (test.test_utils.TestDictionaries) ... ok
test_expanded_indexer (test.test_utils.TestIndexers) ... ok
test_orthogonal_indexer (test.test_utils.TestIndexers) ... ok
test (test.test_utils.TestNum2DatetimeIndex) ... ERROR

ERROR: test_aggregate (test.test_array.TestArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_array.py", line 240, in test_aggregate
self.assertVarEqual(expected_unique, actual_unique)
File "/Users/ebrevdo/dev/scidata/test/__init__.py", line 10, in assertVarEqual
self.assertTrue(utils.variable_equal(v1, v2))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 132, in variable_equal
return np.array_equal(data1, data2)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/numeric.py", line 1977, in array_equal
return bool(logical_and.reduce(equal(a1,a2).ravel()))
AttributeError: 'NotImplementedType' object has no attribute 'ravel'

ERROR: test_groupby (test.test_array.TestArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_array.py", line 220, in test_groupby
self.assertVarEqual(expected_unique, grouped.unique_coord)
File "/Users/ebrevdo/dev/scidata/test/__init__.py", line 10, in assertVarEqual
self.assertTrue(utils.variable_equal(v1, v2))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 132, in variable_equal
return np.array_equal(data1, data2)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/numeric.py", line 1977, in array_equal
return bool(logical_and.reduce(equal(a1,a2).ravel()))
AttributeError: 'NotImplementedType' object has no attribute 'ravel'

ERROR: test_getitem (test.test_dataset.DataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 325, in test_getitem
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test_labeled_by (test.test_dataset.DataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 234, in test_labeled_by
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test_getitem (test.test_dataset.NetCDF4DataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 325, in test_getitem
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test_labeled_by (test.test_dataset.NetCDF4DataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 234, in test_labeled_by
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test_getitem (test.test_dataset.ScipyDataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 325, in test_getitem
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test_labeled_by (test.test_dataset.ScipyDataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 234, in test_labeled_by
{'units': 'days since 2000-01-01'})
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 464, in create_variable
return self.add_variable(name, v)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 534, in add_variable
return self.set_variable(name, var)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 595, in set_variable
self.indices.build_index(name)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 109, in build_index
self.cache[key] = self.dataset._create_index(key)
File "/Users/ebrevdo/dev/scidata/src/xray/dataset.py", line 224, in _create_index
attr.get('calendar'))
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

ERROR: test (test.test_utils.TestNum2DatetimeIndex)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_utils.py", line 68, in test
actual = utils.num2datetimeindex(num_dates, units, calendar)
File "/Users/ebrevdo/dev/scidata/src/xray/utils.py", line 106, in num2datetimeindex
dates = first_time_delta * num_delta + np.datetime64(first_dates[0])
TypeError: ufunc 'multiply' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

FAIL: test_dump_and_open_dataset (test.test_dataset.ScipyDataTest)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset.py", line 404, in test_dump_and_open_dataset
self.assertEquals(expected, actual)
AssertionError: <xray.Dataset (time: 1000, @dim1: 100, @dim2: 50, @dim3: 10): var1 var2 var3> != <xray.Dataset (@dim2: 50, @dim3: 10, @dim1: 100, time: 1000): var1 var3 var2>

FAIL: test_aggregate (test.test_dataset_array.TestDatasetArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 213, in test_aggregate
self.assertViewEqual(expected, actual)
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 9, in assertViewEqual
self.assertEqual(dv1.dataset, dv2.dataset)
AssertionError: <xray.Dataset (@x: 10, @abc: 3): foo> != <xray.Dataset (@x: 10, @abc: 3): foo>

FAIL: test_groupby (test.test_dataset_array.TestDatasetArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 189, in test_groupby
grouped.collapse(np.sum, dimension=None))
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 9, in assertViewEqual
self.assertEqual(dv1.dataset, dv2.dataset)
AssertionError: <xray.Dataset (@abc: 3): foo> != <xray.Dataset (@abc: 3): foo>

FAIL: test_intersection (test.test_dataset_array.TestDatasetArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 240, in test_intersection
self.assertViewEqual(dv1, self.dv[:5])
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 9, in assertViewEqual
self.assertEqual(dv1.dataset, dv2.dataset)
AssertionError: <xray.Dataset (@x: 5, @y: 20): foo> != <xray.Dataset (@x: 5, @y: 20): foo>

FAIL: test_labeled_by (test.test_dataset_array.TestDatasetArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 75, in test_labeled_by
self.assertViewEqual(self.dv, self.dv.labeled_by(x=slice(None)))
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 9, in assertViewEqual
self.assertEqual(dv1.dataset, dv2.dataset)
AssertionError: <xray.Dataset (@x: 10, @y: 20): foo> != <xray.Dataset (@x: 10, @y: 20): foo>

FAIL: test_loc (test.test_dataset_array.TestDatasetArray)

Traceback (most recent call last):
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 81, in test_loc
self.assertViewEqual(self.dv[:3], self.dv.loc[:'c'])
File "/Users/ebrevdo/dev/scidata/test/test_dataset_array.py", line 9, in assertViewEqual
self.assertEqual(dv1.dataset, dv2.dataset)
AssertionError: <xray.Dataset (@x: 3, @y: 20): foo> != <xray.Dataset (@x: 3, @y: 20): foo>


Ran 100 tests in 1.248s

FAILED (failures=6, errors=9, skipped=6)

@shoyer
Member Author

shoyer commented Feb 15, 2014

Thanks @ebrevdo for taking a look! I'm pretty sure the issue here is the numpy version. I am running numpy 1.8 on my machine. It looks like we need at least numpy 1.7 for timedelta math [1], and it appears that array_equal can only compare string arrays in numpy 1.8 [2]. Neither issue is insurmountable to work around, but for now I will increment the required version of numpy to 1.8.

[1] http://pandas.pydata.org/pandas-docs/stable/timeseries.html#numpy-1-7-compatibility
[2] numpy/numpy#2686
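
To make the two version-dependent behaviors concrete, here is a minimal sketch, not code from this pull request. The helper names `numpy_at_least` and `offsets_to_dates` are hypothetical, and the version check assumes `np.__version__` starts with a plain numeric `major.minor` prefix.

```python
import numpy as np


def numpy_at_least(major, minor):
    """Return True if the installed numpy is at least major.minor.

    Hypothetical helper: assumes the version string begins with
    numeric "major.minor" components.
    """
    found = tuple(int(part) for part in np.__version__.split(".")[:2])
    return found >= (major, minor)


def offsets_to_dates(num_days, start="2000-01-01"):
    """Convert integer day offsets to datetime64 values via timedelta math."""
    # Multiplying an integer array by a timedelta64 scalar is the kind of
    # operation that raised the "ufunc 'multiply'" TypeError in the log above.
    deltas = np.asarray(num_days) * np.timedelta64(1, "D")
    return np.datetime64(start) + deltas


if not numpy_at_least(1, 8):
    raise ImportError("this sketch assumes numpy >= 1.8, found " + np.__version__)

dates = offsets_to_dates([0, 1, 2])

# Comparing string arrays with array_equal is the second behavior that
# appears to need a recent numpy.
labels_match = np.array_equal(np.array(["a", "b", "c"]), np.array(["a", "b", "c"]))
```

Failing early with a clear message is one option; the other, as noted above, is to work around each limitation separately on older numpy versions.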

We need this for our current use of np.array_equal.
@ebrevdo
Contributor

ebrevdo commented Feb 15, 2014

Thanks for looking at that. I'll do a more thorough evaluation over the
weekend!


@shoyer shoyer mentioned this pull request Feb 15, 2014
@@ -0,0 +1,572 @@
import functools
Contributor

As we just discussed, let's rename this to xarray.py.

@akleeman
Contributor

This is all great. I've been experimenting with this branch and the majority of it is running fine. Given that this project is still under heavy development, and rather than bloating this pull request, let's go ahead and merge it into master and iterate on top of it.

akleeman added a commit that referenced this pull request Feb 21, 2014
@akleeman akleeman merged commit 6e0b12c into master Feb 21, 2014
@shoyer
Member Author

shoyer commented Feb 21, 2014

FYI -- I just pushed a commit renaming "Array" to "XArray" to master. There
are probably still a few lingering references to clean up...


3 participants