Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couldn't change values for on-disk datasets opened by open_mfdataset() (xarray > 0.8.2) #1805

Closed
AlexeyPechnikov opened this issue Jan 1, 2018 · 11 comments
Labels
needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports stale

Comments

@AlexeyPechnikov
Copy link

It doesn't work:

test = ds.TEST
test.values.flags

  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

test.values

  array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

test.values = 0*test.values
test.values

  array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
...

But it works:

test = xr.DataArray(np.ones(10),
                coords={
                    'x': range(10)
                },
                dims=['x']
    )
test.values

  array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

test.values = 0*test.values
test.values

  array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
@shoyer
Copy link
Member

shoyer commented Jan 1, 2018

Can you share full information for how you created this dataset? I cannot reproduce this:

In [36]: test = xr.DataArray(np.ones(10),
    ...:                 coords={
    ...:                     'x': range(10)
    ...:                 },
    ...:                 dims=['x']
    ...:     ).chunk()
    ...:

In [37]: test
Out[37]:
<xarray.DataArray (x: 10)>
dask.array<shape=(10,), dtype=float64, chunksize=(10,)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9

In [38]: test.values = 0 * test.values

In [39]: test.values
Out[39]: array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

@shoyer shoyer added the needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports label Jan 2, 2018
@AlexeyPechnikov
Copy link
Author

I use this dataset:
ftp://mit.ecco-group.org/ecco_for_las/version_4/release2/nctiles_monthly/TFLUX
My code:

import xarray as xr
ds = xr.open_mfdataset('./nctiles_monthly/TFLUX/TFLUX*.nc',concat_dim='tile')
TFLUX = ds.TFLUX
TFLUX.values[1,1,1] = 0*TFLUX.values[1,1,1]
TFLUX.values[1,1,1]

  array([ 60.46262741,  57.78710175,  56.33338547,  56.44807434,
        ...,
        76.84630585,  74.65901184], dtype=float32)

Also

xarray.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Darwin
OS-release: 17.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

xarray: 0.10.0
pandas: 0.22.0
numpy: 1.13.3
scipy: 0.18.1
netCDF4: 1.2.6
h5netcdf: None
Nio: None
bottleneck: None
cyordereddict: None
dask: 0.16.0
matplotlib: 2.1.0
cartopy: None
seaborn: None
setuptools: 36.5.0
pip: 9.0.1
conda: None
pytest: 3.2.3
IPython: 5.1.0
sphinx: None

@shoyer
Copy link
Member

shoyer commented Jan 2, 2018

OK, I can reproduce this. The problem is assigning to specific elements of .values when a DataArray's data is not stored as a NumPy array, e.g., if it is a dask array or one of xarray's lazy array classes (used internally for lazy access when opening netCDF files). In practice, this arises with open_mfdataset() or open_dataset(..., cache=False).

It's not entirely obvious how to fix this. Probably the simplest fix is to make .values a readonly numpy array in these cases, so that an error is raised in these cases. This would usually works, but could lead to issues with passing arrays to scikit-learn or pandas due to a Cython bug with read-only memoryview buffers (cython/cython#1605).

@AlexeyPechnikov
Copy link
Author

It's works correct for xarray 0.8.2...

@AlexeyPechnikov
Copy link
Author

pip2 install xarray==0.8.2
import xarray as xr
ds = xr.open_mfdataset('./nctiles_monthly/TFLUX/TFLUX*.nc',concat_dim='tile')
TFLUX = ds.TFLUX
TFLUX.values[1,1,1] = 0*TFLUX.values[1,1,1]
TFLUX.values[1,1,1]

  array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        ...,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.], dtype=float32)

So old code is broken now.

@AlexeyPechnikov
Copy link
Author

In practice, this arises with open_mfdataset() or open_dataset() with cache=False.

Hm, I tested with "cache=True" but it still doesn't work.

@shoyer
Copy link
Member

shoyer commented Jan 2, 2018

Hm, I tested with "cache=True" but it still doesn't work.

I meant open_dataset(cache=False) (it arises with either value for cache for open_mfdataset).

So old code is broken now.

Yes, this is an unfortunate/unintended consequence of #1024 ("Disable automatic cache with dask").

@AlexeyPechnikov
Copy link
Author

Is it possible to fix the "cache" option for open_mfdataset()? And raise error in other case.

@AlexeyPechnikov
Copy link
Author

AlexeyPechnikov commented Jan 2, 2018

And in-memory dataarray couldn't be modified too:

import xarray as xr
ds = xr.open_mfdataset('./nctiles_monthly/TFLUX/TFLUX*.nc',concat_dim='tile')
TFLUX = ds.TFLUX.copy(deep=True)
TFLUX.values[1,1,1] = 0*TFLUX.values[1,1,1]
TFLUX.values[1,1,1]
  array([ 60.46262741,  57.78710175,  56.33338547,  56.44807434,
        ...,
        76.84630585,  74.65901184], dtype=float32)

But why?

@shoyer
Copy link
Member

shoyer commented Jan 2, 2018

@mobigroup copy(deep=True) makes an independent copy but again (as of #1024) does not load data into memory. You need to use .compute() or .load() to load the dask arrays into memory.

@stale
Copy link

stale bot commented Dec 3, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Dec 3, 2019
@stale stale bot closed this as completed Jan 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs mcve https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports stale
Projects
None yet
Development

No branches or pull requests

2 participants