BUG: Buffer protocol doesn't support read-write access for numpy records #23973

fyellin · 2023-06-18T07:24:53Z

Describe the issue:

Numpy does not allow you to use the buffer protocol to get a read-write buffer for a numpy record. It requires you to ask for the record read-only. This is in spite of the fact that the pointer returned by the buffer protocol does, in fact, point to a read-write piece of memory, and one ought to be able to modify it.

Using the buffer protocol is the recommended way to read and write "array-like objects" in C. To use the buffer protocol for a numpy record, you have to lie and request a read-only buffer, and then write to it anyway. The older array_interface works just fine.

Reproduce the code example:

>>> import numpy as np
>>> x = np.zeros(2, dtype='d,d,i')

# memoryview mimics the behavior of the buffer protocol
>>> memoryview(x).readonly
False
>>> memoryview(x[0]).readonly
True

On the other hand

>>> x.__array_interface__['data']   # returns tuple (address, read-only)
(105553143159680, False)
>>> x[0].__array_interface__['data']
(105553143159680, False)

I also wrote some C code which I linked in and called from Python

void foo(PyObject *object) {
    Py_buffer view;
    PyObject_GetBuffer(object, &view, PyBUF_CONTIG_RO);
    if (view.obj) {
        double *data = (double *)view.buf;
        *data += 1;
    }
    PyBuffer_Release(&view);
}

Both foo(x) and foo(x[0]) correctly incremented an element of the array. The buffer really does point to the actual data and not a copy of it.

Using PyBUF_CONTIG gives an error on foo(x[0]).
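The mismatch reported here can also be checked from Python alone. The following is a minimal sketch, not part of the original report; `buffer_report` is a hypothetical helper name, and the printed flags may differ on NumPy versions other than the 1.25.0 used in this report.

```python
import numpy as np


def buffer_report(obj):
    """Return (readonly per buffer protocol, readonly per array interface, address)."""
    address, iface_readonly = obj.__array_interface__["data"]
    return memoryview(obj).readonly, iface_readonly, address


x = np.zeros(2, dtype="d,d,i")
array_report = buffer_report(x)
record_report = buffer_report(x[0])

# The array and its first record start at the same address.
same_address = array_report[2] == record_report[2]
print(array_report[:2], record_report[:2], same_address)
```

If the two readonly flags disagree for `x[0]`, that is the inconsistency this issue describes.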

Error message:

No response

Runtime information:

numpy.version
'1.25.0'

sys.version
'3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 (clang-1300.0.29.30)]'
numpy.show_runtime()
[{'numpy_version': '1.25.0',
'python': '3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 '
'(clang-1300.0.29.30)]',
'uname': uname_result(system='Darwin', node='Franks-14-Mac-Book-Pro-2.local', release='22.4.0', version='Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000', machine='arm64')},
{'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
'found': ['ASIMDHP', 'ASIMDDP'],
'not_found': ['ASIMDFHM']}},
{'architecture': 'armv8',
'filepath': '/Users/fy/Library/Python/3.10/lib/python/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
'internal_api': 'openblas',
'num_threads': 10,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'}]

Context for the issue:

I should be able to use the buffer protocol to read and write a numpy record without lying about my intentions to write to it.

seberg · 2023-06-19T09:07:49Z

We can probably change this, although I don't like that our scalars are mutable this way; it also doesn't really make much sense to act differently in different places. (I suspect there may be ways to create structured scalars which do not allow writing, but I would have to search a bit.)

So, I am a bit curious about you needing to mutate the scalars (although, NumPy turning 0-D arrays to scalars might be enough of a reason).

I.e. the fact that we return a read-only version for the buffer protocol was maybe even intentional to some degree.
The main reason why it's tricky to make the structured scalar immutable is that it would break arr[0]["field"] = value (which would otherwise only work as arr["field"][0] = value).
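The two assignment spellings mentioned above can be compared in a short sketch. The field names `f0` and `f2` are the ones NumPy generates automatically for a comma-separated dtype string; the values are arbitrary.

```python
import numpy as np

arr = np.zeros(2, dtype="d,d,i")  # fields are auto-named f0, f1, f2

arr[0]["f0"] = 1.5   # write through the structured scalar arr[0]
arr["f2"][1] = 7     # write through the field view arr["f2"]

print(arr)
```

Both writes land in the array, which is why the structured scalar cannot simply be made immutable without breaking the first spelling.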

mattip · 2023-06-19T09:09:48Z

If I recall correctly there were issues around correctly handling the buffer protocol and padding (if needed).
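To illustrate what padding means for a structured dtype, here is a small sketch. It does not reproduce the specific buffer-protocol issues alluded to above; `Triple` is a hypothetical C-style struct introduced only for comparison.

```python
import ctypes
import numpy as np


class Triple(ctypes.Structure):
    # Hypothetical C struct with the same fields as dtype 'd,d,i'.
    _fields_ = [
        ("f0", ctypes.c_double),
        ("f1", ctypes.c_double),
        ("f2", ctypes.c_int),
    ]


packed = np.dtype("d,d,i")               # no padding between or after fields
aligned = np.dtype("d,d,i", align=True)  # padded the way a C compiler would

print(packed.itemsize, aligned.itemsize, ctypes.sizeof(Triple))
```

The default dtype is packed, so its item size can be smaller than the matching C struct; passing `align=True` is what makes the layouts agree.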

fyellin · 2023-06-19T16:09:34Z

My use case is that I am creating a numpy record that looks like a C structure, and then passing that numpy record to C code. I need to find out the actual address of the numpy record in my SWIG wrapper to give to C.

Slightly longer history. All my numpy records are currently created as the sole element of a numpy array:

array = np.zeros(1, dtype="i,i,i,i,d,d,d")  # my actual descriptor is more complicated
record = array[0]

and in this case, I can get from a numpy record to its actual address by means of base:

PyObject *base = PyObject_GetAttrString(object, "base");  // get back the array (new reference)
void *address = PyArray_DATA((PyArrayObject *)base);

However I will soon be needing to handle arrays of records, and the above trick will only work with the 0-th element.

I can correct this by using the C side of the array_interface protocol:

    PyObject *capsule = PyObject_GetAttrString(object, "__array_struct__");
    PyArrayInterface* interface = PyCapsule_GetPointer(capsule, NULL);
    void *address = interface->data;
    Py_DECREF(capsule);

This works fine, and it correctly reports that the data is read-write. But the documentation says this protocol is "legacy", and that new code should use the buffer protocol. Hence my surprising discovery that:

    Py_buffer view;
    PyObject_GetBuffer(object, &view, PyBUF_CONTIG);
    void *address = (void *)view.buf;
    PyBuffer_Release(&view);
    return address;

fails. If I lie to it and use PyBUF_CONTIG_RO, I can get the record address and write to it anyway.

Until this is fixed (or declined), can I ask for advice? Am I better off using the legacy array_interface or lying to the buffer protocol? Both feel dangerous in different ways.

eric-wieser · 2023-06-19T16:29:57Z

Does using this instead work?

record = np.zeros((), dtype="i,i,i,i,d,d,d")  # my actual descriptor is more complicated

This gives you an array rather than a scalar, which should be read-write.
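A minimal sketch of this suggestion follows. Writing through `ctypes.c_int.from_address` is only an illustration of having a usable read-write address; it assumes the first field is a C `int` at offset 0, as it is for this dtype string.

```python
import ctypes
import numpy as np

record = np.zeros((), dtype="i,i,i,i,d,d,d")  # 0-d array, not a scalar

print(type(record).__name__, record.ndim, memoryview(record).readonly)

# The 0-d array exposes its address directly; write the first int field through it.
ctypes.c_int.from_address(record.ctypes.data).value = 7
first_field = int(record["f0"])
```

Fields are then reached as `record["f0"]`; indexing with `record[()]` turns the 0-d array back into a scalar.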

fyellin · 2023-06-19T17:38:34Z

The numpy record is used a lot on both the C side and on the Python side. The bigger project is a Python wrapper around a NASA toolkit. It would be tricky explaining to users (most of them are astronomers, not programmers) that they have to type record[0].field* instead of the more natural record.field. The toolkit also wants to return arrays of records, and the user will pick one of them and pass it back to the toolkit.

But in general, there ought to be a natural way of modifying a record in C, shouldn't there?

[* Yes, I'm actually using rec.array. But that detail was irrelevant to the original bug report.]
Thanks for your help.

fyellin · 2023-06-20T22:22:39Z

Just double checking. Until this bug is resolved one way or the other, is there a recommended way to get the address of a record? Is there any intent to deprecate array_interface?

leofang · 2024-06-19T00:02:41Z

The fact that accessing a single element recarray[i] gives a scalar record is just so daunting… It should at least come with a ctypes attribute so as to access the pointer address. Right now I can't do arr[0].ctypes.data, so I have to do arr[0:1] to avoid getting a record back, since a (1-element) recarray does have a ctypes attribute for me to get the pointer address.
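The slicing workaround described above can be sketched as follows; `record_address` is a hypothetical helper name, and the dtype is borrowed from earlier in this thread.

```python
import numpy as np


def record_address(arr, i):
    """Address of element i, taken from a one-element slice instead of a scalar."""
    return arr[i:i + 1].ctypes.data


arr = np.zeros(3, dtype="i,i,i,i,d,d,d").view(np.recarray)
addresses = [record_address(arr, i) for i in range(len(arr))]

# Offsets of each element from the start of the array, in bytes.
print([a - arr.ctypes.data for a in addresses])
```

Each slice is still an array, so it keeps the `ctypes` attribute that the scalar record lacks.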

Also, the documentation for record.data is really misleading:
https://numpy.org/doc/stable/reference/generated/numpy.record.data.html#numpy.record.data
It says

Pointer to start of data

but actually it gives a memoryview.

fyellin added the 00 - Bug label Jun 18, 2023

fyellin changed the title ~~BUG: <Please write a comprehensive title after the 'BUG: ' prefix>~~ BUG: Buffer protocol doesn't support read-write access for numpy records Jun 19, 2023
