BUG: Buffer protocol doesn't support read-write access for numpy records #23973

Open
fyellin opened this issue Jun 18, 2023 · 7 comments

@fyellin

fyellin commented Jun 18, 2023

Describe the issue:

Numpy does not allow you to use the buffer protocol to get a read-write buffer for a numpy record; it only grants the request if you ask for the record read-only. This is in spite of the fact that the pointer returned by the buffer protocol does, in fact, point to a read-write piece of memory, and one ought to be able to modify it.

Using the buffer protocol is the recommended way to read and write "array-like objects" in C. To use the buffer protocol for a numpy record, you have to lie and request a read-only buffer, and then write to it anyway. The older __array_interface__ works just fine.

Reproduce the code example:

>>> import numpy as np
>>> x = np.zeros(2, dtype='d,d,i')

# memoryview mimics the behavior of the buffer protocol
>>> memoryview(x).readonly
False
>>> memoryview(x[0]).readonly
True

On the other hand

>>> x.__array_interface__['data']   # returns tuple (address, read-only)
(105553143159680, False)
>>> x[0].__array_interface__['data']
(105553143159680, False)

I also wrote some C code which I linked in and called from Python

void foo(PyObject *object) {
    Py_buffer view;
    PyObject_GetBuffer(object, &view, PyBUF_CONTIG_RO);
    if (view.obj) {
        double *data = (double *)view.buf;
        *data += 1;
    }
    PyBuffer_Release(&view);
}

Both foo(x) and foo(x[0]) correctly incremented an element of the array. The buffer really does point to the actual data and not a copy of it.

Using PyBUF_CONTIG gives an error on foo(x[0]).
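
For comparison, the same write-through can be sketched from Python without any C. This assumes the address reported by `__array_interface__` for the record is the array's own data pointer, as in the output above, and it refuses to write otherwise. The field name `f0` is NumPy's default for the first field of a `'d,d,i'` dtype. It is an illustration of the aliasing, not a recommended API.

```python
import ctypes
import numpy as np

x = np.zeros(2, dtype='d,d,i')
rec = x[0]  # structured scalar (numpy.void) for element 0

# (address, read_only) as reported by the array interface for the scalar
addr, read_only = rec.__array_interface__['data']

# Only touch the memory if the scalar really aliases the array's buffer.
if addr != x.ctypes.data:
    raise RuntimeError("record does not alias the array; not writing")

# Reinterpret the first 8 bytes at that address as a C double and bump it,
# mirroring `*data += 1` in foo().
first = ctypes.c_double.from_address(addr)
first.value += 1

print(x['f0'][0])
```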

Error message:

No response

Runtime information:

numpy.__version__
'1.25.0'

sys.version
'3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 (clang-1300.0.29.30)]'
numpy.show_runtime()
[{'numpy_version': '1.25.0',
'python': '3.10.2 (v3.10.2:a58ebcc701, Jan 13 2022, 14:50:16) [Clang 13.0.0 '
'(clang-1300.0.29.30)]',
'uname': uname_result(system='Darwin', node='Franks-14-Mac-Book-Pro-2.local', release='22.4.0', version='Darwin Kernel Version 22.4.0: Mon Mar 6 20:59:28 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6000', machine='arm64')},
{'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
'found': ['ASIMDHP', 'ASIMDDP'],
'not_found': ['ASIMDFHM']}},
{'architecture': 'armv8',
'filepath': '/Users/fy/Library/Python/3.10/lib/python/site-packages/numpy/.dylibs/libopenblas64_.0.dylib',
'internal_api': 'openblas',
'num_threads': 10,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'}]

Context for the issue:

I should be able to use the buffer protocol to read and write a numpy record without lying about my intentions to write to it.

fyellin changed the title from "BUG: <Please write a comprehensive title after the 'BUG: ' prefix>" to "BUG: Buffer protocol doesn't support read-write access for numpy records" on Jun 19, 2023
@seberg
Member

seberg commented Jun 19, 2023

We can probably change this, although I don't like that our scalars are mutable this way; it also doesn't really make much sense to act differently in different places. (I suspect there may be ways to create structured scalars which do not allow writing, but I would have to search a bit.)

So, I am a bit curious about you needing to mutate the scalars (although, NumPy turning 0-D arrays to scalars might be enough of a reason).

I.e. the fact that we return a read-only version for the buffer protocol was maybe even intentional to some degree.
The main reason why it's tricky to make the structured scalar immutable is that it would break arr[0]["field"] = value (which would otherwise only work as arr["field"][0] = value).
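
A small sketch of the two assignment forms mentioned above, assuming NumPy's default field names (`f0`, `f1`, ...) for a comma-separated dtype string. Both reach the array's memory because the structured scalar returned by `arr[0]` refers to the array's data rather than to a copy.

```python
import numpy as np

arr = np.zeros(2, dtype='d,d,i')

arr[0]['f0'] = 1.5   # assignment through the structured scalar
arr['f1'][0] = 2.5   # assignment through the field view of the array

print(arr[0])
```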

@mattip
Member

mattip commented Jun 19, 2023

If I recall correctly there were issues around correctly handling the buffer protocol and padding (if needed).

@fyellin
Author

fyellin commented Jun 19, 2023

My use case is that I am creating a numpy record that looks like a C structure, and then passing that numpy record to C code. I need to find out the actual address of the numpy record in my SWIG wrapper to give to C.

Slightly longer history. All my numpy records are currently created as the sole element of a numpy array:

array = np.zeros(1, dtype="i,i,i,i,d,d,d")  # my actual descriptor is more complicated
record = array[0]

and in this case, I can get from a numpy record to its actual address by means of base:

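PyObject *base = PyObject_GetAttrString(object, "base");  // get back the array (new reference)
void *address = PyArray_DATA((PyArrayObject *)base);
Py_DECREF(base);  // the record still holds its own reference to the array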

However I will soon be needing to handle arrays of records, and the above trick will only work with the 0-th element.
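
The address of the i-th record can be derived from the base array's data pointer and stride, which is one way to see why the `base` trick only matches element 0. A sketch in Python, assuming a one-dimensional array; the helper name `record_address` is hypothetical and not part of NumPy.

```python
import numpy as np

def record_address(array, i):
    """Address of element i of a 1-D array (hypothetical helper)."""
    if array.ndim != 1:
        raise ValueError("expected a 1-D array")
    if not 0 <= i < array.shape[0]:
        raise IndexError(i)
    return array.ctypes.data + i * array.strides[0]

array = np.zeros(3, dtype="i,i,i,i,d,d,d")
for i in range(len(array)):
    # A one-element slice is still an array, so it exposes its own data pointer.
    assert record_address(array, i) == array[i:i + 1].ctypes.data

# Distance in bytes between consecutive records
print(record_address(array, 1) - record_address(array, 0))
```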

I can correct this by using the C side of the array_interface protocol:

    PyObject *capsule = PyObject_GetAttrString(object, "__array_struct__");
    PyArrayInterface* interface = PyCapsule_GetPointer(capsule, NULL);
    void *address = interface->data;
    Py_DECREF(capsule);

This works fine, and it correctly reports that the data is read-write. But the documentation says this protocol is "legacy", and that new code should use the buffer protocol. Hence my surprising discovery that:

    Py_buffer view;
    if (PyObject_GetBuffer(object, &view, PyBUF_CONTIG) < 0) {
        return NULL;  // this is what happens for a record: the writable request is refused
    }
    void *address = (void *)view.buf;
    PyBuffer_Release(&view);
    return address;

fails. If I lie to it and use PyBUF_CONTIG_RO, I can get the record address and write to it anyway.

Until this is fixed (or declined), can I ask for advice? Am I better off using the legacy array_interface or lying to the buffer protocol? Both feel dangerous in different ways.

@eric-wieser
Member

Does using this instead work?

record = np.zeros((), dtype="i,i,i,i,d,d,d")  # my actual descriptor is more complicated

This gives you an array rather than a scalar, which should be read-write.
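
A quick check of that suggestion from Python, using `memoryview` as a stand-in for a buffer-protocol request, as in the original report. This is a sketch; the field name `f0` is NumPy's default for the first field.

```python
import numpy as np

record = np.zeros((), dtype="i,i,i,i,d,d,d")  # 0-d structured array, not a scalar

print(record.ndim)                  # 0
print(memoryview(record).readonly)  # arrays export a writable buffer when writeable

record['f0'] = 7                    # field assignment works on the 0-d array
print(record['f0'])
```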

@fyellin
Author

fyellin commented Jun 19, 2023

The numpy record is used a lot on both the C side and on the Python side. The bigger project is a Python wrapper around a NASA toolkit. It would be tricky explaining to users (most of them are astronomers, not programmers) that they have to type record[0].field* instead of the more natural record.field. The toolkit also wants to return arrays of records, and the user will pick one of them and pass it back to the toolkit.

But in general, there ought to be a natural way of modifying a record in C, shouldn't there?

[* Yes, I'm actually using rec.array. But that detail was irrelevant to the original bug report.]
Thanks for your help.

@fyellin
Author

fyellin commented Jun 20, 2023

Just double checking. Until this bug is resolved one way or the other, is there a recommended way to get the address of a record? Is there any intent to deprecate array_interface?

@leofang
Contributor

leofang commented Jun 19, 2024

The fact that accessing a single element recarray[i] gives a scalar record is just so daunting… It should at least come with a ctypes attribute so as to access the pointer address. Right now I can't do arr[0].ctypes.data, so I have to do arr[0:1] to avoid getting a record back, since a (1-element) recarray does have a ctypes attribute for me to get the pointer address.

Also, the documentation for record.data is really misleading:
https://numpy.org/doc/stable/reference/generated/numpy.record.data.html#numpy.record.data
It says

Pointer to start of data

but actually it gives a memoryview.
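
The slicing workaround described above can be sketched as follows, assuming a contiguous one-dimensional record array. `ctypes.c_int.from_address` is used only to show that the pointer is live and writable; the `i,d` field layout is made up for the example.

```python
import ctypes
import numpy as np

arr = np.zeros(3, dtype="i,d").view(np.recarray)

i = 1
one = arr[i:i + 1]          # still a recarray, not a scalar record
address = one.ctypes.data   # integer address of record i

# Write the leading C int of record i through the raw address.
ctypes.c_int.from_address(address).value = 42
print(arr[i].f0)
```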
