I'm loading large images in map_groups. I expected my program to run, but I got Arrow errors:
Traceback (most recent call last):
  File "/home/ray/default/1.py", line 18, in <module>
    ds.take(1)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/dataset.py", line 2377, in take
    for row in limited_ds.iter_rows():
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/iterator.py", line 241, in _wrapped_iterator
    for batch in batch_iterable:
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/iterator.py", line 162, in _create_iterator
    block_iterator, stats, blocks_owned_by_consumer = self._to_block_iterator()
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/iterator/iterator_impl.py", line 33, in _to_block_iterator
    block_iterator, stats, executor = ds._plan.execute_to_iterator()
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/exceptions.py", line 86, in handle_trace
    raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(AssertionError): ray::MapBatches(group_fn)() (pid=67455, ip=10.0.26.25)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/output_buffer.py", line 94, in next
    block_remainder = block.slice(
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/arrow_block.py", line 246, in slice
    view = _copy_table(view)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/arrow_block.py", line 685, in _copy_table
    return transform_pyarrow.combine_chunks(table)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/data/_internal/arrow_ops/transform_pyarrow.py", line 295, in combine_chunks
    arr = _concatenate_extension_column(col)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/air/util/transform_pyarrow.py", line 34, in _concatenate_extension_column
    return ArrowTensorArray._concat_same_type(ca.chunks)
  File "/home/ray/anaconda3/lib/python3.9/site-packages/ray/air/util/tensor_extensions/arrow.py", line 551, in _concat_same_type
    storage = pa.concat_arrays([c.storage for c in to_concat])
  File "pyarrow/array.pxi", line 3321, in pyarrow.lib.concat_arrays
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
import ray
import numpy as np


def create_large_data(group):
    # Each result is 128 MiB
    return {"item": np.zeros((1, 128 * 1024 * 1024), dtype=np.uint8)}


ds = (
    ray.data.range(1024, override_num_blocks=1)
    .groupby(key="id")
    .map_groups(create_large_data)
)
ds.take(1)
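The sizes in the script above make the overflow plausible on arithmetic alone. The following is a rough sketch, not a confirmed diagnosis: it assumes the tensor column's storage uses signed 32-bit offsets, which the `pa.concat_arrays` frame in the traceback suggests but the report does not state. The helper name `fits_in_int32_offsets` is hypothetical and exists only for this illustration.

```python
# Assumption (not confirmed by the report): the concatenated storage array
# uses signed 32-bit offsets, so its last offset must be <= 2**31 - 1.
INT32_MAX = 2**31 - 1

ROW_BYTES = 128 * 1024 * 1024  # one uint8 row returned by create_large_data
NUM_ROWS = 1024                # rows produced by ray.data.range(1024)


def fits_in_int32_offsets(num_rows: int, row_bytes: int = ROW_BYTES) -> bool:
    """Return True if num_rows rows of row_bytes elements fit under INT32_MAX."""
    return num_rows * row_bytes <= INT32_MAX


max_rows = INT32_MAX // ROW_BYTES
print(max_rows)                             # 15
print(fits_in_int32_offsets(max_rows))      # True
print(fits_in_int32_offsets(max_rows + 1))  # False
print(NUM_ROWS * ROW_BYTES // 2**30)        # 128 (GiB across all rows)
```

Under that assumption, concatenating 16 or more of these rows into a single array would exceed the offset range, and the script produces 1024 of them.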
Issue Severity
High: It blocks me from completing my task.
bveeramani added the bug (Something that is supposed to be working; but isn't), P1 (Issue that should be fixed within a few weeks), and data (Ray Data-related issues) labels on Apr 19, 2024.
bveeramani changed the title from "[Data] ArrowInvalid: offset overflow when calling Dataset.map_groups" to "[Data] ArrowInvalid: offset overflow when calling Dataset.map_groups()" on Apr 19, 2024.
Versions / Dependencies
fe4dd5d