[BUG] Unexpected OOM writing a DataFrame w/ strings to ORC files #7588

randerzander · 2021-03-12T23:25:38Z

On a 32GB V100, the below snippet completes successfully with the RAPIDS 0.18 release, but fails w/ an OOM in the latest 0.19 nightly:

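```python
import cudf

# 20M rows of random numeric data plus a constant string column
df = cudf.datasets.randomdata(nrows=20_000_000)
df['teststr'] = 'teststr'

# Succeeds on 0.18; raises MemoryError on the 0.19 nightly
df.to_orc('test.orc', compression='snappy')
```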

Trace:

```
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-1-33735a6b5c55> in <module>
      3 df = cudf.datasets.randomdata(nrows=20_000_000)
      4 df['teststr'] = 'teststr'
----> 5 df.to_orc('test.orc', compression='snappy')

~/conda/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/core/dataframe.py in to_orc(self, fname, compression, *args, **kwargs)
   7396         from cudf.io import orc as orc
   7397 
-> 7398         orc.to_orc(self, fname, compression, *args, **kwargs)
   7399 
   7400     def stack(self, level=-1, dropna=True):

~/conda/envs/rapids-gpu-bdb/lib/python3.7/site-packages/cudf/io/orc.py in to_orc(df, fname, compression, enable_statistics, **kwargs)
    329             liborc.write_orc(df, file_obj, compression, enable_statistics)
    330     else:
--> 331         liborc.write_orc(df, path_or_buf, compression, enable_statistics)
    332 
    333 

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

cudf/_lib/orc.pyx in cudf._lib.orc.write_orc()

MemoryError: std::bad_alloc: CUDA error at: /home/rgelhausen/conda/envs/rapids-gpu-bdb/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory
```

Interestingly, with nrows=100_000_000 it doesn't OOM.
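To put the failure in perspective, here is a rough estimate of how much device memory the `teststr` column itself occupies. It is a back-of-envelope sketch, not a measurement: it assumes an Arrow-style strings layout (one chars buffer plus 32-bit offsets), ignores the numeric columns produced by `randomdata`, and the helper `string_column_bytes` is hypothetical, not a cuDF API.

```python
# Back-of-envelope estimate (assumption-laden sketch, not a measurement):
# size of a strings column holding the same value in every row.

GIB = 1024**3
DEVICE_BYTES = 32 * GIB  # the 32GB V100 from the report, treated as 32 GiB


def string_column_bytes(nrows: int, value: str, offset_bytes: int = 4) -> int:
    """Approximate device bytes for a strings column where every row is `value`.

    Assumes an Arrow-style layout: one contiguous chars buffer plus
    (nrows + 1) integer offsets of `offset_bytes` each. The validity mask
    is ignored because the column has no nulls.
    """
    chars = nrows * len(value.encode("utf-8"))
    offsets = (nrows + 1) * offset_bytes
    return chars + offsets


for nrows in (20_000_000, 100_000_000):
    size = string_column_bytes(nrows, "teststr")
    print(f"{nrows:>11,} rows: {size / 1e6:,.0f} MB "
          f"({size / DEVICE_BYTES:.2%} of device memory)")
```

Under these assumptions the string data is roughly 220 MB at 20M rows and about 1.1 GB at 100M rows, a small fraction of 32 GB either way. That is consistent with the OOM coming from an oversized allocation inside the writer rather than from the size of the data itself.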


@vuule

Closes #7588

The stream size used to be calculated incorrectly, leading to a huge allocation for the encoded data buffer. This PR fixes the stream size computation to count each row group only once.

Authors:
- Vukasin Milovanovic (@vuule)

Approvers:
- Ram (Ramakrishna Prabhu) (@rgsl888prabhu)
- Kumar Aatish (@kaatish)
- Devavret Makkar (@devavret)

URL: #7605
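The commit message only says that row groups were counted more than once when sizing the stream. The toy model below is a hypothetical illustration of how that kind of over-count inflates a buffer size; the function names, the stripe/row-group layout, and the specific mistake (summing all row groups for every stripe) are assumptions chosen for illustration, not the actual cuDF ORC writer code.

```python
from typing import List, Tuple

# Toy model (hypothetical, not cuDF's writer code): a column's encoded data is
# split into row groups, and consecutive row groups are batched into stripes.
# Each stripe is a half-open range [first, last) of row-group indices.
Stripe = Tuple[int, int]


def stream_size_overcounted(rg_sizes: List[int], stripes: List[Stripe]) -> int:
    """Buggy pattern: each stripe adds up *every* row group, not just its own,
    so every row group is counted once per stripe."""
    total = 0
    for _first, _last in stripes:
        total += sum(rg_sizes)  # ignores the stripe's own range
    return total


def stream_size(rg_sizes: List[int], stripes: List[Stripe]) -> int:
    """Fixed pattern: each stripe adds up only its own row groups, so every
    row group is counted exactly once."""
    return sum(sum(rg_sizes[first:last]) for first, last in stripes)


# 1,000 row groups of 64 KiB each, split evenly into 10 stripes
rg_sizes = [64 * 1024] * 1000
stripes = [(i * 100, (i + 1) * 100) for i in range(10)]

print(stream_size(rg_sizes, stripes))              # bytes actually needed
print(stream_size_overcounted(rg_sizes, stripes))  # 10x too large (10 stripes)
```

In this model the over-count scales with the number of stripes, so the requested buffer grows much faster than the data. Counting each row group only once keeps the size proportional to the encoded data.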

randerzander added the bug and cuIO labels Mar 12, 2021

vuule assigned vuule and rgsl888prabhu Mar 15, 2021

vuule mentioned this issue in "Fix ORC writer OOM issue #7605" (merged) Mar 16, 2021

rapids-bot closed this as completed in #7605 Mar 17, 2021

