Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
While building a larger project, I uncovered a memory bug (OOM) caused by a sequence of operations, in both lazy and eager mode. It occurs when exploding array columns that contain None values, but not every time None values are present. It seems to occur after creating a new column using apply with a lambda or named function that returns a list, where the function produces a None for some row, or via a join operation. If these null values are filtered out before calling explode on the column(s), there is no issue. If the nulls are not filtered out, the script appears to hang while memory usage grows until it consumes all available memory.
I confirmed this with the last few released versions on PyPI, the latest of which is 0.16.5, and on both Windows and Linux on separate machines. The reproducible example I provided fails consistently for me on both Windows and Linux. However, if I remove the creation of column 'd' and only join, I can get it to fail on Linux but not on Windows (where I was running a slightly older version of Polars, 0.15.16).
This may be related/similar to a closed issue: #4108
I will update this issue with any additional findings as I explore this more. Thanks!
Reproducible example
Expected behavior
I would expect explode to handle null cases either by returning the Nones in the result (explode sometimes does this) or by raising a shape error (which it does in other cases).
Installed versions