map_elements replaces all array elements with nulls #17873
Comments
Can reproduce. It seems to happen with non-struct columns as well. It also seems to be specific to the first return value being null?
import polars as pl

pl.DataFrame({"a": [1, 2, 3]}).with_columns(pl.all().map_elements(
    {1: [None], 2: [9.0], 3: [10.0]}.get,
    return_dtype=pl.List(pl.Float64)
))
# shape: (3, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[f64] │
# ╞═══════════╡
# │ [null]    │
# │ [null]    │  # ???
# │ [null]    │  # ???
# └───────────┘

pl.DataFrame({"a": [1, 2, 3]}).with_columns(pl.all().map_elements(
    {1: [9.0], 2: [None], 3: [10.0]}.get,
    return_dtype=pl.List(pl.Float64)
))
# shape: (3, 1)
# ┌───────────┐
# │ a         │
# │ ---       │
# │ list[f64] │
# ╞═══════════╡
# │ [9.0]     │
# │ [null]    │
# │ [10.0]    │
# └───────────┘
A workaround that would be much more performant is to do the computation in numpy. This assumes your arrays column really is fixed width and that you don't mind using numpy.
Alternatively, there is an all-polars approach with no map_* calls that makes no assumptions but performs slightly worse than numpy.
Yet another way, which again assumes the 'arrays' are fixed width, is the most performant of the three.
So it appears this is actually due to
df1.with_columns(
    array_multi=pl.col("tmp_struct").map_elements(
        custom_map,
        return_dtype=pl.List(pl.Float64),
        skip_nulls=False,
    )
)
# shape: (2, 4)
# ┌───────────┬─────────┬─────────────────┬──────────────────┐
# │ arrays    ┆ numbers ┆ tmp_struct      ┆ array_multi      │
# │ ---       ┆ ---     ┆ ---             ┆ ---              │
# │ list[i64] ┆ f64     ┆ struct[2]       ┆ list[f64]        │
# ╞═══════════╪═════════╪═════════════════╪══════════════════╡
# │ [1, 2]    ┆ null    ┆ {[1, 2],null}   ┆ [null, null]     │
# │ [3, 4]    ┆ 1000.0  ┆ {[3, 4],1000.0} ┆ [3000.0, 4000.0] │
# └───────────┴─────────┴─────────────────┴──────────────────┘
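The `custom_map` function itself is not shown in this comment. A hypothetical version consistent with the table above might look like the sketch below. It assumes the struct fields are named `arrays` and `numbers`, and that the UDF receives each struct as a plain dict. The `custom_map_nan` variant is likewise an assumption, illustrating the `float('nan')` workaround mentioned in the issue description.

```python
import math
from typing import Optional


def custom_map(row: dict) -> list[Optional[float]]:
    """Hypothetical UDF: scale each element of row["arrays"] by row["numbers"]."""
    arrays = row["arrays"]
    number = row["numbers"]
    if number is None:
        # No multiplier available: return one null per input element.
        return [None] * len(arrays)
    return [value * number for value in arrays]


def custom_map_nan(row: dict) -> list[float]:
    """Same as custom_map, but returns float('nan') where it would return None."""
    return [math.nan if value is None else value for value in custom_map(row)]


print(custom_map({"arrays": [1, 2], "numbers": None}))
print(custom_map({"arrays": [3, 4], "numbers": 1000.0}))
```

Either function could be passed to `map_elements` in place of `custom_map` in the call above.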
This is now fixed on main.
Checks
Reproducible example
Log output
Issue description
When applying UDFs to structs AND the UDFs produce arrays AND some arrays are full of Nones, all other valid results are also filled with Nones. A workaround is to use float('nan') instead of None.
Expected behavior
I expect the dataframe to look like this:
Installed versions