Divide each element in a list by an int #14711

Closed
Liyixin95 opened this issue Feb 27, 2024 · 5 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@Liyixin95

Description

I have a DataFrame with a list column and an int column, and I want to divide each element of the list by the int in the same row, like this:

import polars as pl
from polars import col

lf = pl.DataFrame({"a": [[1, 2, 3], [1, 3], [2, 4, 6]], "b": [4, 5, 6]}).lazy()

df = lf.with_columns(c=(col("a") / col("b")).alias("c")).collect()

print(df)

This raises an error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/factor/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/factor/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/python/ring_next/test/test.py", line 6, in <module>
    df = lf.with_columns(c=(col("a") / col("b")).alias("c")).collect()
  File "/root/miniconda3/envs/factor/lib/python3.10/site-packages/polars/lazyframe/frame.py", line 1937, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ComputeError: cannot cast List type (inner: 'Int64', to: 'Float64')

Liyixin95 added the enhancement label on Feb 27, 2024

@cmdlineluser
Contributor

To date, I have found:

It's allowed on structs, but you lose the list:

# df is the DataFrame from the issue, built eagerly (without .lazy())
df.with_columns(
    pl.col("a").list.to_struct("max_width") / pl.struct("b")
)

# shape: (3, 2)
# ┌─────────────────────────┬─────┐
# │ a                       ┆ b   │
# │ ---                     ┆ --- │
# │ struct[3]               ┆ i64 │
# ╞═════════════════════════╪═════╡
# │ {0.25,0.5,0.75}         ┆ 4   │
# │ {0.2,0.6,null}          ┆ 5   │
# │ {0.333333,0.666667,1.0} ┆ 6   │
# └─────────────────────────┴─────┘
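The null in the second row comes from the "max_width" strategy: every list is widened to the length of the longest one before the division, and the missing slots stay null. A minimal plain-Python sketch of that padding, assuming this reading of the output above (the helper name `divide_as_struct` is hypothetical and is not part of polars):

```python
def divide_as_struct(lists, divisors):
    """Illustrative only: pad each list to the widest length, then divide.

    Mirrors the shape of the struct output above; it is not polars code.
    """
    width = max(len(xs) for xs in lists)
    rows = []
    for xs, d in zip(lists, divisors):
        # Shorter lists get None in the trailing fields.
        padded = list(xs) + [None] * (width - len(xs))
        # None stays None; every real value is divided by the row's divisor.
        rows.append(tuple(None if x is None else x / d for x in padded))
    return rows


rows = divide_as_struct([[1, 2, 3], [1, 3], [2, 4, 6]], [4, 5, 6])
for row in rows:
    print(row)
```

Each row has exactly three fields, which is why the original list lengths are lost.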

"Manual" broadcasting:

df.with_columns(
   (pl.col("a").flatten() / pl.col("b")).implode().over(pl.int_range(pl.len()))
)

# shape: (3, 2)
# ┌───────────────────────────┬─────┐
# │ a                         ┆ b   │
# │ ---                       ┆ --- │
# │ list[f64]                 ┆ i64 │
# ╞═══════════════════════════╪═════╡
# │ [0.25, 0.5, 0.75]         ┆ 4   │
# │ [0.2, 0.6]                ┆ 5   │
# │ [0.333333, 0.666667, 1.0] ┆ 6   │
# └───────────────────────────┴─────┘

Does anybody know whether this is planned for the List type, or maybe the Array type?

@mcrumiller
Contributor

Related is #14541. I definitely think simple arithmetic should be allowed on lists/arrays, and inter-series arithmetic on arrays of the same dtype.

@itamarst
Contributor

I will start on this once #17823 is merged.

@deanm0000
Collaborator

This is a narrow case of #17496

@cmdlineluser
Contributor

This now works, see #19162:

df = pl.DataFrame({"a": [[1, 2, 3], [1, 3], [2, 4, 6]], "b": [4, 5, 6]})

df.with_columns(c=(pl.col.a / pl.col.b))

# shape: (3, 3)
# ┌───────────┬─────┬───────────────────────────┐
# │ a         ┆ b   ┆ c                         │
# │ ---       ┆ --- ┆ ---                       │
# │ list[i64] ┆ i64 ┆ list[f64]                 │
# ╞═══════════╪═════╪═══════════════════════════╡
# │ [1, 2, 3] ┆ 4   ┆ [0.25, 0.5, 0.75]         │
# │ [1, 3]    ┆ 5   ┆ [0.2, 0.6]                │
# │ [2, 4, 6] ┆ 6   ┆ [0.333333, 0.666667, 1.0] │
# └───────────┴─────┴───────────────────────────┘

6 participants