create list using over (implode) context expected behaviour? bug? #9780

Closed
Julian-J-S opened this issue Jul 8, 2023 · 3 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@Julian-J-S
Contributor

Problem description

Hi,
My goal is to create a list column from an over (window) aggregation.

I wonder whether the current behaviour of a plain pl.col("a").over("b") is expected: it just returns column "a" unchanged (useless?), whereas in the groupby + agg context the same expression produces a list column.
If it is expected (I would love to know why), then I would expect implode() combined with over() to work, but it does not.

Examples:

import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 4, 6],
})

# << over >>
df.with_columns(
    # Is this the expected behaviour? It returns "a" unchanged. Should it create a list, as in the agg context?
    a_over_b=pl.col("a").over("b"),

    # Crashes! (IMO either this or, better, the expression above should work and create a list.)
    # a_implode_over_b=pl.col("a").implode().over("b"),

    # This is what I want, but computed over "b" rather than over the whole column.
    a_implode=pl.col("a").implode(),
)
┌─────┬─────┬──────────┬───────────┐
│ a   ┆ b   ┆ a_over_b ┆ a_implode │
│ --- ┆ --- ┆ ---      ┆ ---       │
│ i64 ┆ i64 ┆ i64      ┆ list[i64] │
╞═════╪═════╪══════════╪═══════════╡
│ 1   ┆ 4   ┆ 1        ┆ [1, 2, 3] │
│ 2   ┆ 4   ┆ 2        ┆ [1, 2, 3] │
│ 3   ┆ 6   ┆ 3        ┆ [1, 2, 3] │
└─────┴─────┴──────────┴───────────┘

# << group_by + agg >>
df.group_by("b").agg(
    # A plain col("a") already creates a list in the agg context (makes sense!)
    agg_col=pl.col("a"),

    # implode() wraps the per-group list in another list
    agg_col_imploded=pl.col("a").implode(),
)
┌─────┬───────────┬──────────────────┐
│ b   ┆ agg_col   ┆ agg_col_imploded │
│ --- ┆ ---       ┆ ---              │
│ i64 ┆ list[i64] ┆ list[list[i64]]  │
╞═════╪═══════════╪══════════════════╡
│ 6   ┆ [3]       ┆ [[3]]            │
│ 4   ┆ [1, 2]    ┆ [[1, 2]]         │
└─────┴───────────┴──────────────────┘

# << GOAL >>
┌─────┬─────┬───────────┐
│ a   ┆ b   ┆ goal      │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ list[i64] │
╞═════╪═════╪═══════════╡
│ 1   ┆ 4   ┆ [1, 2]    │
│ 2   ┆ 4   ┆ [1, 2]    │
│ 3   ┆ 6   ┆ [3]       │
└─────┴─────┴───────────┘
@Julian-J-S Julian-J-S added the enhancement New feature or an improvement of an existing feature label Jul 8, 2023
@cmdlineluser
Contributor

As for the goal, you can use mapping_strategy (see #8967 (comment)):

df.with_columns(goal =
    pl.col("a").over("b", mapping_strategy="join")
)
# shape: (3, 3)
# ┌─────┬─────┬───────────┐
# │ a   ┆ b   ┆ goal      │
# │ --- ┆ --- ┆ ---       │
# │ i64 ┆ i64 ┆ list[i64] │
# ╞═════╪═════╪═══════════╡
# │ 1   ┆ 4   ┆ [1, 2]    │
# │ 2   ┆ 4   ┆ [1, 2]    │
# │ 3   ┆ 6   ┆ [3]       │
# └─────┴─────┴───────────┘

@ritchie46
Member

This is expected. Window functions map each group's result back to the positions of that group's rows. Read the docs on mapping_strategy.

@ritchie46
Member

See more context here: #6487
