DataFrame.join(LazyFrame) is allowed ? #6264

Closed
1 of 2 tasks
gab23r opened this issue Jan 16, 2023 · 3 comments · Fixed by #6798

Comments

@gab23r
Contributor

gab23r commented Jan 16, 2023

Research

  • I have searched the above polars tags on Stack Overflow for similar questions.

  • I have asked my usage related question on Stack Overflow.

Link to question on Stack Overflow

No response

Question about Polars

I think this is new behavior, but I am not sure.

I thought you couldn't mix DataFrame and LazyFrame in a join.
Actually, you can, if the LazyFrame is on the right-hand side.

Is this on purpose? Can we rely on this "feature"?

import polars as pl
df = pl.DataFrame({'A': [1, 2, 3]})
ldf = pl.DataFrame({'A': [1, 2, 5]}).lazy()

_ = df.join(ldf, on='A')
# works

_ = ldf.join(df, on='A')
# ValueError: Expected a `LazyFrame` as join table, got <class 'polars.internals.dataframe.frame.DataFrame'>
@ritchie46
Member

That surprises me. That should not be allowed.

@ghuls
Collaborator

ghuls commented Jan 20, 2023

@ritchie46 It is not so surprising, as the eager join converts both DataFrames to lazy anyway:

return (
    self.lazy()
    .join(
        other=other.lazy(),
        left_on=left_on,
        right_on=right_on,
        on=on,
        how=how,
        suffix=suffix,
    )
    .collect(no_optimization=True)
)
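
To illustrate why the snippet above lets a LazyFrame through on the right-hand side only, here is a minimal toy sketch. The two classes are hypothetical stand-ins, not the real polars types: rows are plain dicts, the "join" just keeps left rows whose key also appears on the right, and it assumes that calling `.lazy()` on something already lazy simply returns it.

```python
# Toy stand-ins, NOT the real polars classes (assumption for illustration).

class LazyFrame:
    def __init__(self, rows):
        self.rows = rows

    def lazy(self):
        # Already lazy, so this is a no-op.
        return self

    def join(self, other, on):
        # The lazy side checks the type of the join table.
        if not isinstance(other, LazyFrame):
            raise ValueError(
                f"Expected a `LazyFrame` as join table, got {type(other)}"
            )
        keys = {row[on] for row in other.rows}
        return LazyFrame([row for row in self.rows if row[on] in keys])

    def collect(self):
        return DataFrame(self.rows)


class DataFrame:
    def __init__(self, rows):
        self.rows = rows

    def lazy(self):
        return LazyFrame(self.rows)

    def join(self, other, on):
        # No type check here: `.lazy()` is called on whatever `other` is,
        # so a LazyFrame slips through unchanged.
        return self.lazy().join(other.lazy(), on=on).collect()


df = DataFrame([{"A": 1}, {"A": 2}, {"A": 3}])
ldf = DataFrame([{"A": 1}, {"A": 2}, {"A": 5}]).lazy()

print(df.join(ldf, on="A").rows)  # eager.join(lazy) goes through

try:
    ldf.join(df, on="A")  # lazy.join(eager) is rejected
except ValueError as exc:
    print(exc)
```

The asymmetry comes from where the type check lives: only the lazy `join` validates its argument, while the eager one converts first and never looks at the type.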

@alexander-beedie
Collaborator

alexander-beedie commented Feb 11, 2023

My feeling is we should be explicit that the frame types are expected to match (as it might not be intentional), and raise a helpful error in all cases where they don't, telling the caller what's going on. Can make a simple PR for this, as it does seem inconsistent.

(If we were to allow it, and you actually intended to do a series of lazy computations, you may not realise you just promoted your call-chain to eager mode until you collect - or, as you now don't need to, you may forget to do that too and just eat a large performance penalty ;)
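
A rough sketch of what such an explicit check could look like. The helper name `check_join_types`, the error type, and the message wording are all hypothetical, and the two classes are empty stand-ins rather than the real polars types; this is not the change that was eventually merged.

```python
# Empty stand-ins for the two frame types (assumption for illustration).
class DataFrame:
    pass


class LazyFrame:
    pass


def check_join_types(left, right):
    """Raise a descriptive error unless both frames have the same type."""
    if type(left) is not type(right):
        raise TypeError(
            f"cannot join {type(left).__name__} with {type(right).__name__}; "
            "call .lazy() or .collect() first so that both sides match"
        )


check_join_types(DataFrame(), DataFrame())  # fine
check_join_types(LazyFrame(), LazyFrame())  # fine

try:
    check_join_types(DataFrame(), LazyFrame())
except TypeError as exc:
    print(exc)
```

Running the same check at the top of both the eager and the lazy `join` would make the two directions behave consistently.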
