Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: issue #9130 substitute redundant columns when doing cross join #9154

Merged
merged 9 commits into from
Feb 17, 2024

Conversation

Lordworms
Copy link
Contributor

@Lordworms Lordworms commented Feb 7, 2024

Which issue does this PR close?

Closes #9130

Rationale for this change

In cross-join, Since we have to validate the uniqueness of the name, we would throw an error when failed to do it. as what sqlite did, they just substituted the name of the redundant column(supposed to be a) into a:1, and we followed the strategy.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Feb 7, 2024
@Lordworms Lordworms marked this pull request as ready for review February 7, 2024 23:54
datafusion/expr/src/logical_plan/builder.rs Outdated Show resolved Hide resolved
datafusion/core/src/datasource/memory.rs Outdated Show resolved Hide resolved
datafusion/expr/src/logical_plan/builder.rs Outdated Show resolved Hide resolved
datafusion/expr/src/logical_plan/builder.rs Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the core Core DataFusion crate label Feb 12, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Lordworms for this contribution and I apologize for the delay in review

@jackwener any final thoughts on this PR? From my perspective it is ready to go.

I double checked with sqlite and indeed it does appear to append :1 to such column names 🤯

sqlite> create table t1 (a int, b int);
sqlite> create table t2 (a int, b int);
sqlite> create table t3 (a int, b int);
sqlite> insert into t1 values (1, 2);
sqlite> insert into t2 values (3, 4);
sqlite> insert into t3 values (5, 6);
sqlite> select * from (t1 cross join t2) as t cross join t3;
1|2|3|4|5|6
sqlite> .header on
sqlite> select * from (t1 cross join t2) as t cross join t3;
a|b|a:1|b:1|a|b
1|2|3|4|5|6
sqlite>

cc @lewiszlw as you filed the issue

# under the License.


# prepare the tables
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran this test without the code in this PR and it fails like this, so I think the test covers the changes in the PR

External error: query failed: DataFusion error: Error during planning: Projections require unique expression names but the expression "t.a" at position 0 and "t.a" at position 2 have the same name. Consider aliasing ("AS") one of them.
[SQL] select * from (t1 cross join t2) as t cross join t3;
-------
at test_files/./join.slt:733
at test_files/join_disable_repartition_joins.slt:26

Error: Execution("1 failures")
error: test failed, to rerun pass `-p datafusion-sqllogictest --test sqllogictests`

Caused by:
  process didn't exit successfully: `/Users/andrewlamb/Software/arrow-datafusion/target/debug/deps/sqllogictests-6e501f6e871dacfc joins` (exit status: 1)

let counter = name_map.entry(field.name().to_string()).or_insert(0);
*counter += 1;
if *counter > 1 {
let new_name = format!("{}:{}", field.name(), *counter - 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This effectively renames some of the columns to something else (like a:1) I think

How do we:

  1. know that name is unique? (couldn't there be another table in the query that has a column named a:1?
  2. know we should resolve the namespace conflict with the first column with the name?

let t2_field_1 = DFField::new_unqualified("a", DataType::Int32, false);
let t1_field_2 = DFField::new_unqualified("b", DataType::Int32, false);
let t2_field_2 = DFField::new_unqualified("b", DataType::Int32, false);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add one more field named "a" here (to show a counter other than :1)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I can do that

Comment on lines 1897 to 1899
let schema: Schema = DFSchema::new_with_metadata(fields, meta_data)
.unwrap()
.into();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_with_metadata(fields, meta_data)? to avoid .unwrap()

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @Lordworms

DFField::new_unqualified("a:1", DataType::Int32, false),
DFField::new_unqualified("b", DataType::Int32, false),
DFField::new_unqualified("b:1", DataType::Int32, false),
DFField::new_unqualified("a:2", DataType::Int32, false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit bb00b63 into apache:main Feb 17, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Projections require unique expression names error
3 participants