Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move make_array array_append array_prepend array_concat function to datafusion-functions-array crate #9504

Merged
merged 10 commits into from
Mar 10, 2024

Conversation

guojidan
Copy link
Contributor

@guojidan guojidan commented Mar 8, 2024

Which issue does this PR close?

Closes #9322.

Rationale for this change

What changes are included in this PR?

move make_array array_append array_prepend array_concat function to datafusion-functions-array crate

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules labels Mar 8, 2024
@guojidan
Copy link
Contributor Author

guojidan commented Mar 8, 2024

because #9343 pr lasting for too long,difficult to rebase or merge, so I open a new pr

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take a quick glance and I think it is ready to merge.

There are 2 things that I'm trying to figure out to improve further

  1. Make sure we have Scalar that we need make_scalar_function_with_hints, since I think we should only get Array for array functions, if we really have Scalar, it seems something wrong to me.
  2. Simplify cfg feature flag, I would like to change OperatorToFunction to OperatorToArrayFunciton and see whether we can import the entire rule (file) optionally.

They don't need to be solved in this PR, but I think they are possible to be improved

@@ -44,6 +45,7 @@ async-trait = { workspace = true }
chrono = { workspace = true }
datafusion-common = { workspace = true, default-features = true }
datafusion-expr = { workspace = true }
datafusion-functions-array = { workspace = true, optional = true }
Copy link
Contributor

@jayzhan211 jayzhan211 Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb I think having optional importing is fine. Is there any other downside to this?

@jayzhan211
Copy link
Contributor

let me rebase and fix the conflicts

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor

  1. Make sure we have Scalar that we need make_scalar_function_with_hints, since I think we should only get Array for array functions, if we really have Scalar, it seems something wrong to me.

I think we need make_scalar_function_with_hints, Scalar::List are represented in ColumnValue::Scalar, and others are represented in ColumnValue::Array. Although we don't do the optimized Scalar path for array functions, to be consistent with the definition of ColumnValue, it may be nice be convert it to ColumnValue::Scalar if we got Scalar::List.

Copy link
Member

@Weijun-H Weijun-H left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks @guojidan

Comment on lines 404 to 406
array element, // arg name
"appends an element to the end of an array.", // doc
array_append_udf // internal function name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
array element, // arg name
"appends an element to the end of an array.", // doc
array_append_udf // internal function name
array element,
"appends an element to the end of an array.",
array_append_udf

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we can get rid of these comments since the structure is apparent.

Comment on lines 458 to 460
element array, // arg name
"Prepends an element to the beginning of an array.", // doc
array_prepend_udf // internal function name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
element array, // arg name
"Prepends an element to the beginning of an array.", // doc
array_prepend_udf // internal function name
element array,
"Prepends an element to the beginning of an array.",
array_prepend_udf

Comment on lines 512 to 513
"Concatenates arrays.", // doc
array_concat_udf // internal function name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Concatenates arrays.", // doc
array_concat_udf // internal function name
"Concatenates arrays.",
array_concat_udf

Comment on lines 590 to 591
"Returns an Arrow array using the specified input expressions.", // doc
make_array_udf // internal function name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Returns an Arrow array using the specified input expressions.", // doc
make_array_udf // internal function name
"Returns an Arrow array using the specified input expressions.",
make_array_udf

@jayzhan211
Copy link
Contributor

jayzhan211 commented Mar 9, 2024

make_scalar_function_with_hints

but maybe we should call another name, it is not the same as make_scalar_function_with_hints in physical expr.

@jayzhan211
Copy link
Contributor

@Weijun-H I think we can merge this and improve others on follow up PR

@jayzhan211
Copy link
Contributor

but I have no idea why the test is running endless

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor

I want to split them to different files to avoid crazy conflict

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor

jayzhan211 commented Mar 9, 2024

Upd2: I found out the reason and solution.
Upd: select array_ndims([[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]); is running super slow

It seems super slow to run array.slt with this PR

2031259 has been canceled after 6hr, so the issue might be the change in this PR.

Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
@jayzhan211
Copy link
Contributor

@guojidan Besides fixing conflicts. I split those functions to their own files, and rename to make_scalar_funciton (there should be a better name).

let me merge this PR first, and file another PR for any other changes.

Thanks @guojidan and @Weijun-H

@jayzhan211 jayzhan211 merged commit 88187d4 into apache:main Mar 10, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions sql SQL Planner
Projects
None yet
Development

Successfully merging this pull request may close these issues.

move make_array array_append array_prepend array_concat function to datafusion-functions-array crate
3 participants