Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add ArrayAndElementAndOptionalIndex for proper casting in array_position #9233

Merged
merged 9 commits into from
Feb 19, 2024

Conversation

tshauck
Copy link
Contributor

@tshauck tshauck commented Feb 15, 2024

Which issue does this PR close?

Closes #9093

Rationale for this change

Currently throws an error when doing SELECT array_position(arrow_cast([1, 2, 3, 4, 5], 'List(Int32)'), 5), and after this PR it won't.

What changes are included in this PR?

Change recommended in #9093 (comment)

Are these changes tested?

Added an additional sql logic test that failed w/o the change, and doesn't fail w/ the change.

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels Feb 15, 2024
@tshauck
Copy link
Contributor Author

tshauck commented Feb 15, 2024

Ah, actually seeing that won't work in the case of the optional 3rd arg for array position. Not sure if there's something built-in, but will look around. Or maybe add a ArrayAndElementPlusArgs? 🤔

@jayzhan211
Copy link
Contributor

Ah, actually seeing that won't work in the case of the optional 3rd arg for array position. Not sure if there's something built-in, but will look around. Or maybe add a ArrayAndElementPlusArgs? 🤔

No built-in signature, maybe we need ArrayAndElementAndOptionalIndex 😢

@tshauck
Copy link
Contributor Author

tshauck commented Feb 15, 2024

That'd definitely be a better name... I'll let this PR sit for a couple of days in case there's a better idea or compelling reason not to do this in the first place, then'll see what ArrayAndElementAndOptionalIndex looks like.

@tshauck tshauck marked this pull request as ready for review February 16, 2024 01:51
@tshauck
Copy link
Contributor Author

tshauck commented Feb 16, 2024

I think this is ready for a review. It adds ArrayAndElementAndOptionalIndex and associated code -- thanks!

@tshauck tshauck changed the title fix: use array_and_element for proper casting in array_position fix: add ArrayAndElementAndOptionalIndex for proper casting in array_position Feb 16, 2024
let first_two_types = &current_types[0..2];
let valid_types = array_append_or_prepend_valid_types(first_two_types, true)?;

let valid_types_with_index = valid_types
Copy link
Contributor

@jayzhan211 jayzhan211 Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can optionally append valid_types_with_index if the length is 3, otherwise, return here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Addressed in 5086f31.

@@ -130,6 +130,31 @@ fn get_valid_types(
_ => Ok(vec![vec![]]),
}
}
fn array_append_and_optional_index(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array_element_and_optional_index seems a better name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in 5086f31

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tshauck and @jayzhan211 -- this PR is looking very close

@@ -2603,6 +2603,16 @@ select array_position(arrow_cast(make_array([1, 2, 3], [4, 5, 6], [5, 5, 5], [4,
----
2 2

query I
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, this test fails on main like this:

Error during planning: array_position received incompatible types: '[Int32, Int64]'.

1

query I
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we please add a few more tests:

  1. For LargeList
  2. Error cases (e.g. for types that are still not compatible like array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')

Copy link
Contributor Author

@tshauck tshauck Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback... LargeList is added and works as expected. For array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo'), this actually doesn't fail right now on this branch and returns NULL:

❯ SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo') IS NULL;
+----------------------------------------------------------------------------------------------+
| array_position(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) IS NULL |
+----------------------------------------------------------------------------------------------+
| true                                                                                         |
+----------------------------------------------------------------------------------------------+
1 row in set. Query took 0.004 seconds.

Because this PR is leveraging the append's casting functionality, this seems to be the current behavior on main for append:

arrow-datafusion/datafusion-cli main ➜ ./target/debug/datafusion-cli 
DataFusion CLI v35.0.0
❯ SELECT array_append(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo');
+------------------------------------------------------------------------------------+
| array_append(make_array(Int64(5),Int64(2),Int64(3),Int64(4),Int64(5)),Utf8("foo")) |
+------------------------------------------------------------------------------------+
| [5, 2, 3, 4, 5, foo]                                                               |
+------------------------------------------------------------------------------------+
1 row in set. Query took 0.028 seconds.

I guess as I user I'd think if the array_append case would work, I'd also expect array_position to work like that. FWIW duckdb works for both array_append and array_position([1,2,3], 'foo') it just returns 0 rather than NULL.

Anyways, happy to go either way, just thought I'd bring it up before delving further.

Copy link
Contributor

@jayzhan211 jayzhan211 Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the second case, we return NULL which is compatible with Postgres behavior, so I think it is acceptable. You can just add the test case to make sure we got NULL

query I
select array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 'foo')
----
NULL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good... added in 4ec6289

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests and doing the background research

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tshauck and @jayzhan211 for the reviews 🙏

1

query I
SELECT array_position(arrow_cast([5, 2, 3, 4, 5], 'List(Int32)'), 5, 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests and doing the background research

@alamb alamb merged commit 60ee91e into apache:main Feb 19, 2024
23 checks passed
@tshauck tshauck deleted the update-array-position branch February 19, 2024 13:04
@tshauck
Copy link
Contributor Author

tshauck commented Feb 19, 2024

Great, thank you both!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potentially Overly-Stringent Type Checking for array_position
3 participants