Support DictionaryString for Regex matching operators #12618

Closed
goldmedal opened this issue Sep 25, 2024 · 5 comments · Fixed by #12768
Labels
enhancement New feature or request

Comments

@goldmedal
Contributor

Is your feature request related to a problem or challenge?

While working on #12415, I found that DictionaryString columns fail the following case in datafusion/sqllogictest/test_files/string/string_query.slt.part:

statement ok
create table test_basic_operator as
select
    arrow_cast(column1, 'Dictionary(Int32, Utf8)') as ascii_1,
    arrow_cast(column2, 'Dictionary(Int32, Utf8)') as ascii_2,
    arrow_cast(column3, 'Dictionary(Int32, Utf8)') as unicode_1,
    arrow_cast(column4, 'Dictionary(Int32, Utf8)') as unicode_2
from test_source;

query BB
SELECT
  ascii_1 ~* '^a.{3}e',
  unicode_1 ~* '^d.*Фу'
FROM test_basic_operator;
----
true false
false false
false true
NULL NULL
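
For context, a `Dictionary(Int32, Utf8)` column does not store one string per row. Each row holds an `Int32` key (or NULL) that points into a separate array of distinct string values, so a kernel that expects a plain `StringArray` cannot read it directly. The plain-Rust sketch below models that layout with a hypothetical `DictColumn` type (not arrow-rs's `DictionaryArray`) and made-up rows, not the contents of `test_source`:

```rust
/// Hypothetical plain-std model of a Dictionary(Int32, Utf8) column:
/// each row stores an optional i32 key (None = NULL) indexing into `values`.
/// This is NOT the arrow-rs DictionaryArray type.
struct DictColumn {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

impl DictColumn {
    /// Encode plain optional strings, reusing the same key for repeated values.
    fn encode(rows: &[Option<&str>]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for &row in rows {
            match row {
                None => keys.push(None),
                Some(s) => {
                    // Look up an existing entry, or append a new distinct value.
                    let idx = match values.iter().position(|v| v == s) {
                        Some(i) => i,
                        None => {
                            values.push(s.to_string());
                            values.len() - 1
                        }
                    };
                    keys.push(Some(idx as i32));
                }
            }
        }
        DictColumn { keys, values }
    }

    /// Decode back to one optional string per row (what a cast to Utf8 yields).
    fn decode(&self) -> Vec<Option<&str>> {
        self.keys
            .iter()
            .map(|k| k.map(|i| self.values[i as usize].as_str()))
            .collect()
    }
}

fn main() {
    // Hypothetical rows, chosen only to show key reuse and NULL handling.
    let rows = [Some("apple"), Some("banana"), Some("apple"), None];
    let col = DictColumn::encode(&rows);
    assert_eq!(col.values, vec!["apple".to_string(), "banana".to_string()]);
    assert_eq!(col.keys, vec![Some(0), Some(1), Some(0), None]);
    assert_eq!(col.decode(), rows.to_vec());
    println!("{:?} -> keys {:?}", col.values, col.keys);
}
```

Repeated strings share one key, which is what makes dictionary encoding compact, and also why the physical array type differs from `Utf8`.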

I got the error message:

External error: query failed: DataFusion error: Internal error: Data type Dictionary(Int32, Utf8) not supported for binary_string_array_flag_op_scalar operation 'regexp_is_match' on string array.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
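
The message suggests the scalar code path matches on the left operand's data type with arms only for the plain string types (presumably shaped like the `binary_string_array_flag_op` macro quoted in this issue), so `Dictionary(Int32, Utf8)` lands in the catch-all error arm. A minimal model of that dispatch, using a hypothetical cut-down `DataType` enum rather than arrow's, reproduces the fall-through and shows one possible extension: unwrap the dictionary and dispatch on its value type.

```rust
/// Hypothetical, simplified stand-in for arrow's DataType
/// (only the variants needed for this illustration).
#[derive(Debug)]
enum DataType {
    Int32,
    Utf8,
    LargeUtf8,
    Utf8View,
    Dictionary(Box<DataType>, Box<DataType>),
}

/// Mirrors the existing match: there is no Dictionary arm,
/// so dictionary types fall through to the error.
fn pick_kernel(dt: &DataType) -> Result<&'static str, String> {
    match dt {
        DataType::Utf8View | DataType::Utf8 => Ok("StringArray"),
        DataType::LargeUtf8 => Ok("LargeStringArray"),
        other => Err(format!("Data type {:?} not supported", other)),
    }
}

/// One possible extension: unwrap a dictionary and dispatch on its value type.
fn pick_kernel_with_dict(dt: &DataType) -> Result<&'static str, String> {
    match dt {
        DataType::Dictionary(_key, value) => pick_kernel_with_dict(value),
        other => pick_kernel(other),
    }
}

fn main() {
    let dict = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    // The original dispatch rejects the dictionary type...
    assert!(pick_kernel(&dict).is_err());
    // ...while the extended one resolves it via the Utf8 value type.
    assert_eq!(pick_kernel_with_dict(&dict), Ok("StringArray"));
    println!("{:?}", pick_kernel(&dict));
}
```

Unwrapping only picks the value type; the kernel still has to handle the dictionary's keys, for example by casting to the value type first or by evaluating on the dictionary values directly.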

Describe the solution you'd like

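Support DictionaryString in binary_string_array_flag_op (and its scalar counterpart binary_string_array_flag_op_scalar, which raised the error above). The array version currently handles only Utf8View, Utf8, and LargeUtf8: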

macro_rules! binary_string_array_flag_op {
    ($LEFT:expr, $RIGHT:expr, $OP:ident, $NOT:expr, $FLAG:expr) => {{
        match $LEFT.data_type() {
            DataType::Utf8View | DataType::Utf8 => {
                compute_utf8_flag_op!($LEFT, $RIGHT, $OP, StringArray, $NOT, $FLAG)
            },
            DataType::LargeUtf8 => {
                compute_utf8_flag_op!($LEFT, $RIGHT, $OP, LargeStringArray, $NOT, $FLAG)
            },
            other => internal_err!(
                "Data type {:?} not supported for binary_string_array_flag_op operation '{}' on string array",
                // … (excerpt truncated here)

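Once a dictionary reaches a kernel, one way to support it is to evaluate the regex once per distinct dictionary value and then expand those results to rows through the keys, so repeated strings are matched only once. The sketch below is a hypothetical plain-Rust model: `DictColumn` and `dict_flag_op` are illustrative names, a closure stands in for the actual case-insensitive regex match, `negate` plays the role of the macro's `$NOT` parameter, and dictionary values are assumed to be non-null.

```rust
/// Hypothetical plain-std model of a Dictionary(Int32, Utf8) column
/// (not arrow-rs DictionaryArray). Values are assumed non-null here.
struct DictColumn {
    keys: Vec<Option<i32>>,
    values: Vec<String>,
}

/// Evaluate `pred` once per distinct dictionary value, then map each row's
/// key to that result. NULL keys stay NULL, with or without negation.
fn dict_flag_op<F>(col: &DictColumn, pred: F, negate: bool) -> Vec<Option<bool>>
where
    F: Fn(&str) -> bool,
{
    // One predicate call per distinct value, not per row.
    let per_value: Vec<bool> = col
        .values
        .iter()
        .map(|v| pred(v.as_str()) != negate)
        .collect();
    // Gather per-row results through the keys.
    col.keys
        .iter()
        .map(|k| k.map(|i| per_value[i as usize]))
        .collect()
}

fn main() {
    let col = DictColumn {
        keys: vec![Some(0), Some(1), Some(0), None],
        values: vec!["Apple".to_string(), "banana".to_string()],
    };
    // Stand-in for a case-insensitive regex: lowercase, then check the prefix.
    let result = dict_flag_op(&col, |s| s.to_lowercase().starts_with('a'), false);
    assert_eq!(result, vec![Some(true), Some(false), Some(true), None]);
    println!("{:?}", result);
}
```

A simpler alternative is to cast the dictionary to its value type and reuse the existing Utf8/LargeUtf8 arms, at the cost of materializing a string for every row. This issue does not say which approach #12768 took.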
Describe alternatives you've considered

No response

Additional context

No response

goldmedal added the enhancement (New feature or request) label on Sep 25, 2024
@alamb
Contributor

alamb commented Sep 25, 2024

Thanks @goldmedal -- this is a great find

@goldmedal
Contributor Author

Related TODO item:

@blaginin
Contributor

blaginin commented Oct 2, 2024

I'd like to take these type issues if you don't mind, @goldmedal and Andrew. It feels like a nice way to get into the project 😀

@blaginin
Contributor

blaginin commented Oct 2, 2024

take

@alamb
Contributor

alamb commented Oct 3, 2024

Thank you @blaginin

BTW, here is an example that might help: #12712

3 participants