Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add binary_view to string_view coercion #12643

Merged
merged 1 commit into from
Sep 28, 2024

Conversation

doupache
Copy link
Contributor

@doupache doupache commented Sep 27, 2024

Which issue does this PR close?

Closes #12500

Rationale for this change

a step forward to enable reading StringViewArray by default from Parquet

What changes are included in this PR?

add binary_view to string_view coercion

Are these changes tested?

yes , add a slt test case

I also test manually
change benchmarks/src/clickbench.rs , make force_view_types = true

config
    .options_mut()
    .execution
    .parquet
    .schema_force_view_types = true
cargo build 


cargo run --bin dfbench -- clickbench --iterations 1  \
--path data/hits_partitioned \
--queries-path queries/clickbench/queries.sql  \
-o ./clickbench_partitioned.json

all 43 query completed.

Are there any user-facing changes?

No, since force_view_type default is still false.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 27, 2024
@doupache doupache changed the title add binary to string_view coercion Add binary to string_view coercion Sep 27, 2024
(LargeUtf8, Binary) => Some(LargeUtf8),
(LargeUtf8, LargeBinary) => Some(LargeUtf8),
(LargeUtf8, BinaryView) => Some(LargeUtf8),
Copy link
Contributor Author

@doupache doupache Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #12092
(LargeUtf8, BinaryView) => Some(Utf8View)

but i think we should coercion to LargeUtf8 ? , since LargeUtf8 it can be a huge string.
(LargeUtf8, BinaryView) => Some(LargeUtf8)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Arrow actually has the equivalent of LargeUtf8View but arrow-rs doesn't support it yet.

I think the mapping (LargeUtf8, BinaryView) => Some(LargeUtf8) makes sense to me for now

@doupache
Copy link
Contributor Author

Without these change , hit_partitioned will failed at query 20

Q20: SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
Error: Context("type_coercion", Plan("There isn't a common type to coerce BinaryView and Utf8 in LIKE expression"))

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @doupache -- this makes sense to me

(LargeUtf8, Binary) => Some(LargeUtf8),
(LargeUtf8, LargeBinary) => Some(LargeUtf8),
(LargeUtf8, BinaryView) => Some(LargeUtf8),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Arrow actually has the equivalent of LargeUtf8View but arrow-rs doesn't support it yet.

I think the mapping (LargeUtf8, BinaryView) => Some(LargeUtf8) makes sense to me for now

@doupache doupache changed the title Add binary to string_view coercion Add binary_view to string_view coercion Sep 27, 2024
@alamb alamb merged commit c21d025 into apache:main Sep 28, 2024
29 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 28, 2024

Thanks again @doupache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support Binary --> String coercion for StringView/BinaryView in LIKE
2 participants