Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46949][SQL] Support CHAR/VARCHAR through ResolveDefaultColumns #44991

Closed
wants to merge 4 commits into from

Conversation

yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Feb 2, 2024

What changes were proposed in this pull request?

We have issues resolving column definitions with default values, i.e., c CHAR(5) DEFAULT 'foo', v VARCHAR(10) DEFAULT 'bar'. The reason is that CAHR/VARCHAR types in schemas are not supplanted with STRING type to align with the value expression.

This PR fixes these issues in ResolveDefaultColumns, which seems to only cover the v1 tables. When I applied some related tests to v2 tables, they had the same issues. But beyond that, there are some other front-loading needs to be addressed. In this case, I'd like to separate v2 from the v1 PR.

Why are the changes needed?

bugfix

spark-sql (default)> CREATE TABLE t( c CHAR(5) DEFAULT 'spark') USING parquet;
[INVALID_DEFAULT_VALUE.DATA_TYPE] Failed to execute CREATE TABLE command because the destination table column `c` has a DEFAULT value 'spark', which requires "CHAR(5)" type, but the statement provided a value of incompatible "STRING" type.

Does this PR introduce any user-facing change?

no, bugfix

How was this patch tested?

new tests

Was this patch authored or co-authored using generative AI tooling?

no

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (from my side)

@yaooqinn
Copy link
Member Author

yaooqinn commented Feb 2, 2024

Thanks you very much @dongjoon-hyun

|CREATE TABLE t(
| key int,
| v VARCHAR(6) DEFAULT 'apache',
| c CHAR(5) DEFAULT 'spark')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we do char padding for default values?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does padding only get involved with the read path?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For safety reasons, I added a new test at L245.

@yaooqinn
Copy link
Member Author

yaooqinn commented Feb 2, 2024

I created #44993 to resolve the test failure, separating it as it might get a chance for 3.5.

[info] - SPARK-46949: DDL with valid default char/varchar values *** FAILED *** (6 milliseconds)
[info]   org.apache.spark.sql.catalyst.analysis.TableAlreadyExistsException: [TABLE_OR_VIEW_ALREADY_EXISTS] Cannot create table or view `spark_catalog`.`default`.`t` because it already exists.
[info] Choose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects. SQLSTATE: 42P07
[info]   at org.apache.spark.sql.errors.QueryCompilationErrors$.tableAlreadyExistsError(QueryCompilationErrors.scala:2610)

@09306677806

This comment was marked as spam.

@yaooqinn yaooqinn closed this in 362a4d4 Feb 2, 2024
@yaooqinn yaooqinn deleted the SPARK-46949 branch February 2, 2024 09:45
@yaooqinn
Copy link
Member Author

yaooqinn commented Feb 2, 2024

Thank you @dongjoon-hyun @HyukjinKwon @cloud-fan , merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants