Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20380][SQL] Unable to set/unset table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES ddl #17649

Closed
wants to merge 1 commit into from

Conversation

sujith71955
Copy link
Contributor

@sujith71955 sujith71955 commented Apr 16, 2017

What changes were proposed in this pull request?

Table comment was not getting set/unset using ALTER TABLE SET/UNSET TBLPROPERTIES query
eg: ALTER TABLE table_with_comment SET TBLPROPERTIES("comment"= "modified comment)
when user alter the table properties and adds/updates table comment,table comment which is a field of CatalogTable instance is not getting updated and old table comment if exists was shown to user, inorder to handle this issue, update the comment field value in CatalogTable with the newly added/modified comment along with other table level properties when user executes ALTER TABLE SET TBLPROPERTIES query.

This pr has also taken care of unsetting the table comment when user executes query ALTER TABLE UNSET TBLPROPERTIES inorder to unset or remove table comment.
eg: ALTER TABLE table_comment UNSET TBLPROPERTIES IF EXISTS ('comment')

How was this patch tested?

Added test cases as part of SQLQueryTestSuite for verifying table comment using desc formatted table query after adding/modifying table comment as part of AlterTableSetPropertiesCommand and unsetting the table comment using AlterTableUnsetPropertiesCommand.

@sujith71955
Copy link
Contributor Author

cc @wzhfy

@wzhfy
Copy link
Contributor

wzhfy commented Apr 16, 2017

test this please

@wzhfy
Copy link
Contributor

wzhfy commented Apr 16, 2017

also cc @gatorsmile

@wzhfy
Copy link
Contributor

wzhfy commented Apr 16, 2017

@sujith71955 Can you change the title to be more specific, like "Support altering table comment"?

val newTable = table.copy(properties = table.properties ++ properties)
val newTable = table.copy(
properties = table.properties ++ properties,
comment = properties.get("comment"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comment to explain why this is necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment

@gatorsmile
Copy link
Member

ok to test

@gatorsmile
Copy link
Member

This is a separate issue. Could you open a new JIRA and use a different JIRA number?

DESC formatted table_comment;

-- DROP TEST TABLE
DROP TABLE table_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The basic function has been tested in the other suites. How about changing this test suite to describe-table-after-alter-table?

CREATE TABLE table_comment (a STRING, b INT)

-- ALTER TABLE BY ADDING COMMENT
ALTER TABLE table_comment set tblproperties(comment = "added comment");

DESC formatted table_comment;

-- ALTER TABLE BY MODIFYING COMMENT
ALTER TABLE table_comment set tblproperties(comment = "modified comment");

DESC formatted table_comment;

-- DROP TEST TABLE
DROP TABLE table_comment;

Copy link
Member

@gatorsmile gatorsmile Apr 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds we also have a bug in the ALTER TABLE UNSET.

Please also add another test case

-- ALTER TABLE BY UNSETTING COMMENT
ALTER TABLE table_comment unset tblproperties('comment');

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i will add a new jira for this problem and i will update the test suite name as per the suggestion, as you suggested i will also verify ALTER TABLE UNSET .

@SparkQA
Copy link

SparkQA commented Apr 16, 2017

Test build #75840 has finished for PR 17649 at commit 1db3103.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sujith71955 sujith71955 changed the title [SPARK-20023][SQL][follow up] Output table comment for DESC FORMATTED after adding/modifying table comment using Alter TableSetPropertiesCommand [SPARK-20380][SQL] Output table comment for DESC FORMATTED after adding/modifying table comment using Alter TableSetPropertiesCommand Apr 19, 2017
@gatorsmile
Copy link
Member

gatorsmile commented Apr 20, 2017

Any update? ping @sujith71955 @wzhfy

@sujith71955 sujith71955 force-pushed the alter_table_comment branch 2 times, most recently from 467a694 to 19d80c6 Compare April 20, 2017 18:48
@sujith71955
Copy link
Contributor Author

@gatorsmile @wzhfy updated the PR based on the review comments, please let me know for any suggestions.

@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #76002 has finished for PR 17649 at commit 467a694.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2017

Test build #76004 has finished for PR 17649 at commit 19d80c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Type MANAGED
Provider parquet
Comment modified comment
Properties [comment=modified comment]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment string i.e. "modified comment" is in both Properties and Comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove comment from Properties

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i saw for hive tables already its been taken care in HiveClientImpl.scala class where "comment" is getting filtered from table properties, also for parquet table same has to be taken care.

DESC formatted table_with_comment;

-- ALTER TABLE BY MODIFYING COMMENT
ALTER TABLE table_with_comment set tblproperties(comment = "modified comment");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to also have other properties with comment in this test.

Copy link
Contributor

@wzhfy wzhfy Apr 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename describe-table-after-alter-table.sql to alter-table-comment.sql?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm I missed your previous discussion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified the testcase as per comment, added one more proeprty while altering the table.

@wzhfy
Copy link
Contributor

wzhfy commented Apr 21, 2017

@gatorsmile Should we care about case sensitivity of comment? Hive allows both "comment" and "Comment" in table parameters, while we are extracting comment as a field of CatalogTable.

None
} else {
table.properties.get("comment")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

    val comment = if (propKeys.contains("comment")) None else table.properties.get("comment")

@gatorsmile
Copy link
Member

@wzhfy Could you check the behavior of Hive?

@SparkQA
Copy link

SparkQA commented Apr 23, 2017

Test build #76085 has finished for PR 17649 at commit 50deed9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wzhfy
Copy link
Contributor

wzhfy commented Apr 24, 2017

@gatorsmile Hive treats comment simply as a key in the string-string parameter map, while spark extracts comment from the map as a field in CatalogTable. So the question is, should spark consider both comment and COMMENT as table comment? If yes, what if they are different?

Here's the results of hive:

0: jdbc:hive2://.../> create table src (key int , value string) comment "initial comment";
No rows affected (1.055 seconds)
0: jdbc:hive2://.../> desc formatted src;
+-------------------------------+-------------------------------------------------------+-----------------------+--+
|           col_name            |                       data_type                       |        comment        |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                             | comment               |
|                               | NULL                                                  | NULL                  |
| key                           | int                                                   |                       |
| value                         | string                                                |                       |
|                               | NULL                                                  | NULL                  |
| # Detailed Table Information  | NULL                                                  | NULL                  |
| Database:                     | wzh                                                   | NULL                  |
| Owner:                        | spark                                                 | NULL                  |
| CreateTime:                   | Mon Apr 24 11:43:40 CST 2017                          | NULL                  |
| LastAccessTime:               | UNKNOWN                                               | NULL                  |
| Protect Mode:                 | None                                                  | NULL                  |
| Retention:                    | 0                                                     | NULL                  |
| Location:                     | hdfs://hacluster/user/hive/warehouse/wzh.db/src       | NULL                  |
| Table Type:                   | MANAGED_TABLE                                         | NULL                  |
| Table Parameters:             | NULL                                                  | NULL                  |
|                               | comment                                               | initial comment       |
|                               | transient_lastDdlTime                                 | 1493005420            |
|                               | NULL                                                  | NULL                  |
| # Storage Information         | NULL                                                  | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe  | NULL                  |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.RCFileInputFormat        | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.RCFileOutputFormat       | NULL                  |
| Compressed:                   | No                                                    | NULL                  |
| Num Buckets:                  | -1                                                    | NULL                  |
| Bucket Columns:               | []                                                    | NULL                  |
| Sort Columns:                 | []                                                    | NULL                  |
| Storage Desc Params:          | NULL                                                  | NULL                  |
|                               | serialization.format                                  | 1                     |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
28 rows selected (0.525 seconds)
0: jdbc:hive2://.../> alter table src set tblproperties("comment"="new comment", "COMMENT"="NEW COMMENT");
No rows affected (0.62 seconds)
0: jdbc:hive2://.../> desc formatted src;
+-------------------------------+-------------------------------------------------------+-----------------------+--+
|           col_name            |                       data_type                       |        comment        |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                             | comment               |
|                               | NULL                                                  | NULL                  |
| key                           | int                                                   |                       |
| value                         | string                                                |                       |
|                               | NULL                                                  | NULL                  |
| # Detailed Table Information  | NULL                                                  | NULL                  |
| Database:                     | wzh                                                   | NULL                  |
| Owner:                        | spark                                                 | NULL                  |
| CreateTime:                   | Mon Apr 24 11:43:40 CST 2017                          | NULL                  |
| LastAccessTime:               | UNKNOWN                                               | NULL                  |
| Protect Mode:                 | None                                                  | NULL                  |
| Retention:                    | 0                                                     | NULL                  |
| Location:                     | hdfs://hacluster/user/hive/warehouse/wzh.db/src       | NULL                  |
| Table Type:                   | MANAGED_TABLE                                         | NULL                  |
| Table Parameters:             | NULL                                                  | NULL                  |
|                               | COLUMN_STATS_ACCURATE                                 | false                 |
|                               | COMMENT                                               | NEW COMMENT           |
|                               | comment                                               | new comment           |
|                               | last_modified_by                                      | spark                 |
|                               | last_modified_time                                    | 1493005506            |
|                               | numFiles                                              | 0                     |
|                               | numRows                                               | -1                    |
|                               | rawDataSize                                           | -1                    |
|                               | totalSize                                             | 0                     |
|                               | transient_lastDdlTime                                 | 1493005506            |
|                               | NULL                                                  | NULL                  |
| # Storage Information         | NULL                                                  | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe  | NULL                  |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.RCFileInputFormat        | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.RCFileOutputFormat       | NULL                  |
| Compressed:                   | No                                                    | NULL                  |
| Num Buckets:                  | -1                                                    | NULL                  |
| Bucket Columns:               | []                                                    | NULL                  |
| Sort Columns:                 | []                                                    | NULL                  |
| Storage Desc Params:          | NULL                                                  | NULL                  |
|                               | serialization.format                                  | 1                     |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
36 rows selected (0.337 seconds)
0: jdbc:hive2://.../> alter table src unset tblproperties ("COMMENT");
No rows affected (0.646 seconds)
0: jdbc:hive2://.../> desc formatted src;
+-------------------------------+-------------------------------------------------------+-----------------------+--+
|           col_name            |                       data_type                       |        comment        |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
| # col_name                    | data_type                                             | comment               |
|                               | NULL                                                  | NULL                  |
| key                           | int                                                   |                       |
| value                         | string                                                |                       |
|                               | NULL                                                  | NULL                  |
| # Detailed Table Information  | NULL                                                  | NULL                  |
| Database:                     | wzh                                                   | NULL                  |
| Owner:                        | spark                                                 | NULL                  |
| CreateTime:                   | Mon Apr 24 11:43:40 CST 2017                          | NULL                  |
| LastAccessTime:               | UNKNOWN                                               | NULL                  |
| Protect Mode:                 | None                                                  | NULL                  |
| Retention:                    | 0                                                     | NULL                  |
| Location:                     | hdfs://hacluster/user/hive/warehouse/wzh.db/src       | NULL                  |
| Table Type:                   | MANAGED_TABLE                                         | NULL                  |
| Table Parameters:             | NULL                                                  | NULL                  |
|                               | COLUMN_STATS_ACCURATE                                 | false                 |
|                               | comment                                               | new comment           |
|                               | last_modified_by                                      | spark                 |
|                               | last_modified_time                                    | 1493005605            |
|                               | numFiles                                              | 0                     |
|                               | numRows                                               | -1                    |
|                               | rawDataSize                                           | -1                    |
|                               | totalSize                                             | 0                     |
|                               | transient_lastDdlTime                                 | 1493005605            |
|                               | NULL                                                  | NULL                  |
| # Storage Information         | NULL                                                  | NULL                  |
| SerDe Library:                | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe  | NULL                  |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.RCFileInputFormat        | NULL                  |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.RCFileOutputFormat       | NULL                  |
| Compressed:                   | No                                                    | NULL                  |
| Num Buckets:                  | -1                                                    | NULL                  |
| Bucket Columns:               | []                                                    | NULL                  |
| Sort Columns:                 | []                                                    | NULL                  |
| Storage Desc Params:          | NULL                                                  | NULL                  |
|                               | serialization.format                                  | 1                     |
+-------------------------------+-------------------------------------------------------+-----------------------+--+
35 rows selected (0.552 seconds)

@SparkQA
Copy link

SparkQA commented Apr 25, 2017

Test build #76139 has finished for PR 17649 at commit 1156483.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 25, 2017

Test build #76142 has finished for PR 17649 at commit 99ba5be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sujith71955
Copy link
Contributor Author

@gatorsmile @wzhfy updated the PR by removing the 'comment' from table properties .

@wzhfy
Copy link
Contributor

wzhfy commented Apr 26, 2017

The changes look good to me if we don't care about the case sensitivity issue.

@gatorsmile
Copy link
Member

@wzhfy In HiveClientImpl.scala, we follow Hive and consider the case sensitivity of property key comment.

@sujith71955 Could you resolve the following comment?
#17649 (comment)

@@ -267,8 +271,11 @@ case class AlterTableUnsetPropertiesCommand(
}
}
}
// if 'comment' key is present in the seq of keys which needs to be unset then reset the table
// level comment with none.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: How about?

If comment is in the table property, we reset it to None

@@ -222,7 +222,6 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
} else {
tableDefinition.storage.locationUri
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please revert it back.

@@ -0,0 +1,29 @@
CREATE TABLE table_with_comment (a STRING, b INT, c STRING, d STRING) USING parquet COMMENT 'added';

DESC formatted table_with_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: formatted -> FORMATTED

-- ALTER TABLE BY MODIFYING COMMENT
ALTER TABLE table_with_comment set tblproperties("comment"= "modified comment", "type"= "parquet");

DESC formatted table_with_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: formatted -> FORMATTED

-- ALTER TABLE BY ADDING COMMENT
ALTER TABLE table_comment set tblproperties(comment = "added comment");

DESC formatted table_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: formatted -> FORMATTED

-- ALTER UNSET PROPERTIES COMMENT
ALTER TABLE table_comment UNSET TBLPROPERTIES IF EXISTS ('comment');

DESC formatted table_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: formatted -> FORMATTED

DESC formatted table_comment;

-- ALTER TABLE BY ADDING COMMENT
ALTER TABLE table_comment set tblproperties(comment = "added comment");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: set tblproperties -> SET TBLPROPERTIES

DESC formatted table_with_comment;

-- ALTER TABLE BY MODIFYING COMMENT
ALTER TABLE table_with_comment set tblproperties("comment"= "modified comment", "type"= "parquet");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: set tblproperties -> SET TBLPROPERTIES

-- CREATE TABLE WITHOUT COMMENT
CREATE TABLE table_comment (a STRING, b INT) USING parquet;

DESC formatted table_comment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: formatted -> FORMATTED

@gatorsmile
Copy link
Member

LGTM except a few minor comments.

…ding/modifying table comment using Alter TableSetPropertiesCommand and UNSET TBLPROPERTIES query.

### What changes were proposed in this pull request?

This PR is a follow up PR to JIRA issue 20023 where table comment was not displayed in describe formatted command after user creates a table with table comment.
But still was issue  when user alter the table properties and adds/updates table comment. table comment which is directly part of CatalogTable instance where not getting updated and
old table comment was shown, to handle this issue  while updating the table properties map with newly added/modified properties in CatalogTable instance also
update the comment parameter with newly added/modified comment as part of AlterTableSetPropertiesCommand which is  also now exist in CatalogTable instance level  not only in table
properties map of CatalogTable instance. this pr has also taken care of unsetting the table comment when user executes AlterTableUnsetPropertiesCommand.

### How was this patch tested?
Added test cases  as part of SQLQueryTestSuite for verifying  table comment using desc formatted table query after adding or modifying table comment as part of AlterTableSetPropertiesCommand and AlterTableUnsetPropertiesCommand.
@SparkQA
Copy link

SparkQA commented May 5, 2017

Test build #76496 has finished for PR 17649 at commit 4f02ada.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sujith71955
Copy link
Contributor Author

@wzhfy @gatorsmile fixed all the comments, thanks for reviewing the changes and providing me valuable sharings and comments. thanks.

@gatorsmile
Copy link
Member

gatorsmile commented May 5, 2017

@sujith71955 Could you update the PR description and title?

This PR is to fix the issue in modify table comments using ALTER TABLE SET/UNSET TBLPROPERTIES, instead of fixing the DESC FORMATTED, right?

@sujith71955 sujith71955 changed the title [SPARK-20380][SQL] Output table comment for DESC FORMATTED after adding/modifying table comment using Alter TableSetPropertiesCommand [SPARK-20380][SQL] Unable to modify table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES May 6, 2017
@sujith71955 sujith71955 changed the title [SPARK-20380][SQL] Unable to modify table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES [SPARK-20380][SQL] Unable to set/unset table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES May 6, 2017
@sujith71955 sujith71955 changed the title [SPARK-20380][SQL] Unable to set/unset table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES [SPARK-20380][SQL] Unable to set/unset table comment property using ALTER TABLE SET/UNSET TBLPROPERTIES ddl May 6, 2017
@sujith71955
Copy link
Contributor Author

@gatorsmile you are right, pr should address the issue which is handled in alter table set/unset properties ddls, updated the description and title, let me know for any clarifications. thanks.

@gatorsmile
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented May 7, 2017

Test build #76551 has finished for PR 17649 at commit 4f02ada.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member

Thanks! Merging to master.

@asfgit asfgit closed this in 42cc6d1 May 8, 2017
liyichao pushed a commit to liyichao/spark that referenced this pull request May 24, 2017
…LTER TABLE SET/UNSET TBLPROPERTIES ddl

### What changes were proposed in this pull request?
Table comment was not getting  set/unset using **ALTER TABLE  SET/UNSET TBLPROPERTIES** query
eg: ALTER TABLE table_with_comment SET TBLPROPERTIES("comment"= "modified comment)
 when user alter the table properties  and adds/updates table comment,table comment which is a field  of **CatalogTable**  instance is not getting updated and  old table comment if exists was shown to user, inorder  to handle this issue, update the comment field value in **CatalogTable** with the newly added/modified comment along with other table level properties when user executes **ALTER TABLE  SET TBLPROPERTIES** query.

This pr has also taken care of unsetting the table comment when user executes query  **ALTER TABLE  UNSET TBLPROPERTIES** inorder to unset or remove table comment.
eg: ALTER TABLE table_comment UNSET TBLPROPERTIES IF EXISTS ('comment')

### How was this patch tested?
Added test cases  as part of **SQLQueryTestSuite** for verifying  table comment using desc formatted table query after adding/modifying table comment as part of **AlterTableSetPropertiesCommand** and unsetting the table comment using **AlterTableUnsetPropertiesCommand**.

Author: sujith71955 <sujithchacko.2010@gmail.com>

Closes apache#17649 from sujith71955/alter_table_comment.
asfgit pushed a commit that referenced this pull request Jun 16, 2017
## What changes were proposed in this pull request?
Test failed in `describe.sql`.

We need to fix the related bug introduced in (#17649) in the follow-up PR to master.

## How was this patch tested?
N/A

Author: gatorsmile <gatorsmile@gmail.com>

Closes #18316 from gatorsmile/fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants