merging master #11

ahshahid · 2020-12-09T19:47:41Z

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

…AR types ### What changes were proposed in this pull request? Add migration guide for CHAR VARCHAR types ### Why are the changes needed? for migration ### Does this PR introduce _any_ user-facing change? doc change ### How was this patch tested? passing ci Closes #30654 from yaooqinn/SPARK-33641-F. Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>

… when sc.stop in yarn client mode ### What changes were proposed in this pull request? This change make InterruptedIOException to be treated as InterruptedException when closing YarnClientSchedulerBackend, which doesn't log error with "YARN application has exited unexpectedly xxx" ### Why are the changes needed? For YarnClient mode, when stopping YarnClientSchedulerBackend, it first tries to interrupt Yarn application monitor thread. In MonitorThread.run() it catches InterruptedException to gracefully response to stopping request. But client.monitorApplication method also throws InterruptedIOException when the hadoop rpc call is calling. In this case, MonitorThread will not know it is interrupted, a Yarn App failed is returned with "Failed to contact YARN for application xxxxx; YARN application has exited unexpectedly with state xxxxx" is logged with error level. which confuse user a lot. ### Does this PR introduce _any_ user-facing change? Yes ### How was this patch tested? very simple patch, seems no need? Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor. Authored-by: suqilong <suqilong@qiyi.com> Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>

### What changes were proposed in this pull request? Currently, when a client requests FETCH_PRIOR to Thriftserver, Thriftserver reiterates from the start position. Because Thriftserver caches a query result with an array when THRIFTSERVER_INCREMENTAL_COLLECT feature is off, FETCH_PRIOR can be implemented without reiterating the result. A trait FeatureIterator is added in order to separate the implementation for iterator and an array. Also, FeatureIterator supports moves cursor with absolute position, which will be useful for the implementation of FETCH_RELATIVE, FETCH_ABSOLUTE. ### Why are the changes needed? For better performance of Thriftserver. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? FetchIteratorSuite Closes #30600 from Dooyoung-Hwang/refactor_with_fetch_iterator. Authored-by: Dooyoung Hwang <dooyoung.hwang@sk.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>

…e doc of ANSI Compliance ### What changes were proposed in this pull request? Add make_date/make_timestamp/make_interval into the doc of ANSI Compliance ### Why are the changes needed? Users can know that these functions throw runtime exceptions under ANSI mode if the result is not valid. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Build doc and check it in browser: ![image](https://user-images.githubusercontent.com/1097932/101608930-34a79e80-39bb-11eb-9294-9d9b8c3f6faa.png) Closes #30683 from gengliangwang/improveDoc. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: HyukjinKwon <gurwls223@apache.org>

…to nonInheritableMetadataKeys in Alias ### What changes were proposed in this pull request? This PR is a followup of #30488. This PR proposes to rename `Alias.deniedMetadataKeys` to `Alias.nonInheritableMetadataKeys` to make it less confusing. ### Why are the changes needed? To make it easier to maintain and read. ### Does this PR introduce _any_ user-facing change? No. This is rather a code cleanup. ### How was this patch tested? Ran the unittests written in the previous PR manually. Jenkins and GitHub Actions in this PR should also test them. Closes #30682 from HyukjinKwon/SPARK-33071-SPARK-33536. Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: HyukjinKwon <gurwls223@apache.org>

### What changes were proposed in this pull request? This PR adds `DeleteFromTable` to supported plans in `ReplaceNullWithFalseInPredicate`. ### Why are the changes needed? This change allows Spark to optimize delete conditions like we optimize filters. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? This PR extends the existing test cases to also cover `DeleteFromTable`. Closes #30688 from aokolnychyi/spark-33722. Authored-by: Anton Okolnychyi <aokolnychyi@apple.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

…n properly ### What changes were proposed in this pull request? Make `ResolveRelations` handle plan id properly ### Why are the changes needed? bug fix for Spark Connect, it won't affect classic Spark SQL before this PR: ``` from pyspark.sql import functions as sf spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1") spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2") df1 = spark.read.table("test_table_1") df2 = spark.read.table("test_table_2") df3 = spark.read.table("test_table_1") join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2) join2 = df3.join(join1, how="left", on=join1.index==df3.id) join2.schema ``` fails with ``` AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704 ``` That is due to existing plan caching in `ResolveRelations` doesn't work with Spark Connect ``` === Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations === '[#12]Join LeftOuter, '`==`('index, 'id) '[#12]Join LeftOuter, '`==`('index, 'id) !:- '[#9]UnresolvedRelation [test_table_1], [], false :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 !+- '[#11]Project ['index, 'value_2] : +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ! +- '[#10]Join Inner, '`==`('id, 'index) +- '[#11]Project ['index, 'value_2] ! :- '[#7]UnresolvedRelation [test_table_1], [], false +- '[#10]Join Inner, '`==`('id, 'index) ! +- '[#8]UnresolvedRelation [test_table_2], [], false :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 ! : +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ! +- '[#8]SubqueryAlias spark_catalog.default.test_table_2 ! +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false Can not resolve 'id with plan 7 ``` `[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one ``` :- '[#9]SubqueryAlias spark_catalog.default.test_table_1 +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false ``` ### Does this PR introduce _any_ user-facing change? yes, bug fix ### How was this patch tested? added ut ### Was this patch authored or co-authored using generative AI tooling? ci Closes apache#45214 from zhengruifeng/connect_fix_read_join. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

…r `postgreSQL/float4.sql` and `postgreSQL/int8.sql` ### What changes were proposed in this pull request? This pr regenerate Java 21 golden file for `postgreSQL/float4.sql` and `postgreSQL/int8.sql` to fix Java 21 daily test. ### Why are the changes needed? Fix Java 21 daily test: - https://github.com/apache/spark/actions/runs/10823897095/job/30030200710 ``` [info] - postgreSQL/float4.sql *** FAILED *** (1 second, 100 milliseconds) [info] postgreSQL/float4.sql [info] Expected "...arameters" : { [info] "[ansiConfig" : "\"spark.sql.ansi.enabled\"", [info] "]expression" : "'N A ...", but got "...arameters" : { [info] "[]expression" : "'N A ..." Result did not match for query #11 [info] SELECT float('N A N') (SQLQueryTestSuite.scala:663) ... [info] - postgreSQL/int8.sql *** FAILED *** (2 seconds, 474 milliseconds) [info] postgreSQL/int8.sql [info] Expected "...arameters" : { [info] "[ansiConfig" : "\"spark.sql.ansi.enabled\"", [info] "]sourceType" : "\"BIG...", but got "...arameters" : { [info] "[]sourceType" : "\"BIG..." Result did not match for query apache#66 [info] SELECT CAST(q1 AS int) FROM int8_tbl WHERE q2 <> 456 (SQLQueryTestSuite.scala:663) ... [info] *** 2 TESTS FAILED *** [error] Failed: Total 3559, Failed 2, Errors 0, Passed 3557, Ignored 4 [error] Failed tests: [error] org.apache.spark.sql.SQLQueryTestSuite [error] (sql / Test / test) sbt.TestsFailedException: Tests unsuccessful ``` ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? - Pass Github Acitons - Manual checked: `build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite" with Java 21, all test passed ` ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#48089 from LuciferYang/SPARK-49578-FOLLOWUP. Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

yaooqinn and others added 6 commits December 9, 2020 06:44

ahshahid merged commit c74663e into ahshahid:master Dec 9, 2020

github-actions bot added DOCS SQL YARN labels Dec 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

merging master #11

merging master #11

ahshahid commented Dec 9, 2020

merging master #11

merging master #11

Conversation

ahshahid commented Dec 9, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?