SPARK-1121 Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set #6

ScrapCodes · 2014-02-26T13:01:48Z

No description provided.

AmplabJenkins · 2014-02-26T18:16:22Z

Merged build triggered.

AmplabJenkins · 2014-02-26T18:16:22Z

Merged build started.

pwendell · 2014-02-26T19:42:34Z

docs/building-with-maven.md

+
+## A note about Hadoop version 0.23.x
+
+For building spark with hadoop 0.23.x and also yarn, you will have to provide a dependency on avro manually.


Mind being more specific here? For example: "You will have to manually add a dependency on (org.apache.avro, avro, 1.7.4)."

pwendell · 2014-02-26T21:53:01Z

Overall looks good but gave some minor comments.

pwendell · 2014-02-26T22:14:51Z

Jenkins, test this please.

AmplabJenkins · 2014-02-26T22:18:50Z

Build triggered.

AmplabJenkins · 2014-02-26T22:18:50Z

Build started.

pwendell · 2014-02-26T22:19:07Z

project/SparkBuild.scala

@@ -87,7 +87,7 @@ object SparkBuild extends Build {
    case Some(v) => v.toBoolean
  }
  lazy val hadoopClient = if (hadoopVersion.startsWith("0.20.") || hadoopVersion == "1.0.0") "hadoop-core" else "hadoop-client"
-
+  val isAvroNeeded = hadoopVersion.startsWith("0.23.") && isYarnEnabled


Would you mind restructuring this to be called maybeAvro and have it return a sequence of dependencies (that might be empty)? I'm just asking because @sryza will need to do something similar for Hadoop dependencies and it will be cleaner to have something like:

    libraryDependencies ++= maybeAvro
    libraryDependencies ++= maybeHadoop

rather than a bunch of if statements.
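A minimal sketch of the shape suggested above, runnable without sbt. `Dep` is a hypothetical stand-in for sbt's `ModuleID`, and passing `hadoopVersion` and `isYarnEnabled` as parameters is an assumption made for illustration; the avro coordinates are the ones quoted in the earlier docs comment. It is not the code that was eventually merged.

```scala
// Hypothetical stand-in for sbt's ModuleID so the sketch compiles on its own.
final case class Dep(org: String, name: String, version: String)

object BuildSketch {
  // Returns the avro dependency only for Hadoop 0.23.x builds with YARN enabled,
  // and an empty sequence otherwise, so callers can always append the result.
  def maybeAvro(hadoopVersion: String, isYarnEnabled: Boolean): Seq[Dep] =
    if (hadoopVersion.startsWith("0.23.") && isYarnEnabled)
      Seq(Dep("org.apache.avro", "avro", "1.7.4"))
    else
      Seq.empty

  def main(args: Array[String]): Unit = {
    // Illustrative base dependency; the version here is an assumption.
    val base = Seq(Dep("org.apache.hadoop", "hadoop-client", "0.23.9"))
    val deps = base ++ maybeAvro("0.23.9", isYarnEnabled = true)
    deps.foreach(println)
  }
}
```

Because the helper always returns a sequence, the call site needs no conditional: appending an empty sequence is a no-op.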

pwendell · 2014-02-26T22:19:59Z

@ScrapCodes thanks for looking into this! Added some suggestions inline.

AmplabJenkins · 2014-02-26T22:46:39Z

Build finished.

AmplabJenkins · 2014-02-26T22:46:39Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12887/

AmplabJenkins · 2014-02-26T22:53:54Z

Build triggered.

ScrapCodes · 2014-02-27T05:26:59Z

project/SparkBuild.scala

@@ -130,6 +130,8 @@ object SparkBuild extends Build {
    javacOptions := Seq("-target", JAVAC_JVM_VERSION, "-source", JAVAC_JVM_VERSION),
    unmanagedJars in Compile <<= baseDirectory map { base => (base / "lib" ** "*.jar").classpath },
    retrieveManaged := true,
+    // This is to add convenience of enabling sbt -Dsbt.offline=true for making the build offline.
+    offline := "true".equalsIgnoreCase(sys.props("sbt.offline")),


No. Since sbt does not provide this by default, I thought we could add it for convenience.
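For reference, a small sketch of how the flag in the diff above behaves, written without sbt. The `isOffline` helper and the `OfflineFlag` object are hypothetical names introduced for illustration; the logic is intended to match the expression in the diff, which is true only when `-Dsbt.offline=true` (in any letter case) is passed and false when the property is absent.

```scala
object OfflineFlag {
  // True only when the "sbt.offline" property is present and equals "true",
  // ignoring case; a missing property yields false.
  def isOffline(props: scala.collection.Map[String, String]): Boolean =
    props.get("sbt.offline").exists(_.equalsIgnoreCase("true"))

  def main(args: Array[String]): Unit =
    // Reads the JVM system properties, e.g. when run with -Dsbt.offline=true.
    println(isOffline(sys.props))
}
```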

AmplabJenkins · 2014-02-27T05:38:55Z

Build triggered.

AmplabJenkins · 2014-02-27T05:38:55Z

Build started.

AmplabJenkins · 2014-02-27T05:39:03Z

Build triggered.

AmplabJenkins · 2014-02-27T05:39:59Z

Build finished.

AmplabJenkins · 2014-02-27T05:40:00Z

One or more automated tests failed
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12904/

AmplabJenkins · 2014-02-27T05:58:55Z

Build triggered.

AmplabJenkins · 2014-02-27T05:58:55Z

Build started.

AmplabJenkins · 2014-02-27T05:59:03Z

Build triggered.

AmplabJenkins · 2014-02-27T06:27:37Z

Build finished.

AmplabJenkins · 2014-02-27T06:27:38Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12905/

pwendell · 2014-02-27T06:54:49Z

Thanks @ScrapCodes looks good.

pwendell · 2014-02-27T06:55:13Z

Hmm... it appears this does not merge cleanly.

pwendell reviewed Feb 26, 2014

vnivargi referenced this pull request in alteryx/spark Feb 27, 2014

Merge pull request #6 from markhamstra/master-csd

928aa6f

Merge 0.8.0-candidate-csd branch to master-csd

jhartlaub referenced this pull request in alteryx/spark Feb 27, 2014

Merge pull request #6 from markhamstra/streamingIterable

12280b5

SPY-287 updated streaming iterable

marmbrus referenced this pull request in marmbrus/spark Feb 27, 2014

Merge pull request #6 from marmbrus/joinWork

66adceb

Minor changes to get more tests passing.

ScrapCodes reviewed Feb 27, 2014

ScrapCodes added 2 commits February 27, 2014 12:32

SPARK-1121-Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set

46ed2ad

Review feedback on PR

9b29e34

lokm01 pushed a commit to lokm01/spark that referenced this pull request May 17, 2018

Merge pull request apache#6 from AbsaOSS/feature/array-api-reverse

458118f

[SPARK-23926][SQL] Extending reverse function to support ArrayType ar…

Igosuki pushed a commit to Adikteev/spark that referenced this pull request Jul 31, 2018

Merge pull request apache#6 from mesosphere/cli-0.5.4

53ca3bb

[SPARK-189] CLI 0.5.4

PingHao mentioned this pull request Oct 9, 2019

[SPARK-28120][SS] Rocksdb state storage implementation #24922

Closed

ringtail added a commit to ringtail/spark that referenced this pull request Apr 21, 2020

Merge pull request apache#6 from ringtail/feature/scheduling-performance-enhancement change executors requests policy

fbe64e1

AngersZhuuuu mentioned this pull request Jun 18, 2020

[SPARK-32002][SQL]Support ExtractValue from nested ArrayStruct #28860

Closed

qiuxin2012 pushed a commit to qiuxin2012/spark that referenced this pull request Jan 18, 2022

add sgx log level option (apache#6)

ab17883

aimtsou added a commit to aimtsou/spark that referenced this pull request Mar 1, 2023

Removing deprecated types apache#6

7823131

wangyum added a commit that referenced this pull request May 26, 2023

[CARMEL-2691] Support % as well in the pattern spec for show like table 'xxx' (#6)

1930e50

risyomei pushed a commit to risyomei/spark that referenced this pull request Jun 26, 2023

Merge pull request apache#6 from IU/feature/VINITUS-241-patch-SPARK-36327 VINITUS-241 patch SPARK-36327

d59c52a

stefankandic added a commit to stefankandic/spark that referenced this pull request Feb 5, 2024

New Collate Grammar (apache#6)

835be0f

* initial change of grammar to support string collation * initial change of grammar to support string collation
