
Fix test_parquet_check_schema_compatibility [databricks] #5484

Closed

Conversation

sperlingxx
Collaborator

Fixes #5481

The tests failed on error-message matching because the Databricks runtime throws a different exception from Spark, with a modified error message.


Signed-off-by: sperlingxx <lovedreamf@gmail.com>
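The fix pattern can be sketched as follows. This is a minimal illustration, not the actual plugin test code: the helper name `error_matches` and both message strings are assumptions standing in for the real Spark and Databricks error texts. The idea is to match the exception message against either runtime's wording instead of one exact string:

```python
import re

# Assumed placeholder messages: the same schema-compatibility failure is
# reported with different wording by Apache Spark and the Databricks runtime.
SPARK_MSG = "Parquet column cannot be converted"
DATABRICKS_MSG = "PARQUET_SCHEMA_MISMATCH"  # hypothetical Databricks variant

# Accept either variant by combining the escaped literals into one pattern.
ERROR_PATTERN = re.compile(
    "({})|({})".format(re.escape(SPARK_MSG), re.escape(DATABRICKS_MSG)))

def error_matches(message: str) -> bool:
    """Return True if the exception message matches either runtime's wording."""
    return ERROR_PATTERN.search(message) is not None
```

A test asserting on the error would then check `error_matches(str(exc))` rather than comparing against a single runtime's message verbatim.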
@sperlingxx sperlingxx requested a review from pxLi May 13, 2022 07:42
@pxLi pxLi added bug Something isn't working test Only impacts tests labels May 13, 2022
Collaborator

@pxLi pxLi left a comment


the phrase 🤦

@pxLi
Collaborator

pxLi commented May 13, 2022

build

@pxLi
Collaborator

pxLi commented May 13, 2022

Hmm, an unrelated case failed in 312db and 321db.
Either the assertion-error cases caused side effects for other tests,
or Databricks just applied some changes to the runtimes...

[2022-05-13T09:46:49.991Z] FAILED ../../src/main/python/hash_aggregate_test.py::test_groupby_std_variance[{'spark.rapids.sql.variableFloatAgg.enabled': 'true', 'spark.rapids.sql.hasNans': 'false', 'spark.rapids.sql.castStringToFloat.enabled': 'true', 'spark.rapids.sql.batchSizeBytes': '250'}-[('a', Decimal(not_null)(18,0)), ('b', Decimal(not_null)(18,0)), ('c', Decimal(not_null)(18,0))]][IGNORE_ORDER({'local': True}), INCOMPAT, APPROXIMATE_FLOAT]
[2022-05-13T09:46:49.989Z] 22/05/13 09:00:13 ERROR Executor: Exception in task 0.0 in stage 5647.0 (TID 16756)
[2022-05-13T09:46:49.989Z] java.net.SocketException: Broken pipe (Write failed)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.socketWrite0(Native Method)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
[2022-05-13T09:46:49.989Z] 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
[2022-05-13T09:46:49.989Z] 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
[2022-05-13T09:46:49.989Z] 	at java.io.DataOutputStream.flush(DataOutputStream.java:123)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:576)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2264)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:365)
[2022-05-13T09:46:49.989Z] 22/05/13 09:00:13 WARN TaskSetManager: Lost task 0.0 in stage 5647.0 (TID 16756) (10.2.128.4 executor driver): java.net.SocketException: Broken pipe (Write failed)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.socketWrite0(Native Method)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
[2022-05-13T09:46:49.989Z] 	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
[2022-05-13T09:46:49.989Z] 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
[2022-05-13T09:46:49.989Z] 	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
[2022-05-13T09:46:49.989Z] 	at java.io.DataOutputStream.flush(DataOutputStream.java:123)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:576)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2264)
[2022-05-13T09:46:49.989Z] 	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:365)

@sperlingxx
Collaborator Author

hmm failed unrelated case in 312db and 321db. Might be the assert error cases caused some side effects for other test, or databricks just applied some changes to runtimes...


Really weird ....

@sperlingxx
Collaborator Author

build

@revans2
Collaborator

revans2 commented May 13, 2022

Looks like changes to the runtime. There was a date cast test that just failed too with a new error.

@sameerz sameerz added this to the May 2 - May 20 milestone May 13, 2022
@sperlingxx
Collaborator Author

Yes, the failed case test_cast_string_date_invalid_ansi_before_320 will be fixed by #5494. So it looks like a deadlock scenario: two failing cases are fixed by two separate PRs.

@pxLi
Collaborator

pxLi commented May 16, 2022

Closing this one, as #5494 includes both fixes.

@pxLi pxLi closed this May 16, 2022
Labels
bug Something isn't working test Only impacts tests
Development

Successfully merging this pull request may close these issues.

[BUG] test_parquet_check_schema_compatibility failed in databricks runtimes
4 participants