You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
integration_tests/src/main/python/asserts.py:360: in assert_gpu_and_cpu_are_equal_collect
_assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
integration_tests/src/main/python/asserts.py:341: in _assert_gpu_and_cpu_are_equal
run_on_gpu()
integration_tests/src/main/python/asserts.py:334: in run_on_gpu
from_gpu = with_gpu_session(bring_back,
integration_tests/src/main/python/spark_session.py:95: in with_gpu_session
return with_spark_session(func, conf=copy)
integration_tests/src/main/python/spark_session.py:68: in with_spark_session
ret = func(_spark)
integration_tests/src/main/python/asserts.py:179: in <lambda>
bring_back = lambda spark: limit_func(spark).collect()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
sock_info = self._jdf.collectToPython()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
return_value = get_return_value(
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/utils.py:111: in deco
return f(*a, **kw)
answer = 'xro1134345'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7efca53d6040>
target_id = 'o1134344', name = 'collectToPython'
def get_return_value(answer, gateway_client, target_id=None, name=None):
"""Converts an answer received from the Java gateway into a Python object.
For example, string representation of integers are converted to Python
integer, string representation of objects are converted to JavaObject
instances, etc.
:param answer: the string returned by the Java gateway
:param gateway_client: the gateway client used to communicate with the Java
Gateway. Only necessary if the answer is a reference (e.g., object,
list, map)
:param target_id: the name of the object from which the answer comes from
(e.g., *object1* in `object1.hello()`). Optional.
:param name: the name of the member from which the answer comes from
(e.g., *hello* in `object1.hello()`). Optional.
"""
if is_error(answer)[0]:
if len(answer) > 1:
type = answer[1]
value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
if answer[1] == REFERENCE_TYPE:
raise Py4JJavaError(
"An error occurred while calling {0}{1}{2}.\n".
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o1134344.collectToPython.
: scala.MatchError: List(strF#496130, Yu, Eric, 1) (of class scala.collection.immutable.$colon$colon)
at com.nvidia.spark.rapids.shims.spark311.Spark311Shims$$anon$4.convertToGpu(Spark311Shims.scala:194)
at com.nvidia.spark.rapids.shims.spark311.Spark311Shims$$anon$4.convertToGpu(Spark311Shims.scala:180)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:805)
at com.nvidia.spark.rapids.UnaryExprMeta.convertToGpu(RapidsMeta.scala:797)
at com.nvidia.spark.rapids.GpuOverrides$$anon$152.$anonfun$convertToGpu$19(GpuOverrides.scala:2686)
at scala.collection.immutable.Stream.map(Stream.scala:418)
at com.nvidia.spark.rapids.GpuOverrides$$anon$152.convertToGpu(GpuOverrides.scala:2686)
at com.nvidia.spark.rapids.GpuOverrides$$anon$152.convertToGpu(GpuOverrides.scala:2683)
at com.nvidia.spark.rapids.SparkPlanMeta.convertIfNeeded(RapidsMeta.scala:642)
at com.nvidia.spark.rapids.GpuOverrides.applyOverrides(GpuOverrides.scala:3050)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:3012)
at com.nvidia.spark.rapids.GpuOverrides.apply(GpuOverrides.scala:2998)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1(Columnar.scala:532)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.$anonfun$apply$1$adapted(Columnar.scala:531)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:531)
at org.apache.spark.sql.execution.ApplyColumnarRulesAndInsertTransitions.apply(Columnar.scala:495)
at org.apache.spark.sql.execution.QueryExecution$.$anonfun$prepareForExecution$1(QueryExecution.scala:372)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.execution.QueryExecution$.prepareForExecution(QueryExecution.scala:371)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executedPlan$1(QueryExecution.scala:117)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:117)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:110)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$simpleString$2(QueryExecution.scala:161)
at org.apache.spark.sql.execution.ExplainUtils$.processPlan(ExplainUtils.scala:115)
at org.apache.spark.sql.execution.QueryExecution.simpleString(QueryExecution.scala:161)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$explainString(QueryExecution.scala:206)
at org.apache.spark.sql.execution.QueryExecution.explainString(QueryExecution.scala:175)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.collectToPython(Dataset.scala:3516)
at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
NvTimLiu
changed the title
[BUG] qa_nightly_select_test.py::test_select FAILED on the EMR Cluster
[BUG] qa_nightly_select_test.py::test_select FAILED on the Dataproc Cluster
Apr 22, 2021
Describe the bug
=================================== FAILURES ===================================
_______________ test_select[REGEXP_REPLACE(strF, 'Yu', 'Eric')] ________________
sql_query_line = ("SELECT REGEXP_REPLACE(strF, 'Yu', 'Eric') FROM test_table", "REGEXP_REPLACE(strF, 'Yu', 'Eric')")
pytestconfig = <_pytest.config.Config object at 0x7efca578e7c0>
integration_tests/src/main/python/qa_nightly_select_test.py:167:
integration_tests/src/main/python/asserts.py:360: in assert_gpu_and_cpu_are_equal_collect
_assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
integration_tests/src/main/python/asserts.py:341: in _assert_gpu_and_cpu_are_equal
run_on_gpu()
integration_tests/src/main/python/asserts.py:334: in run_on_gpu
from_gpu = with_gpu_session(bring_back,
integration_tests/src/main/python/spark_session.py:95: in with_gpu_session
return with_spark_session(func, conf=copy)
integration_tests/src/main/python/spark_session.py:68: in with_spark_session
ret = func(_spark)
integration_tests/src/main/python/asserts.py:179: in <lambda>
bring_back = lambda spark: limit_func(spark).collect()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
sock_info = self._jdf.collectToPython()
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
return_value = get_return_value(
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/pyspark.zip/pyspark/sql/utils.py:111: in deco
return f(*a, **kw)
answer = 'xro1134345'
gateway_client = <py4j.java_gateway.GatewayClient object at 0x7efca53d6040>
target_id = 'o1134344', name = 'collectToPython'
/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1619083825114_0001/container_e01_1619083825114_0001_01_000001/py4j-0.10.9-src.zip/py4j/protocol.py:326: Py4JJavaError
Steps/Code to reproduce bug
Run pytests on Dataproc cluster with the script https://github.com/NVIDIA/spark-rapids/blob/branch-0.5/integration_tests/run_pyspark_from_build.sh#L129-L131
Expected behavior
All the python tests PASS
Environment details (please complete the following information)
The text was updated successfully, but these errors were encountered: