[BUG] regex_test failed in nightly #6127

Closed

pxLi opened this issue Jul 27, 2022 · 1 comment
Labels: bug (Something isn't working), test (Only impacts tests)

Comments

pxLi commented Jul 27, 2022

Describe the bug

Failed cases:

20:41:11  =========================== short test summary info ============================
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_split_re_negative_limit - p...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_split_re_zero_limit - pyspa...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_split_re_one_limit - pyspar...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_split_re_positive_limit - p...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_split_re_no_limit - pyspark...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace - pyspark.sql.ut...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_repetition - pys...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs - pyspa...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_anchors - pyspar...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs_idx_out_of_bounds
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_backrefs_escaped
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_escaped - pyspar...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_re_replace_null - pyspark.s...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_replace - pyspark.sq...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_character_set_negated
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract - pyspark.sq...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_no_match - p...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_multiline - ...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_multiline_negated_character_class
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_idx_0 - pysp...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_word_boundaries - pyspark.s...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_character_classes - pyspark...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_hexadecimal_digits
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_whitespace - pyspark...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_horizontal_vertical_whitespace
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_linebreak - pyspark....
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_octal_digits - pyspa...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_digit - pysp...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_word - pyspa...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_predefined_character_classes
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike - pyspark.sql.utils.I...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike_embedded_null - pyspa...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike_escape - pyspark.sql....
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike_multi_line - pyspark....
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike_missing_escape - pysp...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_all_idx_zero
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_extract_all_idx_positive
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_rlike_unicode_support - pys...
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_replace_unicode_support
20:41:11  FAILED ../../src/main/python/regexp_test.py::test_regexp_split_unicode_support

Most of the cases failed with:

pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
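This exception means the RAPIDS plugin did not convert the ProjectExec holding the regexp expressions to a GPU operator, and the integration tests run in a mode that raises as soon as part of the plan stays on the CPU. As a minimal sketch only (not the test harness), one of the failing expressions could be reproduced outside pytest roughly as below; the plugin class name and the spark.rapids.sql.* config keys are assumptions based on the usual plugin setup and on the tests' _regexp_conf, and may differ between plugin versions.

```python
# Minimal sketch (assumptions: rapids-4-spark jar on the classpath, config key
# names as in the tests' _regexp_conf). Runs one of the failing rlike
# expressions with the plugin enabled so the resulting plan can be inspected.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("regexp-fallback-repro")
         .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
         .config("spark.rapids.sql.enabled", "true")
         .config("spark.rapids.sql.regexp.enabled", "true")  # same key the tests set
         .getOrCreate())

df = spark.createDataFrame([("ab 12",), ("cd\t34",)], ["a"])

# Same double-escaping as regexp_test.py: "\\\\s" in Python -> \s in Spark SQL.
out = df.selectExpr('rlike(a, "\\\\s")')

out.explain()  # a working GPU plan shows a GPU project node, not plain Project
out.show()
```

If the plan printed by explain() still contains the plain Project/ProjectExec, the plugin rejected the regexp conversion, which matches the error above.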

Detailed log:

[2022-07-21T12:46:03.625Z] ____________________________ test_regexp_whitespace ____________________________
[2022-07-21T12:46:03.625Z] 
[2022-07-21T12:46:03.625Z]     def test_regexp_whitespace():
[2022-07-21T12:46:03.625Z]         gen = mk_str_gen('\u001e[abcd]\t\n{1,3} [0-9]\n {1,3}\x0b\t[abcd]\r\f[0-9]{0,10}')
[2022-07-21T12:46:03.625Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-07-21T12:46:03.625Z]                 lambda spark: unary_op_df(spark, gen).selectExpr(
[2022-07-21T12:46:03.625Z]                     'rlike(a, "\\\\s")',
[2022-07-21T12:46:03.625Z]                     'rlike(a, "\\\\s{3}")',
[2022-07-21T12:46:03.625Z]                     'rlike(a, "[abcd]+\\\\s+[0-9]+")',
[2022-07-21T12:46:03.625Z]                     'rlike(a, "\\\\S{3}")',
[2022-07-21T12:46:03.625Z]                     'rlike(a, "[abcd]+\\\\s+\\\\S{2,3}")',
[2022-07-21T12:46:03.625Z]                     'regexp_extract(a, "([a-d]+)(\\\\s[0-9]+)([a-d]+)", 2)',
[2022-07-21T12:46:03.625Z]                     'regexp_extract(a, "([a-d]+)(\\\\S+)([0-9]+)", 2)',
[2022-07-21T12:46:03.625Z]                     'regexp_extract(a, "([a-d]+)(\\\\S+)([0-9]+)", 3)',
[2022-07-21T12:46:03.625Z]                     'regexp_replace(a, "(\\\\s+)", "@")',
[2022-07-21T12:46:03.625Z]                     'regexp_replace(a, "(\\\\S+)", "#")',
[2022-07-21T12:46:03.625Z]                 ),
[2022-07-21T12:46:03.625Z]             conf=_regexp_conf)
[2022-07-21T12:46:03.625Z] 
[2022-07-21T12:46:03.625Z] ../../src/main/python/regexp_test.py:489: 
[2022-07-21T12:46:03.625Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-21T12:46:03.625Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-07-21T12:46:03.625Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-07-21T12:46:03.625Z] ../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
[2022-07-21T12:46:03.625Z]     run_on_gpu()
[2022-07-21T12:46:03.625Z] ../../src/main/python/asserts.py:422: in run_on_gpu
[2022-07-21T12:46:03.625Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2022-07-21T12:46:03.625Z] ../../src/main/python/spark_session.py:132: in with_gpu_session
[2022-07-21T12:46:03.625Z]     return with_spark_session(func, conf=copy)
[2022-07-21T12:46:03.625Z] ../../src/main/python/spark_session.py:99: in with_spark_session
[2022-07-21T12:46:03.625Z]     ret = func(_spark)
[2022-07-21T12:46:03.625Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-07-21T12:46:03.625Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-07-21T12:46:03.625Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
[2022-07-21T12:46:03.625Z]     sock_info = self._jdf.collectToPython()
[2022-07-21T12:46:03.625Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__
[2022-07-21T12:46:03.625Z]     return_value = get_return_value(
[2022-07-21T12:46:03.625Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-21T12:46:03.625Z] 
[2022-07-21T12:46:03.625Z] a = ('xro6355', <py4j.java_gateway.GatewayClient object at 0x7f63cd4fcf70>, 'o6354', 'collectToPython')
[2022-07-21T12:46:03.625Z] kw = {}
[2022-07-21T12:46:03.625Z] converted = IllegalArgumentException('Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec\nProject [...:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n', None)
[2022-07-21T12:46:03.625Z] 
[2022-07-21T12:46:03.625Z]     def deco(*a, **kw):
[2022-07-21T12:46:03.625Z]         try:
[2022-07-21T12:46:03.625Z]             return f(*a, **kw)
[2022-07-21T12:46:03.625Z]         except py4j.protocol.Py4JJavaError as e:
[2022-07-21T12:46:03.625Z]             converted = convert_exception(e.java_exception)
[2022-07-21T12:46:03.625Z]             if not isinstance(converted, UnknownException):
[2022-07-21T12:46:03.625Z]                 # Hide where the exception came from that shows a non-Pythonic
[2022-07-21T12:46:03.625Z]                 # JVM exception message.
[2022-07-21T12:46:03.625Z] >               raise converted from None
[2022-07-21T12:46:03.625Z] E               pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.ProjectExec
[2022-07-21T12:46:03.625Z] E               Project [a#879 RLIKE \s AS a RLIKE \s#881, a#879 RLIKE \s{3} AS a RLIKE \s{3}#882, a#879 RLIKE [abcd]+\s+[0-9]+ AS a RLIKE [abcd]+\s+[0-9]+#883, a#879 RLIKE \S{3} AS a RLIKE \S{3}#884, a#879 RLIKE [abcd]+\s+\S{2,3} AS a RLIKE [abcd]+\s+\S{2,3}#885, regexp_extract(a#879, ([a-d]+)(\s[0-9]+)([a-d]+), 2) AS regexp_extract(a, ([a-d]+)(\s[0-9]+)([a-d]+), 2)#886, regexp_extract(a#879, ([a-d]+)(\S+)([0-9]+), 2) AS regexp_extract(a, ([a-d]+)(\S+)([0-9]+), 2)#887, regexp_extract(a#879, ([a-d]+)(\S+)([0-9]+), 3) AS regexp_extract(a, ([a-d]+)(\S+)([0-9]+), 3)#888, regexp_replace(a#879, (\s+), @, 1) AS regexp_replace(a, (\s+), @, 1)#889, regexp_replace(a#879, (\S+), #, 1) AS regexp_replace(a, (\S+), #, 1)#890]
[2022-07-21T12:46:03.625Z] E               +- Scan ExistingRDD[a#879]
[2022-07-21T12:46:03.625Z] 
[2022-07-21T12:46:03.625Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py:117: IllegalArgumentException
[2022-07-21T12:46:03.625Z] ----------------------------- Captured stdout call -----------------------------
[2022-07-21T12:46:03.625Z] ### CPU RUN ###
[2022-07-21T12:46:03.625Z] ### GPU RUN ###
[2022-07-21T12:46:03.625Z] __________________ test_regexp_horizontal_vertical_whitespace __________________
[2022-07-21T12:46:03.626Z] 
[2022-07-21T12:46:03.626Z]     def test_regexp_horizontal_vertical_whitespace():
[2022-07-21T12:46:03.626Z]         gen = mk_str_gen(
[2022-07-21T12:46:03.626Z]             '''\xA0\u1680\u180e[abcd]\t\n{1,3} [0-9]\n {1,3}\x0b\t[abcd]\r\f[0-9]{0,10}
[2022-07-21T12:46:03.626Z]                 [\u2001-\u200a]{1,3}\u202f\u205f\u3000\x85\u2028\u2029
[2022-07-21T12:46:03.626Z]             ''')
[2022-07-21T12:46:03.626Z] >       assert_gpu_and_cpu_are_equal_collect(
[2022-07-21T12:46:03.626Z]                 lambda spark: unary_op_df(spark, gen).selectExpr(
[2022-07-21T12:46:03.626Z]                     'rlike(a, "\\\\h{2}")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "\\\\v{3}")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "[abcd]+\\\\h+[0-9]+")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "[abcd]+\\\\v+[0-9]+")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "\\\\H")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "\\\\V")',
[2022-07-21T12:46:03.626Z]                     'rlike(a, "[abcd]+\\\\h+\\\\V{2,3}")',
[2022-07-21T12:46:03.626Z]                     'regexp_extract(a, "([a-d]+)([0-9]+\\\\v)([a-d]+)", 2)',
[2022-07-21T12:46:03.626Z]                     'regexp_extract(a, "([a-d]+)(\\\\H+)([0-9]+)", 2)',
[2022-07-21T12:46:03.626Z]                     'regexp_extract(a, "([a-d]+)(\\\\V+)([0-9]+)", 3)',
[2022-07-21T12:46:03.626Z]                     'regexp_replace(a, "(\\\\v+)", "@")',
[2022-07-21T12:46:03.626Z]                     'regexp_replace(a, "(\\\\H+)", "#")',
[2022-07-21T12:46:03.626Z]                 ),
[2022-07-21T12:46:03.626Z]             conf=_regexp_conf)
[2022-07-21T12:46:03.626Z] 
[2022-07-21T12:46:03.626Z] ../../src/main/python/regexp_test.py:509: 
[2022-07-21T12:46:03.626Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
[2022-07-21T12:46:03.626Z] ../../src/main/python/asserts.py:508: in assert_gpu_and_cpu_are_equal_collect
[2022-07-21T12:46:03.626Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first)
[2022-07-21T12:46:03.626Z] ../../src/main/python/asserts.py:428: in _assert_gpu_and_cpu_are_equal
[2022-07-21T12:46:03.626Z]     run_on_gpu()
[2022-07-21T12:46:03.626Z] ../../src/main/python/asserts.py:422: in run_on_gpu
[2022-07-21T12:46:03.626Z]     from_gpu = with_gpu_session(bring_back, conf=conf)
[2022-07-21T12:46:03.626Z] ../../src/main/python/spark_session.py:132: in with_gpu_session
[2022-07-21T12:46:03.626Z]     return with_spark_session(func, conf=copy)
[2022-07-21T12:46:03.626Z] ../../src/main/python/spark_session.py:99: in with_spark_session
[2022-07-21T12:46:03.626Z]     ret = func(_spark)
[2022-07-21T12:46:03.626Z] ../../src/main/python/asserts.py:201: in <lambda>
[2022-07-21T12:46:03.626Z]     bring_back = lambda spark: limit_func(spark).collect()
[2022-07-21T12:46:03.626Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/dataframe.py:677: in collect
[2022-07-21T12:46:03.626Z]     sock_info = self._jdf.collectToPython()
[2022-07-21T12:46:03.626Z] /home/jenkins/agent/workspace/jenkins-rapids_integration-dev-github-495-311/jars/spark-3.1.1-bin-hadoop3.2/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py:1304: in __call__

Related to #6041.

pxLi added the bug and test labels on Jul 27, 2022
pxLi commented Jul 27, 2022

Seems there is still some locale check issue.
My mistake, this is an unrelated issue.
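For reference, a quick way to confirm which default locale the driver JVM actually picked up (a hedged sketch, not taken from this issue; whether the plugin's regexp support really gates on the locale is an assumption here):

```python
# Hedged sketch: inspect the driver JVM's default locale through Py4J.
# A non en-US locale is one plausible reason for a regexp fallback, but that
# is an assumption, not something confirmed in this issue.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

jvm = spark.sparkContext._jvm
print("driver JVM default locale:", jvm.java.util.Locale.getDefault().toString())

# Pinning the locale has to happen when the JVMs are launched, for example:
#   --conf spark.driver.extraJavaOptions="-Duser.language=en -Duser.country=US"
#   --conf spark.executor.extraJavaOptions="-Duser.language=en -Duser.country=US"
# Setting these after the driver JVM is already running has no effect.
```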

pxLi closed this as completed on Jul 27, 2022