
Wrap scalar generation into spark session in integration test #9405

Merged
7 commits merged into NVIDIA:branch-23.12 on Oct 18, 2023

Conversation

thirtiseven
Collaborator

@thirtiseven thirtiseven commented Oct 9, 2023

Fixes #9404

When calling f.lit, the error message:

pyspark.sql.utils.AnalysisException: decimal can only support precision up to 38

will be reported when spark.sql.legacy.allowNegativeScaleOfDecimal is unset.

This config is part of the integration tests' default configs, but those configs are only set when with_spark_session is called, and in some edge cases f.lit can be called before any with_spark_session call when the CI runs the integration tests in parallel.

This PR adds the negative scale config before calling f.lit when generating scalars.

It also fixes the cache_repr of TimestampGen to include the new parameter.
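For illustration, here is a minimal sketch of that idea; the helper name and structure are assumptions, not the repo's actual code. It sets the negative-scale config on the active session before f.lit builds the literal.

from decimal import Decimal

from pyspark.sql import SparkSession
import pyspark.sql.functions as f
from pyspark.sql.types import DecimalType

def lit_negative_scale_decimal(value, precision=34, scale=-5):
    # Hypothetical helper: without this config, Spark rejects negative-scale
    # decimals with "decimal can only support precision up to 38".
    spark = SparkSession.builder.getOrCreate()
    spark.conf.set('spark.sql.legacy.allowNegativeScaleOfDecimal', 'true')
    return f.lit(value).cast(DecimalType(precision, scale))

# Example usage: lit_negative_scale_decimal(Decimal('1234500000'))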

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

@res-life
Collaborator

res-life commented Oct 9, 2023

LGTM

@revans2
Collaborator

revans2 commented Oct 9, 2023

What is the exception that actually triggered this? This is essentially the reason that we turned it on by default everywhere.

'spark.sql.legacy.allowNegativeScaleOfDecimal': 'true',

I just want to understand why the default settings are not applying.

@thirtiseven
Collaborator Author

@revans2 For example, in the following case from issue #9404:

@pytest.mark.parametrize('data_gen', [DecimalGen(34, -5)], ids=idfn)
def test_greatest1(data_gen):
    num_cols = 20
    s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
    # we want lots of nulls
    gen = StructGen([('_c' + str(x), data_gen.copy_special_case(None, weight=100.0))
        for x in range(0, num_cols)], nullable=False)
    command_args = [f.col('_c' + str(x)) for x in range(0, num_cols)]
    command_args.append(s1)
    data_type = data_gen.data_type
    assert_gpu_and_cpu_are_equal_collect(
            lambda spark : gen_df(spark, gen).select(
                f.greatest(*command_args)))

If we run this case individually, it fails because f.lit is called in gen_scalar, while the config is only set for the first time inside assert_gpu_and_cpu_are_equal_collect.

In the pre-merge CI job, there are 4 xdist agents running in parallel, and cases are assigned to the agents round-robin. So if this case happens to be the first one on a particular agent's list, the CI will fail.

We believe that this is the reason why #9288's pre-merge keeps failing.

@jlowe
Member

jlowe commented Oct 9, 2023

Seems to me the issue is that one or more tests are generating data outside of the normal Spark session context that sets up the configs properly (i.e., move the data generation to within the dataframe callback that is currently a lambda).

Personally, I'm not a fan of the current PR approach, where data_gen can silently smash config values and leave them in that smashed state. That can be very surprising behavior and annoying to track down if it bites someone explicitly trying to test without that config setting.
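To illustrate the suggestion, here is a hedged sketch of the test above with the scalar generation moved inside the callback; the do_it name is an assumption and this is not the change that actually landed, it just reuses the helpers shown in the earlier test.

@pytest.mark.parametrize('data_gen', [DecimalGen(34, -5)], ids=idfn)
def test_greatest1_sketch(data_gen):
    num_cols = 20
    # Same struct of mostly-null columns as in the original test.
    gen = StructGen([('_c' + str(x), data_gen.copy_special_case(None, weight=100.0))
        for x in range(0, num_cols)], nullable=False)
    def do_it(spark):
        # gen_scalar (and therefore f.lit) now runs inside the session that
        # assert_gpu_and_cpu_are_equal_collect has already configured.
        s1 = gen_scalar(data_gen, force_no_nulls=not isinstance(data_gen, NullGen))
        command_args = [f.col('_c' + str(x)) for x in range(0, num_cols)]
        command_args.append(s1)
        return gen_df(spark, gen).select(f.greatest(*command_args))
    assert_gpu_and_cpu_are_equal_collect(do_it)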

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

@jlowe OK, I moved gen_scalars into a Spark session.

src = _gen_scalars_common(data_gen, count, seed=seed)
data_type = src.data_type
return (_mark_as_lit(src.gen(force_no_nulls=force_no_nulls), data_type) for i in range(0, count))
return with_cpu_session(lambda spark: gen_scalars_help(data_gen=data_gen,
Collaborator

@revans2 revans2 Oct 9, 2023

Putting this in a cpu_session fixes the current problem, but it adds a new one. If gen_scalars is called from inside a with_*_session it will have other problems. with_spark_session calls reset_spark_session_conf which does more than just reset the conf. It clears out the catalog too with no way to get the original config or catalog back after it exits. That means with_gpu_session -> gen_scalars will result in the query running on the CPU after the gen_scalars.

I see a few ways to properly fix this.

  1. We set spark.sql.legacy.allowNegativeScaleOfDecimal when launching spark and have the test framework throw an exception if it is not set. Then we remove references to it in all of the tests for consistency. Then we file a follow on issue to fix with_spark_session to not allow nesting and to throw an exception if it is nested.
  2. We fix with_spark_session to throw an exception if it is ever nested and do what you are doing today + update the docs for it to be clear that it can never be called from within a with_spark_session
  3. We fix the test to call gen_scalar from within a with_spark_session and add a doc fix for gen_scalar to indicate that negative scale decimals can have problems if called from outside of with_spark_session block. Then we file a follow on issue to fix with_spark_session to not allow nesting and to throw an exception if it is nested.

I personally prefer option 1, but I am fine with option 2 or 3. Talking to @jlowe, he really prefers option 3. The main difference between option 3 and option 2 for me is really about the amount of code that needs to change. If we just fix the one test and add some docs, that feels like a really small change. If we have to fix nesting/etc., that feels a bit larger, but it is something we need to do either way, and it would mean all tests that use gen_scalar could deal with all decimal values properly.
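For reference, a hedged sketch of the nesting guard mentioned above; the module-level flag and the _set_all_confs/_spark names are assumptions standing in for the framework's real internals.

_inside_spark_session = False

def with_spark_session(func, conf={}):
    global _inside_spark_session
    if _inside_spark_session:
        # Nesting would reset the conf and clear the catalog out from under
        # the outer caller, so fail loudly instead.
        raise RuntimeError('with_spark_session must not be nested')
    _inside_spark_session = True
    try:
        reset_spark_session_conf()
        _set_all_confs(conf)
        return func(_spark)
    finally:
        _inside_spark_session = False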

Member

I'm not a fan of 2. It's again surprising behavior (who would expect it to spawn a Spark session?). I'm fine with either 1 or 3, and even with 1, I still think we should fix the test(s). We should be putting all data generation inside a spark session context of some kind.

Collaborator Author

@thirtiseven thirtiseven Oct 10, 2023

Thanks, updated code to option 3.

Now I wrap all scalar generation in a with_cpu_session, no matter whether it calls f.lit or uses DecimalGen. I'm not sure if we only want to move the cases that can actually fail into Spark sessions.
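Roughly, the wrapped helper could look like the following; this is loosely reconstructed from the diff excerpt above, the argument handling is simplified, and returning an eager list instead of a generator is an assumption so that every f.lit call happens while the session is active.

def gen_scalars(data_gen, count, seed=None, force_no_nulls=False):
    def gen_scalars_help(spark):
        src = _gen_scalars_common(data_gen, count, seed=seed)
        data_type = src.data_type
        # Build the literals eagerly so f.lit runs under the CPU session
        # and its default configs (including allowNegativeScaleOfDecimal).
        return [_mark_as_lit(src.gen(force_no_nulls=force_no_nulls), data_type)
                for _ in range(0, count)]
    return with_cpu_session(gen_scalars_help)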

Collaborator Author

Follow-on issue: #9412

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

2 similar comments
@thirtiseven
Collaborator Author

build

@thirtiseven
Collaborator Author

build

Member

@jlowe jlowe left a comment

PR headline should be updated to reflect the new approach.

integration_tests/README.md (outdated, resolved)
integration_tests/README.md (outdated, resolved)
integration_tests/src/main/python/conditionals_test.py (outdated, resolved)
integration_tests/src/main/python/ast_test.py (outdated, resolved)
thirtiseven and others added 2 commits October 11, 2023 09:19
Co-authored-by: Jason Lowe <jlowe@nvidia.com>
Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven thirtiseven changed the title Add negative scale config before calling f.lit in integration test Warp scalar generation into spark session in integration test Oct 11, 2023
@thirtiseven
Collaborator Author

@jlowe Thanks for the review, all done.

@thirtiseven
Collaborator Author

build

jlowe previously approved these changes Oct 11, 2023
@@ -67,8 +67,8 @@ def test_concat_double_list_with_lit(dg):

 @pytest.mark.parametrize('data_gen', non_nested_array_gens, ids=idfn)
 def test_concat_list_with_lit(data_gen):
-    lit_col1 = f.lit(gen_scalar(data_gen)).cast(data_gen.data_type)
-    lit_col2 = f.lit(gen_scalar(data_gen)).cast(data_gen.data_type)
+    lit_col1 = f.lit(with_cpu_session(lambda spark: gen_scalar(data_gen))).cast(data_gen.data_type)
Collaborator

This PR is intended to put f.lit into with_cpu_session, not only the f.lit in gen_scalar but also the f.lit calls in other places. Maybe change it to the following:

with_cpu_session(lambda spark: f.lit(gen_scalar(data_gen)).cast(data_gen.data_type))

Collaborator

Please check the f.lit calls in other places in this PR.

Collaborator Author

Good catch, checked and fixed.

Signed-off-by: Haoyang Li <haoyangl@nvidia.com>
@thirtiseven
Collaborator Author

build

@tgravescs tgravescs changed the title Warp scalar generation into spark session in integration test Wrap scalar generation into spark session in integration test Oct 11, 2023
@thirtiseven thirtiseven self-assigned this Oct 12, 2023
@thirtiseven
Collaborator Author

Hi @revans2, please take another look, thanks.

@thirtiseven thirtiseven changed the base branch from branch-23.10 to branch-23.12 October 13, 2023 05:14
@revans2 revans2 merged commit 6334ece into NVIDIA:branch-23.12 Oct 18, 2023
29 checks passed
@sameerz sameerz added the test Only impacts tests label Oct 23, 2023
Successfully merging this pull request may close these issues.

[BUG] Spark reports a decimal error when create lit scalar when generate Decimal(34, -5) data.