Add support for running multiple queries in BenchmarkRunner #1591

andygrove · 2021-01-26T16:51:08Z


When running against Kubernetes, it would be convenient to be able to run multiple queries from the Spark driver to avoid the overhead of cluster setup/teardown per query.

This PR modifies the spark-submit usage of BenchmarkRunner to change the query argument from a single value to a list of queries to run. The output logging is modified to include the query number so that automation can parse the output and capture timing information for each query.

[BENCHMARK RUNNER] [q5] Iteration 0 took 16283 msec.
[BENCHMARK RUNNER] [q5] Saving benchmark report to tpcds-adhoc-1611679475529.json
[BENCHMARK RUNNER] [q38] Iteration 0 took 10809 msec.
[BENCHMARK RUNNER] [q38] Saving benchmark report to tpcds-adhoc-1611679492073.json
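For automation that consumes this output, a minimal parsing sketch might look like the following. It assumes the log lines have exactly the shape shown above; the `parse_timings` helper and the regular expression are hypothetical illustrations and are not part of BenchmarkRunner.

```python
import re
from typing import Dict, List

# Matches lines such as:
#   [BENCHMARK RUNNER] [q5] Iteration 0 took 16283 msec.
# The format is inferred from the sample output above (an assumption).
ITERATION_RE = re.compile(
    r"^\[BENCHMARK RUNNER\] \[(?P<query>[^\]]+)\] "
    r"Iteration (?P<iteration>\d+) took (?P<msec>\d+) msec\.$"
)


def parse_timings(log_text: str) -> Dict[str, List[int]]:
    """Collect per-query iteration times (msec), in the order they appear."""
    timings: Dict[str, List[int]] = {}
    for line in log_text.splitlines():
        match = ITERATION_RE.match(line.strip())
        if match:
            # Group timings under the query tag, e.g. "q5".
            timings.setdefault(match.group("query"), []).append(
                int(match.group("msec"))
            )
    return timings


sample = """\
[BENCHMARK RUNNER] [q5] Iteration 0 took 16283 msec.
[BENCHMARK RUNNER] [q5] Saving benchmark report to tpcds-adhoc-1611679475529.json
[BENCHMARK RUNNER] [q38] Iteration 0 took 10809 msec.
[BENCHMARK RUNNER] [q38] Saving benchmark report to tpcds-adhoc-1611679492073.json
"""

print(parse_timings(sample))  # {'q5': [16283], 'q38': [10809]}
```

Lines that do not match the iteration pattern, such as the "Saving benchmark report" lines, are skipped.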

Usage from the Spark shell is unchanged.

Signed-off-by: Andy Grove <andygrove@nvidia.com>

nartal1 · 2021-01-26T17:02:20Z

nit: copyright needs to be updated. Rest LGTM.


andygrove · 2021-01-26T17:11:01Z

build

andygrove · 2021-01-26T18:17:49Z

Test failure is unrelated:

11:09:04  =================================== FAILURES ===================================
11:09:04  ______ test_single_orderby[Column<b'a ASC NULLS FIRST'>-Float(not_null)] _______
11:09:04  [gw1] linux -- Python 3.8.7 /usr/bin/python
11:09:04  
11:09:04  data_gen = Float(not_null), order = Column<b'a ASC NULLS FIRST'>
11:09:04  
11:09:04      @pytest.mark.parametrize('data_gen', orderable_gens + orderable_not_null_gen, ids=idfn)
11:09:04      @pytest.mark.parametrize('order', [f.col('a').asc(), f.col('a').asc_nulls_last(), f.col('a').desc(), f.col('a').desc_nulls_first()], ids=idfn)
11:09:04      def test_single_orderby(data_gen, order):
11:09:04  >       assert_gpu_and_cpu_are_equal_collect(
11:09:04                  lambda spark : unary_op_df(spark, data_gen).orderBy(order),
11:09:04                  conf = allow_negative_scale_of_decimal_conf)
11:09:04  
11:09:04  ../../src/main/python/sort_test.py:33: 
11:09:04  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
11:09:04  ../../src/main/python/asserts.py:336: in assert_gpu_and_cpu_are_equal_collect
11:09:04      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
11:09:04  ../../src/main/python/asserts.py:328: in _assert_gpu_and_cpu_are_equal
11:09:04      assert_equal(from_cpu, from_gpu)
11:09:04  ../../src/main/python/asserts.py:90: in assert_equal
11:09:04      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
11:09:04  ../../src/main/python/asserts.py:38: in _assert_equal
11:09:04      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
11:09:04  ../../src/main/python/asserts.py:31: in _assert_equal
11:09:04      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
11:09:04  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
11:09:04  
11:09:04  cpu = -inf, gpu = nan

andygrove · 2021-01-26T18:18:07Z

build

andygrove · 2021-01-26T19:34:39Z

build

gerashegalov

LGTM

andygrove · 2021-01-26T20:00:49Z

build

andygrove · 2021-01-26T21:46:28Z

Failed with this again:

14:33:35  =================================== FAILURES ===================================
14:33:35  ______ test_single_orderby[Column<b'a ASC NULLS FIRST'>-Float(not_null)] _______
14:33:35  [gw1] linux -- Python 3.8.7 /usr/bin/python
14:33:35  
14:33:35  data_gen = Float(not_null), order = Column<b'a ASC NULLS FIRST'>
14:33:35  
14:33:35      @pytest.mark.parametrize('data_gen', orderable_gens + orderable_not_null_gen, ids=idfn)
14:33:35      @pytest.mark.parametrize('order', [f.col('a').asc(), f.col('a').asc_nulls_last(), f.col('a').desc(), f.col('a').desc_nulls_first()], ids=idfn)
14:33:35      def test_single_orderby(data_gen, order):
14:33:35  >       assert_gpu_and_cpu_are_equal_collect(
14:33:35                  lambda spark : unary_op_df(spark, data_gen).orderBy(order),
14:33:35                  conf = allow_negative_scale_of_decimal_conf)
14:33:35  
14:33:35  ../../src/main/python/sort_test.py:33: 
14:33:35  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:33:35  ../../src/main/python/asserts.py:336: in assert_gpu_and_cpu_are_equal_collect
14:33:35      _assert_gpu_and_cpu_are_equal(func, True, conf=conf)
14:33:35  ../../src/main/python/asserts.py:328: in _assert_gpu_and_cpu_are_equal
14:33:35      assert_equal(from_cpu, from_gpu)
14:33:35  ../../src/main/python/asserts.py:90: in assert_equal
14:33:35      _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
14:33:35  ../../src/main/python/asserts.py:38: in _assert_equal
14:33:35      _assert_equal(cpu[index], gpu[index], float_check, path + [index])
14:33:35  ../../src/main/python/asserts.py:31: in _assert_equal
14:33:35      _assert_equal(cpu[field], gpu[field], float_check, path + [field])
14:33:35  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
14:33:35  
14:33:35  cpu = -inf, gpu = nan

revans2 · 2021-01-26T21:51:46Z

The failure is a known issue (#1585), and a fix is being worked on in cudf.

jlowe · 2021-01-27T16:14:17Z

build


Add support for running multiple queries in BenchmarkRunner

c3daeb6

Signed-off-by: Andy Grove <andygrove@nvidia.com>

andygrove added the benchmark label (Benchmarking, benchmarking tools) Jan 26, 2021

andygrove self-assigned this Jan 26, 2021

andygrove requested review from abellina and nartal1 January 26, 2021 16:51

andygrove added this to the Jan 18 - Jan 29 milestone Jan 26, 2021

Update copyright years

b808a18

Signed-off-by: Andy Grove <andygrove@nvidia.com>

nartal1 approved these changes Jan 26, 2021


Merge branch 'branch-0.4' into benchmark-multi-query

e5615f7

gerashegalov approved these changes Jan 26, 2021


nartal1 merged commit 2a56f7f into NVIDIA:branch-0.4 Jan 27, 2021

andygrove deleted the benchmark-multi-query branch February 11, 2021 22:16

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Add support for running multiple queries in BenchmarkRunner (NVIDIA#1591)

68305d0

Signed-off-by: Andy Grove <andygrove@nvidia.com>

nartal1 pushed a commit to nartal1/spark-rapids that referenced this pull request Jun 9, 2021

Add support for running multiple queries in BenchmarkRunner (NVIDIA#1591)

ade6d1e

Signed-off-by: Andy Grove <andygrove@nvidia.com>

tgravescs pushed a commit to tgravescs/spark-rapids that referenced this pull request Nov 30, 2023

Fixing potential overflow in array read in string to float cast (NVID…

9858813

…IA#1591) Signed-off-by: Mike Wilson <knobby@burntsheep.com>
