Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26148][PYTHON][TESTS] Increases default parallelism in PySpark tests to speed up #23111

Closed
wants to merge 5 commits into from

Conversation

HyukjinKwon
Copy link
Member

@HyukjinKwon HyukjinKwon commented Nov 22, 2018

What changes were proposed in this pull request?

This PR proposes to increase parallelism in PySpark tests to speed up from 4 to 8.

It decreases the elapsed time from

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99163/consoleFull
Tests passed in 1770 seconds

to

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99186/testReport/
Tests passed in 1027 seconds

How was this patch tested?

Jenkins tests

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99149 has finished for PR 23111 at commit ec0f730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99151 has finished for PR 23111 at commit ec0f730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99156 has finished for PR 23111 at commit ec0f730.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

This comment has been minimized.

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99159 has finished for PR 23111 at commit d3db950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99163 has finished for PR 23111 at commit d3db950.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99164 has finished for PR 23111 at commit 973f6da.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99172 has finished for PR 23111 at commit 973f6da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

retest this please

@HyukjinKwon
Copy link
Member Author

Oh? it drastically decreases from, for instance,

Tests passed in 1770 seconds

to

Tests passed in 1171 seconds

@HyukjinKwon HyukjinKwon changed the title [DO-NOT-MERGE] Increases default parallelism in PySpark tests [SPARK-26148][PYTHON][TESTS] Increases default parallelism in PySpark tests to speed up Nov 22, 2018
@HyukjinKwon
Copy link
Member Author

cc @rxin, @BryanCutler, @squito. This decreases elapsed time (even faster then before splitting the tests).

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99187 has finished for PR 23111 at commit 3cb6d0f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 22, 2018

Test build #99186 has finished for PR 23111 at commit 973f6da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Yea, the improvement looks persistent:

Tests passed in 1027 seconds

@HyukjinKwon
Copy link
Member Author

Hey all, I will merge this in few days if there's no more comments. It's going to speed up the tests roughly 12 ~ 15 mins.

@HyukjinKwon
Copy link
Member Author

Merged to master.

@asfgit asfgit closed this in 41d5aae Nov 25, 2018
@squito
Copy link
Contributor

squito commented Nov 26, 2018

wow, thats great! glad there is a big speedup.

@squito
Copy link
Contributor

squito commented Nov 26, 2018

we might need to be careful that this doesn't un-intentionally overload the jenkins workers so that we end up hitting more timeouts from too many things running concurrently (I dunno how isolated the workers are)

@HyukjinKwon
Copy link
Member Author

HyukjinKwon commented Nov 26, 2018

Ah, it just increases number of python threads that run each PySpark only suit. Shouldn't be a big deal. I'm keeping my eyes on the build.

asfgit pushed a commit that referenced this pull request Dec 6, 2018
…essionWithSGDTests.test_training_and_prediction test

## What changes were proposed in this pull request?

Looks this test is flaky

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99569/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99548/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99454/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99609/console

```
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 30 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74

----------------------------------------------------------------------
Ran 13 tests in 185.051s

FAILED (failures=1, skipped=1)
```

This looks happening after increasing the parallelism in Jenkins to speed up at #23111. I am able to reproduce this manually when the resource usage is heavy (with manual decrease of timeout).

## How was this patch tested?

Manually tested by

```
cd python
./run-tests --testnames 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction' --python-executables=python
```

Closes #23236 from HyukjinKwon/SPARK-26275.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
… tests to speed up

## What changes were proposed in this pull request?

This PR proposes to increase parallelism in PySpark tests to speed up from 4 to 8.

It decreases the elapsed time from

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99163/consoleFull
Tests passed in 1770 seconds

to

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99186/testReport/
Tests passed in 1027 seconds

## How was this patch tested?

Jenkins tests

Closes apache#23111 from HyukjinKwon/parallelism.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…essionWithSGDTests.test_training_and_prediction test

## What changes were proposed in this pull request?

Looks this test is flaky

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99704/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99569/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99644/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99548/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99454/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99609/console

```
======================================================================
FAIL: test_training_and_prediction (pyspark.mllib.tests.test_streaming_algorithms.StreamingLogisticRegressionWithSGDTests)
Test that the model improves on toy data with no. of batches
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 367, in test_training_and_prediction
    self._eventually(condition)
  File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/mllib/tests/test_streaming_algorithms.py", line 78, in _eventually
    % (timeout, lastValue))
AssertionError: Test failed due to timeout after 30 sec, with last condition returning: Latest errors: 0.67, 0.71, 0.78, 0.7, 0.75, 0.74, 0.73, 0.69, 0.62, 0.71, 0.69, 0.75, 0.72, 0.77, 0.71, 0.74

----------------------------------------------------------------------
Ran 13 tests in 185.051s

FAILED (failures=1, skipped=1)
```

This looks happening after increasing the parallelism in Jenkins to speed up at apache#23111. I am able to reproduce this manually when the resource usage is heavy (with manual decrease of timeout).

## How was this patch tested?

Manually tested by

```
cd python
./run-tests --testnames 'pyspark.mllib.tests.test_streaming_algorithms StreamingLogisticRegressionWithSGDTests.test_training_and_prediction' --python-executables=python
```

Closes apache#23236 from HyukjinKwon/SPARK-26275.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@HyukjinKwon HyukjinKwon deleted the parallelism branch March 3, 2020 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants