
[SPARK-2645] [Core] Allow SparkEnv.stop() to be called multiple times without side effects. #6973

Closed
wants to merge 15 commits

Conversation

rekhajoshm
Contributor

Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects.

Pulling functionality from apache spark
pull latest from apache spark
Pulling functionality from apache spark
@sarutak
Member

sarutak commented Jun 24, 2015

Hi @rekhajoshm , could you explain details more in the description field?

@srowen
Member

srowen commented Jun 24, 2015

Yeah, this might be a good change (stop() should be idempotent right?) but I'm not 100% clear how it connects to the JIRA. Also, there's no need to log an exception here.

@@ -90,17 +90,25 @@ class SparkEnv (
private var driverTmpDirToDelete: Option[String] = None

private[spark] def stop() {

if(isStopped) return
Contributor

might be better to rewrite this as

if (!isStopped) {
  // stop everything
}

Member

Heh, I would have gone with just

if (isStopped) {
  return
}

to avoid the extra running indent but leave it.

@andrewor14
Contributor

ok to test

@SparkQA

SparkQA commented Jun 24, 2015

Test build #35709 has finished for PR 6973 at commit b566b66.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@squito
Contributor

squito commented Jun 24, 2015

Hi @rekhajoshm

The PR looks reasonable (with the suggested changes from andrew). Could you also (a) change the title / description to be more descriptive, e.g. "Allow sc.stop() to be called multiple times", and (b) add a test case to SparkContextSuite to prevent future regression?

thanks for working on this!

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35745 has finished for PR 6973 at commit 380c5b0.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rekhajoshm
Contributor Author

Thanks for your quick inputs and review comments @sarutak @srowen @andrewor14 @squito .

@srowen – Based on my quick look yesterday, sc.stop() calls SparkEnv.stop(), which tries to stop an already-stopped environment; since it is not idempotent, this cascades into an uncaught exception and exit code 50. https://issues.apache.org/jira/browse/SPARK-2645
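The non-idempotent behavior described here can be reduced to a minimal, self-contained sketch (hypothetical `Server` and `Env` classes, not Spark's actual code): without a guard flag, a second stop() reaches the underlying server again and throws, while the guarded version returns early.

```scala
// Hypothetical stand-ins for illustration only; not Spark's SparkEnv.
class Server {
  private var running = true
  def shutdown(): Unit = {
    if (!running) throw new IllegalStateException("Server is already stopped")
    running = false
  }
}

class Env(server: Server) {
  private var isStopped = false
  def stop(): Unit = {
    if (isStopped) return // the guard makes stop() idempotent
    isStopped = true
    server.shutdown()     // only reached on the first call
  }
}

val env = new Env(new Server)
env.stop()
env.stop() // second call is a no-op instead of throwing
println("stopped twice without error")
```

Without the `isStopped` guard, the second `env.stop()` would surface the "Server is already stopped" exception, which is the failure mode reported in SPARK-2645.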

@squito Did as suggested. Please review. Thanks.

@rekhajoshm rekhajoshm changed the title [SPARK-2645] [Core] Fix for SparkContext stop behavior [SPARK-2645] [Core] Allow sc.stop() to be called multiple times without side effects. Jun 25, 2015
@SparkQA

SparkQA commented Jun 25, 2015

Test build #35748 has finished for PR 6973 at commit 58dba70.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

outputCommitCoordinator.stop()
rpcEnv.shutdown()
} catch {
case NonFatal(e) =>
Member

What's the value in adding this?

Contributor

I think Sean's point is that by catching the error, it's no longer getting passed further up the stack. If something unexpected goes wrong here, we most probably don't want to just log it; it's likely to get overlooked and maybe indicates something seriously wrong.

Member

Yes, but further, I don't know if it helps to log it here either. I'd remove this try block.

Contributor

I agree. Looks like the only thing this does is log the exception, but if we propagate it upwards it will be observed by the upper layers anyway. @rekhajoshm can you remove this try block?
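The reviewers' point can be seen in a minimal sketch (hypothetical names, standing in for the try block around `rpcEnv.shutdown()`): catching NonFatal and only logging swallows the failure, while removing the block lets it propagate to the caller.

```scala
import scala.util.control.NonFatal

// Hypothetical failing cleanup step, standing in for e.g. rpcEnv.shutdown().
def cleanup(): Unit = throw new RuntimeException("shutdown failed")

// With a log-only catch block: the caller never observes the failure.
def stopSwallowing(): Unit =
  try cleanup()
  catch { case NonFatal(e) => println(s"log only: ${e.getMessage}") }

// Without the try block: the same failure propagates up the stack.
def stopPropagating(): Unit = cleanup()

stopSwallowing() // logs and continues silently
try stopPropagating()
catch { case NonFatal(e) => println(s"caller observed: ${e.getMessage}") }
```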

@srowen
Member

srowen commented Jun 25, 2015

Jenkins retest this please

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35772 has finished for PR 6973 at commit 58dba70.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// assert(!e.getMessage.contains("Server is already stopped"))
threwNoOrOnlyExceptedException = false
case NonFatal(e) =>
threwNoOrOnlyExceptedException = true
Contributor

what exception are you expecting here? I don't see any exception when running this test locally.

If you aren't expecting any exception, then you could just have the body of the try block in the test (any exception will fail the test).

If there is some "OK" exception, we should probably check more directly for that exception, as this will allow almost any exception to get thrown.

@rekhajoshm
Contributor Author

Thanks @srowen @squito for your review comments.
@srowen - I logged it for debugging purposes, but can remove it if not preferred.
@squito - updated as suggested.

@squito
Contributor

squito commented Jun 25, 2015

hmm, now I am a little confused -- that test actually passes even without your changes. Sorry this may have been my fault for suggesting a test in SparkContextSuite -- I guess SparkContext itself already handles multiple calls to stop().

I'm not convinced this is still needed, but I suppose it doesn't hurt. In any case, I'd leave the test you have, and also add a case where you just stop the env multiple times.

@vanzin
Contributor

vanzin commented Jun 25, 2015

None of the changes here are in SparkContext; the PR title is misleading, since you're changing SparkEnv. What code path would cause SparkEnv.stop() to be called multiple times?

SparkContext.stop() is already idempotent:

if (!stopped.compareAndSet(false, true)) {
  logInfo("SparkContext already stopped.")
  return
}
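For contrast with SparkEnv's plain boolean flag, the compareAndSet idiom above can be sketched in a self-contained form (a hypothetical `Context` class, not Spark's actual code): the atomic false-to-true transition guarantees the cleanup body runs exactly once, even under concurrent calls.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in mirroring SparkContext's idempotency guard.
class Context {
  private val stopped = new AtomicBoolean(false)
  var stopCount = 0

  def stop(): Unit = {
    // Only one caller ever wins the false -> true transition.
    if (!stopped.compareAndSet(false, true)) {
      println("Context already stopped.")
      return
    }
    stopCount += 1 // cleanup runs exactly once
  }
}

val ctx = new Context
(1 to 4).foreach(_ => ctx.stop())
println(s"cleanup ran ${ctx.stopCount} time(s)") // cleanup ran 1 time(s)
```

The AtomicBoolean makes this safe for concurrent callers, whereas SparkEnv's unsynchronized `isStopped` flag only protects against sequential double-stops.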

@SparkQA

SparkQA commented Jun 25, 2015

Test build #35806 has finished for PR 6973 at commit 9193a0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rekhajoshm rekhajoshm changed the title [SPARK-2645] [Core] Allow sc.stop() to be called multiple times without side effects. [SPARK-2645] [Core] Allow SparkEnv.stop() to be called multiple times without side effects. Jun 26, 2015
@rekhajoshm
Contributor Author

thanks @vanzin for your comments. Updated the title. Other than SparkContext, SparkEnv stop can be invoked from the Scheduler; or, if the Driver commands a shutdown, Executor stop invokes SparkEnv stop. Hence SparkEnv should be made idempotent as well.

On SPARK-2645, the org.apache.spark.ServerStateException: Server is already stopped, from my understanding, surfaces as an uncaught exception (exit code 50).
The issue may be erratic; it could be due to async code state not being updated correctly somewhere, a shutdown hook, a (wild guess) Scala compiler optimization within the steps, or only machine/usage-specific variants :-)
From my quick analysis, the one apparent culprit was stop() not being idempotent on SparkEnv; the fix was to prevent a stopped env from being stopped again.

@squito @srowen good point; in my enthusiasm I was trying to avoid an unhandled exception even when not expecting any, or only in erratic cases, but I completely agree with you. Updated.

Thanks again @vanzin @squito @srowen !

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35821 has finished for PR 6973 at commit a5a7d7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// call stop second time
sc.stop()
} catch {
case e: Exception =>
Member

This try should be removed since it doesn't really add anything, except hiding the exception. Let it fly from the method to fail the test.

@rekhajoshm
Contributor Author

thanks @srowen for your inputs. I had missed capturing the exception on failure; updated (12f66b5). try/catch/fail(e) seems to be an accepted practice when failing on an exception. I have also used the org.scalatest.Matchers "noException should be thrownBy" style if that is preferred (1aff39c). Or please let me know if you have another preference.

import org.scalatest.Matchers._

noException should be thrownBy {
  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
  val cnt = sc.parallelize(1 to 4).count()
  sc.cancelAllJobs()
  sc.stop()
  // call stop second time
  sc.stop()
}

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35874 has finished for PR 6973 at commit 1aff39c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35873 has finished for PR 6973 at commit 12f66b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2015

Test build #35875 has finished for PR 6973 at commit c97839a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Jun 29, 2015

@rekhajoshm I think this is still in your court. The extra try blocks should be removed IMHO.

@SparkQA

SparkQA commented Jun 29, 2015

Test build #36022 has finished for PR 6973 at commit 2ce5760.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rekhajoshm
Contributor Author

@srowen done. Can you please review/approve it? Thanks.

@andrewor14
Contributor

@rekhajoshm LGTM other than a minor comment @srowen had.

@rekhajoshm
Contributor Author

thanks @andrewor14 for your quick input. The last commit, 2ce5760, was updated for the comment from @srowen; kindly check. Thanks.

@rekhajoshm
Contributor Author

I had missed that inline comment; updated. @andrewor14 @srowen thanks

@SparkQA

SparkQA commented Jun 30, 2015

Test build #36077 has finished for PR 6973 at commit 446b0a4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -45,6 +45,8 @@ import org.apache.spark.storage._
import org.apache.spark.unsafe.memory.{ExecutorMemoryManager, MemoryAllocator}
import org.apache.spark.util.{RpcUtils, Utils}

import scala.util.control.NonFatal
Member

Not needed now?

@rekhajoshm
Contributor Author

Thanks for your inputs @srowen @andrewor14. Done. Also removed the unused JavaConversions._ import. Please review, thanks.

@SparkQA

SparkQA commented Jun 30, 2015

Test build #36169 has finished for PR 6973 at commit 277043e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Merging into master. Thanks for the fix @rekhajoshm

@asfgit closed this in 7dda084 on Jun 30, 2015
@rekhajoshm
Contributor Author

thank you @andrewor14 @srowen for the reviews and merge.
