[SPARK-23182][CORE] Allow enabling TCP keep alive on the RPC connections #20512
Conversation
Is it possible that TCP keepalive is disabled by the kernel, so that your approach wouldn't work? I was wondering whether it would be better to add application-level heartbeat messages to detect lost workers?
For completeness it should be possible to enable OS-level TCP keep-alives. The client does enable TCP keepalive on its side, and it should be possible on the server too. However, independent of that, it perhaps makes sense to also have application-level heartbeats, because in the JVM it doesn't seem possible to tune the timeouts of TCP keepalive.
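The limitation mentioned above can be seen directly in the standard library: `java.net.Socket` exposes only an on/off flag for keep-alive, with no way to set probe timing. A minimal self-contained sketch (the class name is mine, not from the PR):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class KeepAliveDemo {
    public static void main(String[] args) throws Exception {
        // Bind a server on an ephemeral port and connect a client to it.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            // The standard java.net API exposes only the on/off switch;
            // probe timing (idle time, interval, probe count) comes from
            // kernel settings such as net.ipv4.tcp_keepalive_time on Linux.
            accepted.setKeepAlive(true);
            client.setKeepAlive(true);
            System.out.println("server keepAlive=" + accepted.getKeepAlive());
            System.out.println("client keepAlive=" + client.getKeepAlive());
        }
    }
}
```

This is why application-level heartbeats remain attractive when detection latency needs to be tuned per application rather than host-wide.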
Any update?
This is just far enough outside my expertise that I don't have an opinion -- but @zsxwing might have some thoughts.
Is there any downside to enabling keepalive? Should it be on by default, or always? Seems OK to me.
Test build #4475 has finished for PR 20512 at commit
I don't think there is, but I didn't want to change the default behavior.
Ignore the test failure for now; I think it's unrelated.
Test build #4483 has finished for PR 20512 at commit
I took another look at the code and, as you say, it's already enabled in TransportClientFactory.createClient, always. It also sets things like TCP_NODELAY. I think at least this should be on by default, or could be? And I'm not sure it's worth another config; I'm thinking of consistency here. @rxin wrote the code for that part in the client a long time ago.
I took another look at this again. It's worth noting that this change is much broader than what is suggested in the summary -- it changes the setting for every server, e.g. the driver's RPC server, the external shuffle server, etc.
I'm not a real expert here, but I think turning on keep-alive is probably fine. The downside is a few more messages sent over idle connections to make sure they're alive, and perhaps false positives in cases where one end is just under heavy load.
@srowen you mentioned NODELAY as well -- that seems like a bigger change to me, one I'd investigate more before changing (maybe others have enough experience to say more definitively whether it's OK to flip; just saying that I myself would proceed more cautiously on that one).
@@ -172,6 +174,14 @@ public boolean verboseMetrics() {
    return conf.getBoolean(SPARK_NETWORK_VERBOSE_METRICS, false);
  }

  /**
   * Whether to enable TCP keep-alive. If true, the TCP keep-alives are enabled, which removes
   * connections that are idle for too long.
Not really accurate -- it sends a message when the connection is idle, but if the connection is still up then it leaves the connection up. It's only if the keep-alive message doesn't get ack'ed that the connection is removed. I'd either say "... to detect broken connections" OR just leave it at "whether to enable TCP keep-alive", as it's perhaps best explained by easily searchable external references.
I pinged some people offline who know more about networking to take a look at this. It might take a while before they can, due to the holiday schedule.
Clarification: I am not a Spark expert; I just got an invitation from @rxin because he thinks I have some knowledge about Linux TCP. Generally, the patch looks good to me for the purpose of preventing inactivity from disconnecting the channel. But from the diff, it looks like this commit will impact other RPC and shuffle transports as well. It's not only for master RPCs as the title declares. Min
I see, @peshopetrov -- could you maybe narrow the scope of the change to only affect master RPCs? That seems OK, and seems like your intent.
TCP keepalive will be disabled unless explicitly set per transport type. E.g.:
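The example that followed "E.g.:" appears to have been lost in extraction. Based on the config key shown later in this PR's description, per-transport keys would look like the following (the `spark.shuffle.*` line is my extrapolation of the module-prefix pattern, not confirmed by this thread):

```
# Enable for RPC connections (key shown in the PR description):
spark.rpc.io.enableTcpKeepAlive true
# Other transport modules would use their own prefix, e.g. (assumed):
spark.shuffle.io.enableTcpKeepAlive true
```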
We actually set both, but if only
@peshopetrov it looks like the TransportServer is used in two implementations of the shuffle service, as well as the block transfer service, in addition to the RPC server. I think that's what people are getting at. I don't see an objection to setting keep-alive on all of these connections, so one resolution is just to change the JIRA/PR to reflect the fact that it's going to affect more than RPCs. Another resolution is to add new arguments or configs so that only the RPC server enables this new setting. That could be fine too. I personally would favor the first option; just update the description. I don't see an argument that there's substantial downside to enabling this for all of these server connections.
If it is almost never the case that this change would bring benefits, why would we want to merge the option in?
My understanding is that there is a benefit for the master's RPC connections, as described in the PR description. The benefit or cost for other server connections is unclear: probably no harm other than a few extra messages on idle connections, and probably no real upside. The argument for this change as-is is that it addresses the master's RPC connections and just applies the setting consistently elsewhere, where it's probably neither positive nor negative. There is also an argument for making this change a little more complex so that it affects only the master RPCs, as described in the PR currently. I could live with either one.
@rxin just to finish this off, which argument is more compelling to you? I tend to favor making this change, but could live with a narrower variation too.
TCP keep-alive is non-invasive; the only minor downside is that it would generate extra network packets. But considering Spark's big-data exchange use cases, these packets are negligible. I am good with this one config ruling all transport types; just please change the title/commit messages.
Let’s go ahead and do it then!
Test build #4506 has finished for PR 20512 at commit
Test build #4509 has finished for PR 20512 at commit
Merged to master |
## What changes were proposed in this pull request?

Make it possible for the master to enable TCP keep alive on the RPC connections with clients.

## How was this patch tested?

Manually tested. Added the following to spark-defaults.conf:

```
spark.rpc.io.enableTcpKeepAlive true
```

Observed the following on the Spark master:

```
$ netstat -town | grep 7077
tcp6       0      0 10.240.3.134:7077       10.240.1.25:42851       ESTABLISHED keepalive (6736.50/0/0)
tcp6       0      0 10.240.3.134:44911      10.240.3.134:7077       ESTABLISHED keepalive (4098.68/0/0)
tcp6       0      0 10.240.3.134:7077       10.240.3.134:44911      ESTABLISHED keepalive (4098.68/0/0)
```

which shows that the keep-alive setting is taking effect.

It's currently possible to enable TCP keep alive on the worker / executor, but not possible to configure it on other RPC connections; it's unclear to me why that is. Keep-alive is more important for the master, to protect it against suddenly departing workers / executors, so I think it's very important to have it. In particular, this makes the master resilient when using preemptible worker VMs in GCE. GCE has the concept of shutdown scripts, which it doesn't guarantee to execute, so workers often aren't shut down gracefully and their TCP connections on the master linger because there's nothing to close them. Hence the need for keep-alive.

This enables keep-alive on connections besides the master's, but that shouldn't cause harm.

Closes apache#20512 from peshopetrov/master.

Authored-by: Petar Petrov <petar.petrov@leanplum.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
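Since the JVM exposes only the keep-alive on/off flag, how quickly a vanished worker is actually detected is governed by kernel settings. A sketch for inspecting the relevant defaults, assuming a Linux host (the `keepalive (6736.50/0/0)` timers in the netstat output above count down from `tcp_keepalive_time`):

```shell
# Idle seconds before the first keep-alive probe, seconds between probes,
# and the number of unacknowledged probes before the connection is dropped.
cat /proc/sys/net/ipv4/tcp_keepalive_time \
    /proc/sys/net/ipv4/tcp_keepalive_intvl \
    /proc/sys/net/ipv4/tcp_keepalive_probes
```

These can be lowered via `sysctl` on the master host if the default (commonly two hours idle before the first probe) detects departed workers too slowly.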