Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a deadlock in TransportSet. #2258

Merged
merged 1 commit into from
Sep 12, 2016

Conversation

zhangkun83
Copy link
Contributor

Honor the lock order that transport lock > channel lock.

Resolves #2246

@zhangkun83 zhangkun83 added the TODO:backport PR needs to be backported. Removed after backport complete label Sep 12, 2016
@@ -249,9 +249,14 @@ public void run() {
delayedTransport.endBackoff();
boolean shutdownDelayedTransport = false;
Runnable runnable = null;
// TransportSet as a channel layer class should not call into transport methods while
// holding the lock, thus we call hasPendingStreams() outside of the lock. It will cause
// a _benign_ race where the TransportSet may transition to CONNECTING when there is not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the race is also pre-existing, since at any time obtainActiveTransport() can return the delayed transport and a new stream be created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds a real issue. If EndOfCurrentBackoff wins in the race with newStream(), the application will see a spurious shutdown error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see where the shutdown error would come from because the delayed transport becomes pass-through to the supplier for newStream(), instead of failing new streams after shutdown.

Instead of calling startNewTransport() here, we would instead enter IDLE. But since we call setTransportSupplier with a supplier that calls obtainActiveTransport, then if the race occurs there will just be a call to obtainActiveTransport that will trigger entering CONNECTING.

The only issue is that the stream will wrapped twice with a DelayedStream. But since the race should only happen at most once per stream, we decided that it wasn't a big deal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right... I lost my touch with delayed transport after not touching it for a while

@ejona86
Copy link
Member

ejona86 commented Sep 12, 2016

LGTM

Honor the lock order that transport lock > channel lock.

Resolves grpc#2246
@zhangkun83 zhangkun83 merged commit 3d4ae36 into grpc:master Sep 12, 2016
@zhangkun83 zhangkun83 deleted the transportset_deadlock branch September 12, 2016 18:16
@zhangkun83 zhangkun83 removed the TODO:backport PR needs to be backported. Removed after backport complete label Sep 14, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants