-
Notifications
You must be signed in to change notification settings - Fork 10.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix shutdown of closed fd when c-ares opens a second fd #15435
Conversation
|
|
|
386dc84
to
6ae80d7
Compare
|
1 similar comment
|
6ae80d7
to
022aab5
Compare
|
|
2 similar comments
|
|
|
2 similar comments
|
|
Thanks for catching this bug! This is what I was fumbling towards in some of my questions yesterday, but I didn't have enough of the code in my head to articulate the question clearly. But seeing this PR made me look at the code in more detail, and I think you were on the right track to begin with. Reviewed 2 of 2 files at r1, 2 of 2 files at r3. src/core/lib/iomgr/ev_epollex_linux.cc, line 443 at r3 (raw file):
I'd like @sreecha to confirm that this change is reasonable. test/cpp/naming/resolver_component_test.cc, line 277 at r3 (raw file):
I don't think it really makes sense to test for log messages. Logs tend to be useful for humans but not very appropriate for automated testing. Unfortunately, I don't have a better suggestion. The only way I see to test for this bug directly would be to find a way to open a new fd with the same number between the time that c-ares closes it and when we call shutdown on it, so that we can verify that's not happening. But it's not clear how to orchestrate that. Comments from Reviewable |
Review status: 2 of 10 files reviewed at latest revision, 2 unresolved discussions. test/cpp/naming/resolver_component_test.cc, line 277 at r3 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Taking a look to see if it's doable to catch it that way, maybe using a background thread that opens and closes sockets Comments from Reviewable |
|
|
|
21630cd
to
594e049
Compare
|
1 similar comment
|
heads up that there are test failures introduced in the last commit that I need to fix here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the PR description, can you please add a separate note on why you removed already_closed
parameter ?
This will help us in future (just in case we realize that we do need already_closed
:) )
|
|
@sreecha I added a section to explain for removal of |
|
test failures are unrelated, @markdroth PTAL this should be ready for another look |
1e56d5a
to
fd883d5
Compare
|
|
|
I'd prefer to see the polling change split into its own PR, since it seems potentially invasive. Thanks! Reviewed 3 of 12 files at r13, 19 of 19 files at r14. src/core/lib/iomgr/ev_poll_posix.cc, line 436 at r14 (raw file):
What if src/core/lib/iomgr/ev_posix.h, line 102 at r14 (raw file):
Can we move this polling change to a separate PR? It seems invasive enough that I'd prefer to keep it separate. test/cpp/naming/resolver_component_test.cc, line 390 at r11 (raw file): Previously, apolcyn wrote…
I think the way you have it now is good enough. None of these options are that pretty, so let's just do whatever is the least amount of work. Thanks for looking into this. Comments from Reviewable |
fd883d5
to
96dbfe1
Compare
|
|
Done. I've removed this from this PR, and opened #15648 for all of that instead. Review status: 8 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. Comments from Reviewable |
Review status: 8 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. test/cpp/naming/resolver_component_test.cc, line 390 at r11 (raw file): Previously, markdroth (Mark D. Roth) wrote…
Ack. sounds good, I'll leave things as is Comments from Reviewable |
|
Review status: 8 of 10 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed. src/core/lib/iomgr/ev_poll_posix.cc, line 436 at r14 (raw file): Previously, markdroth (Mark D. Roth) wrote…
I believe things are still correct in this case: if Comments from Reviewable |
Nice work! |
bazel test failures: #15610 |
After our conversation earlier, I realized we are currently doing a "use after close" on the UDP socket during the fallback to TCP case, and that the
ARES_STAY_OPEN
flag appears to be actually necessary rather than a simplification.After adding the log message that this PR add's to the iomgr, the bug can currently be reproduced with:
Now check the log and you'll find a message of "Error shutting down fd. errno: 9" within the
test that triggers fallback to TCP.
The problem is that the UDP socket is closed by c-ares once the query finishes (during the call to
ares_process_fd
on the TCP socket. But then the UDP socket read closure is left hanging, and we don't really have any way of kicking it out besides callinggrpc_fd_shutdown
on it, which happens to callshutdown
on the closed socket (shutdown returns an error ofEBADFD
).update: the added
GPR_ERROR
log message onshutdown
errors caused a bunch of@no_logging
test/core/end2end
tests to fail becauseshutdown
is failing withENOTCONN
. I've updated to filter out those logs because as I know of theshutdown
API,ENOTCONN
can occur for reasons outside of an applications control.Update: also removed already_closed parameter to grpc_fd_orphan
This PR wound up removing the
already_closed
parameter togrpc_fd_orphan
because we realized that c-ares was the only caller that setalready_closed
to true, and it turns out that it's better for c-ares to pass a non-NULLrelease_fd
parameter togrpc_fd_orphan
than it is to passalready_closed=true
, because a non-NULLrelease_fd
parameter has better semantics (not only does it avoid closing, but e.g. underepollsig
it also handles removal from the epoll set).This change is