Implement more graceful tokio limbo send_off #72

florian-g2 · 2024-09-06T21:46:27Z

Hi!
First of all, thank you for your work! Interprocess has been very beneficial in my projects so far.

This PR addresses a potential panic in the SendHalf drop implementation and the send_off method
The new implementation takes a more graceful approach when attempting to send off the corpse.
It falls back to a synchronous bury if there is no active limbo channel and there is no tokio runtime is available.

I also removed the static runtime because I think starting a new runtime in a drop implementation is a bit much hidden cost.
The static runtime would also not help in this case because tokio disallows starting a runtime inside a runtime. (even when one is shutting down)

I noticed that most drop implementations in this crate do not bury the corpse directly but send it off either to another thread or here to a tokio runtime. I would not expect much performance (latency) regression with this synchronous bury because this is more a edge-case. But I'm not completly in the picture, do you see it the same way?

Off-topic from this PR:
I feel that having that much logic in the drop implemenations is rather bad because there is no way to propagate results back to the crate user. Maybe consider in the future to refactor the bury logic into seperate .shutdown() or .disconnect() methods without implict thread or task spawning. But that looks like no easy task though, so thats another story.

Also note:
I left quite a few comments with more details in the repro and in the PR.

Thanks for looking into it! 😊

florian-g2 · 2024-09-06T21:49:23Z

Issue: #71

kotauskas · 2024-09-07T08:46:33Z

This fix isn't quite right, since bury_sync() blocks runtime shutdown (potentially causing a lockup due to untrusted clients, which qualifies as DoS in certain configurations). Note that the whole point of this limbo construction is to hide the fact that blocking has to happen for send buffers to be preserved (send buffer preservation being a compatibility necessity).

The more correct solution here would be to send the handle to the sync limbo instead, which avoids blocking the runtime during the shutdown process.

I also removed the static runtime because I think starting a new runtime in a drop implementation is a bit much hidden cost.
The static runtime would also not help in this case because tokio disallows starting a runtime inside a runtime. (even when one is shutting down)

Yea, that bit is a little silly. My original intent there was to piggyback off of Tokio's thread pool and thus dead-code-eliminate the one sync named pipes roll in crates that only use the Tokio side of things.

I feel that having that much logic in the drop implemenations is rather bad because there is no way to propagate results back to the crate user.

Close errors can be explicitly requested via .flush(). Programs that care about handling those should flush explicitly and thus remove the limbo from the equation entirely (if you flush the stream and drop it immediately after, it does not get sent to limbo).

The send_off method now properly falls back to a static tokio runtime if required.

florian-g2 · 2024-09-07T12:22:26Z

Oh right, I did not think about untrusted clients and potential lockups. Thanks for pointing out .flush(), I also did not now about it yet.

After giving it some more thoughts, I think starting a static tokio runtime maybe isn't so fancy, but it is indeed one of the better solutions here. Sending the corpses off to the sync limbo requires some refactoring. Currently they are not compatible with each other.

I updated the PR with a send_off_to_guaranteed_limbo method inplace of the bury_sync method. With this approach we would have a guaranteed available limbo which runs on its own runtime. This has quite the similarities with the previous implementation, just with a bit more graceful handling to properly catch the shutdown edge-case.

I have two open thoughts though:

Should we always use the guaranteed limbo after it was initialized, or should we try with each send_off if we can find a tokio runtime from the users code? I'm not quite sure how tokio handles threads of blocking tasks in detail, and if we could benefit to falling back to a user runtime to potentially reuse threads there.
What is the reason behind using the custom FileHandle::flush_hndl(handle) method when we have the concrete structs available (NamedPipeServer, NamedPipeClient and File) which all have a async .flush() method? I know that they are certainly also just blocking on some FlushFileBuffers call, but wouldn't it be neater to reuse the tokio code for this?

kotauskas · 2024-09-07T12:37:34Z

Sending the corpses off to the sync limbo requires some refactoring. Currently they are not compatible with each other.

This is a matter of turning the Tokio stream into an owned handle. impl Drop for PipeStream already freaks out if the pipe stream is dropped outside of a Tokio runtime, so extracting the owned handle, ignoring the error and sending it off to the sync limbo shouldn't be a problematic change.

wouldn't it be neater to reuse the tokio code for this?

To the best of my knowledge, there is nothing to reuse – Tokio doesn't flush in named pipes (as in, there isn't a single FlushFileBuffers() call in their named pipe code), and the .flush() method on NamedPipeServer and NamedPipeClient comes from AsyncWrite by way of AsyncWriteExt, where AsyncWrite::poll_flush() successfully does absolutely nothing. Thus, FileHandle::flush_hndl() is used in the implementation of .flush() on PipeStream with a bit of additional machinery known as a Tokio flusher, which puts the FileHandle::flush_hndl() call on a spawn_blocking() task.

Using Tokio's File (which has a non-trivial .poll_flush() implementation) under the hood isn't an option either, since its flush implementation concerns itself with allowing async code to wait for completion of write operations, such that dropping File immediately reclaims resources. Some similarities to my limbo approach can be seen there.

florian-g2 · 2024-09-07T13:02:48Z

To the best of my knowledge, there is nothing to reuse

Alright. It was just a thought. I did not go all the way down the flush method. Good to know.

This is a matter of turning the Tokio stream into an owned handle ... shouldn't be a problematic change.

I did not plan implementing changes or do refactoring outside of the tokio limbo here. This is not something I would be comfortable doing. I am so far fine with the current PR then.

Implement more graceful tokio limbo send_off (fixes kotauskas#71)

660a370

The send_off method now properly falls back to a static tokio runtime if required.

florian-g2 force-pushed the fix/71 branch from b4d4a36 to 660a370 Compare September 7, 2024 12:17

florian-g2 changed the title ~~Fallback to synchronous bury when no tokio runtime is available (fixes #71)~~ Implement more graceful tokio limbo send_off Sep 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement more graceful tokio limbo send_off #72

Implement more graceful tokio limbo send_off #72

florian-g2 commented Sep 6, 2024

florian-g2 commented Sep 6, 2024

kotauskas commented Sep 7, 2024

florian-g2 commented Sep 7, 2024

kotauskas commented Sep 7, 2024

florian-g2 commented Sep 7, 2024

Implement more graceful tokio limbo send_off #72

Are you sure you want to change the base?

Implement more graceful tokio limbo send_off #72

Conversation

florian-g2 commented Sep 6, 2024

florian-g2 commented Sep 6, 2024

kotauskas commented Sep 7, 2024

florian-g2 commented Sep 7, 2024

kotauskas commented Sep 7, 2024

florian-g2 commented Sep 7, 2024