Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reuse inner if possible? #15

Open
wangjia184 opened this issue May 4, 2021 · 11 comments
Open

Reuse inner if possible? #15

wangjia184 opened this issue May 4, 2021 · 11 comments

Comments

@wangjia184
Copy link

wangjia184 commented May 4, 2021

I have a use case, within each second, thousands of oneshot channels are created and signaled, and then dropped.

I am thinking about if they can be reused somehow.

For each oneshot there is a new Inner allocated Arc::new(Inner::new()) on heap.
https://github.com/irrustible/async-oneshot/blob/main/src/lib.rs#L21

Can we somehow reuse the Inner in order to reduce heap allocation? Say if it is received, the Inner can be reset and used to create another pair of tx and rx?

@jjl
Copy link
Contributor

jjl commented May 4, 2021

Maybe. My local working copy I've been experimenting with some of this. The problem is that it makes it difficult to ensure safety (i.e. i have to expose quite a lot of unsafe api to enable it).

I'm open to suggestions if you have any particular ideas?

@jjl
Copy link
Contributor

jjl commented May 4, 2021

Actually, I should note that on my working copy, the arc is gone and replaced with a box because we already have atomic bitflags we can use.

@wangjia184
Copy link
Author

wangjia184 commented May 4, 2021

@jjl Box causes heap allocation too. I think it may leave the choice to caller , to reuse the inner after it is received

@jjl
Copy link
Contributor

jjl commented May 4, 2021

yes, a box does cause heap allocation, however turning the arc into one moves us one step closer to an alloc-less api (because i would ultimately like to support no-std without alloc).

the problem is that something has to do the cleanup. in my local copy, if one side drops, the other side becomes responsible for cleanup. and then there's creation. by using alloc now, i'm able to enforce a lot of invariants (like "there will not be more than two threads accessing this at a time"). if you want to start doing reuse, well creating a new pair from existing data is going to be an unsafe operation right there because i can no longer guarantee it. and then i still have to support the existing methods of use (because most people do not care about the alloc).

so it's not as straightforward as it sounds, but i think we can work s omething out.

@jjl
Copy link
Contributor

jjl commented May 4, 2021

okay, i've pushed (most of) my local changes to the feature/no-arc branch so you can have some idea of what i'm talking about.

this is work in progress. but you can see the way i was thinking of evolving the api

@jjl
Copy link
Contributor

jjl commented May 4, 2021

Hrm, I think the following style of api is possible without needing unsafe:

impl<T> Sender<T> {
  pub fn reestablish(&self) -> Result<Receiver<T>, ()>
}

And similar on the Receiver side. It can check if the other side is dropped. If so, it can proceed to reestablish. Otherwise you'd need both halves.

@wangjia184
Copy link
Author

wangjia184 commented May 4, 2021

an alloc-less api

Is that possible? I really doubt that.
But if it comes to true, it would outperform tokio's oneshot a lot.

@jjl
Copy link
Contributor

jjl commented May 4, 2021

Sure it's possible. Take a look at the WIP - the two halves use a pointer internally. That we happen to allocate it with Box::new is incidental - we can equally take a pointer of miscellaneous origin, provided it will survive as long as the two halves do. Like if you created it on the stack...

@jjl
Copy link
Contributor

jjl commented Jul 11, 2021

Ok, current plan is to base the new Sender and Receiver on the following type to hold a channel:

pub enum Holder<'a, T> {
    /// Lifetime-bound reference
    Ref(&'a T),
    /// A pointer we do not own
    BorrowedPtr(NonNull<T>),
    /// A pointer originating from a box that between us 
    #[cfg(feature="alloc")]
    OwnedBoxPtr(NonNull<T>),
}

I'm still a bit on the fence about the lifetime. I don't like to taint the Sender and Receiver types with it, especially given that today's async ecosystem doesn't really handle borrowing in nested tasks very well (it's quite a hard problem). That said, there's a trivial workaround, if such were were really considered required:

type Sender<T> = async_oneshot::Sender<'static, T>;
type Receiver<T> = async_oneshot::Receiver<'static, T>;

I don't know this extra option is particularly useful yet, but that said, I have some plans involving subtasks and borrowing on the horizon and I'm sure I'm not the only one.

Would this cover your use case adequately?

@jjl
Copy link
Contributor

jjl commented Jul 16, 2021

Still not sure exactly sure in which way you want to reuse it.

In my working copy, we now have three different means of reuse:

  • Sending a message no longer automatically closes the channel, so it can be directly reused.
  • Sender and Receiver can be put into recycle mode. After they observe the other side has closed, you can call recycle() on them and get back the other half back very cheaply.
  • Sender and Receiver can be put into reclaim mode. During what would be a drop, we can stamp the atomic flags with a sentinel value that some code you've written can use to recognise the memory is free. Or if you just know you can use unsafe.

So I think whatever it is you were hoping for, I've probably given you it.

Create/drop is now 40-50% cheaper in my working copy because I've cut out the Arc, so a create/drop without using the channel is 4 atomics (and perhaps a box, if you quite sensibly aren't managing the memory yourself).

What I would note is that you can't magically assume that getting rid of the box will make everything a lot cheaper. Benching the new version with a box and with a reference took it from ~28.5ns to ~26ns on my machine. The benchmark is probably being quite generous to the allocator, but 2ns is frankly incredible allocator performance. And I'm just using musl's allocator, which while it isn't terrible any more, it's not as fast as e.g. jemalloc, tcalloc or mimalloc.

So uh yeah, we're really in the weeds of performance here. I've got my full no alloc support, but it hasn't given me quite the performance wins I was hoping for. The biggest win by far was removing the Arc (cutting out 4 relaxed atomics!)

@jjl jjl mentioned this issue Jul 21, 2021
@jjl
Copy link
Contributor

jjl commented Jul 27, 2021

I consider this will be fixed with #18 where I've added 3 different ways to reuse hatches. Now would be an excellent time to feed back on that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants