
SPSC memory is allocated twice #74

Open
mgeier opened this issue Dec 17, 2021 · 2 comments
mgeier commented Dec 17, 2021

This is probably not a big deal, I just wanted to mention it because it surprised me when I first realized it ...

oddio/src/spsc.rs

Lines 197 to 202 in adc60db

```rust
let mem = alloc::alloc(layout);
mem.cast::<Header>().write(Header {
    read: AtomicUsize::new(0),
    write: AtomicUsize::new(0),
});
Box::from_raw(ptr::slice_from_raw_parts_mut(mem, capacity) as *mut Self).into()
```

After allocating all the memory, the Box is converted to an Arc, which allocates completely new memory and copies the initialized values into it. AFAICT, the originally allocated memory is then deallocated again at the end of the new() method.

Is this intentional?
Or am I missing something?

I found this when trying to convert my own SPSC ring buffer to a DST (see mgeier/rtrb#75), where this crate's code was very helpful. I avoided the repeated allocation by manually implementing a small subset of Arc for this special case instead of using Arc directly.

Ralith (Owner) commented Dec 17, 2021

Your analysis is correct, and this behavior is intended. I agree it's kind of ugly. For Frames, another DST implemented by oddio, the standard Arc type was desired as part of the public interface, and I think here I just reused the same approach, also since it saves some extra unsafe and doesn't cost much in the long run. At least when Arc would not otherwise be part of the public interface, I agree that rolling your own refcounting is a defensible approach as well.

> I found this when trying to convert my own SPSC ring buffer to a DST (see mgeier/rtrb#75), where this crate's code was very helpful.

I'm happy to hear that! I'm a big fan of rtrb's interface, and if you merge that I'll be out of excuses not to pull it in directly in favor of my reinvented wheel here.

mgeier (Author) commented Dec 20, 2021

Thanks for confirming my speculations!

> it saves some extra unsafe

Yeah, that's definitely an advantage. My implementation has quite a lot of unsafe.

> doesn't cost much in the long run

Well, that's the question, I'm not sure about that.

In general, yes, I care more about the real-time operation than the initialization phase.
However, if really big buffers are used for some reason, the allocation overhead might become relevant?

Also, I have an idea to push this even further (but I don't know whether it's a good idea!): One could provide an (unsafe) API to construct a RingBuffer by providing an existing piece of memory to be used.
In this case, the dependency on alloc could be dropped (or rather made optional).
I have never actually needed this, though. And there haven't been any requests for something like this.
So for now this is just a theoretical deliberation.

> rolling your own refcounting is a defensible approach as well.

In fact, I can get away with a much simpler approach than actual refcounting.
For one, I don't need to implement the equivalent of clone(), because producer and consumer are created at the same time anyway.
And I don't really need to count that much, I only need to represent 2 states, so an AtomicBool is sufficient.

> if you merge that I'll be out of excuses not to pull it in directly in favor of my reinvented wheel here.

Yeah, I'm still unsure; I'd like to gather more input before moving forward.
I want to spend a bit more time on benchmarking, because so far the multi-threaded benchmarks show no change in performance. But maybe the benchmarks themselves could be improved.

However, I've already found a few places where the Rust compiler doesn't seem to optimize away all abstractions, so I had to "lower" a few of them to get competitive performance, see mgeier/rtrb@d0764ba.
Maybe there are still some optimizations missing ...

Have you experienced clear performance improvements when switching to a DST?

Apart from performance, I think it's more elegant to have UnsafeCell<[MaybeUninit<T>]> instead of *mut T, but it comes at the cost of more code, a lot of it being unsafe.

I'd be happy about any kind of input!
