Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array: optimize shift and unshift #10081

Merged
merged 3 commits into from
Dec 18, 2020
Merged

Array: optimize shift and unshift #10081

merged 3 commits into from
Dec 18, 2020

Conversation

asterite
Copy link
Member

@asterite asterite commented Dec 15, 2020

Revival of #8036 and more!

Fixes #5126
Fixes #10079
Closes #5148

How Array#shift used to work

Before this PR, an Array used to consist of a buffer, a size and a capacity. The buffer would have a capacity. As elements are pushed into it the size increases. Once the capacity is reached, the buffer is doubled in size. To do this, we call realloc on the buffer.

When calling shift, because we must remember the exact pointer where the buffer was allocated in order to do a realloc, the only thing we could do was to call realloc, then move all the elements "to the right":

If the array is:
 
  [1, 2, 3]

and we shift:

  4

we realloc:

  [1, 2, 3, _, _, _]

then we move the elements to the right:

  [_, 1, 2, 3, _, _]

then we put the new value:
 
  [4, 1, 2, 3, _, _]

The next time shift is called we must do the same thing again, except that, well, we don't need a realloc for a while, but we still need to copy the memory to the right.

How Array#shift is optimized

With this PR, Array now also tracks an offset to where the buffer starts. So for example:

If the array is:
 
  [1, 2, 3]
   ^
   buffer
   offset = 0

Initially, the buffer points to "1" and the offset to the buffer is zero.

If we shift:

  4

Then we can move the buffer (well, the point to it) to the right and remember that we have an offset of 1:
 
  [_, 2, 3]
      ^
      buffer
      offset = 1

This is very efficient because all we need to do is increment a couple of variables: no memory is moved.

Furthermore, if we later unshift a value, we can move the buffer to the left and decrement offset.

As explained in #8036, adding this offset to Array doesn't change its memory consumption, at least not in 64 bits platforms. And in 32 bits platforms it's just 4 more bytes per Array which isn't terrible.

How Array#unshift is optimized

This is something that wasn't done in #8036

The idea is that when we unshift an element and we have no more space left, we grow the array but leave the pointer to the buffer pointing to the middle of the buffer:

If the array is:
 
  [1, 2, 3]
   ^ buffer

and we unshift:

  4

Then we resize the array:
 
  [1, 2, 3, _, _, _]
   ^ buffer

We move the elements to the right and also the pointer to the buffer:

  [_, _, _, 1, 2, 3]
               ^
              buffer

In this way if another unshift comes we just need to move the buffer to the left. Just after 3 unshifts we'll need to reallocate and move the memory again, and then it will happen after 6 unshifts.

Also note how this is similar to push but in the other direction: when you push and there's no more space, some extra space remains on the left in case more pushes comes later on. This is exactly the same but in the other direction.

Why is this important?

I consider this change extremely important. With this, Array can be used as a list, as a queue, as a dequeue... whatever! Well, previously it could be used like that too, but it wasn't efficient in all cases. So with this, Array becomes a universal, efficient data structure. Just like in Ruby. No more fear of calling shift or unshift and thinking "Ugh, why do I have to pay this penalty?".

Benchmarks!

Shift

require "benchmark"

Benchmark.ips do |x|
  x.report("shift") do
    array = Array.new(10_000, &.itself)
    while array.shift?
    end
  end
end

I know this is allocating an array, but still, look at the times:

before  495.08 (  2.02ms) (± 3.56%)  39.1kB/op
after   40.19k ( 24.88µs) (± 0.98%)  39.1kB/op

Unshift

require "benchmark"

Benchmark.ips do |x|
  x.report("unshift") do
    array = [] of Int32
    10_000.times do |i|
      array.unshift(i)
    end
  end
end
before  535.09 (  1.87ms) (± 3.75%)  96.8kB/op  fastest
after   35.95k ( 27.81µs) (± 1.63%)  96.8kB/op  fastest

In both operations, this change leads to speed that's around 2 orders of magnitude faster.

Please

Can we have this in 1.0 pretty please? I don't mind undoing this if we later decide to go with a thread-safe Array. But until then everyone can enjoy a more efficient Array.

@bcardiff
Copy link
Member

I believe that in the CrystalArraySyntheticProvider, the self.buffer initialization should change to self.buffer = self.valobj.child[3].

@asterite
Copy link
Member Author

@bcardiff Where do I change that?

@bcardiff
Copy link
Member

At https://github.com/crystal-lang/crystal/blob/master/etc/lldb/crystal_formatters.py#L14

Unfortunately, the manual specs that can be trigger by https://github.com/crystal-lang/crystal/blob/master/spec/debug/test.sh do not include a spec for Array.

@bcardiff
Copy link
Member

(or you can change the ivars order 🤷‍♂️ )

@asterite
Copy link
Member Author

@bcardiff Good catch, done!

We can't change the ivars order because then the Array will occupy more space.

@bcardiff
Copy link
Member

FYI, The linux_32 failure seems to be because of a different value of __pad1 : UShort which should not matter and has probably random date. But default struct equality compares all fields.

@asterite
Copy link
Member Author

Thanks. We can fix that in a separate PR.

@RespiteSage
Copy link

This is exciting! However, how does Deque compare now to Array? Is there still a performance benefit, or does its usefulness become limited to a more focused API?

@mwlang
Copy link

mwlang commented Dec 16, 2020

This is a big deal. I would love to see this particular optimization make it into the merge. I am constantly having to refactor code that uses Array's to find different solutions than what comes most naturally. Perhaps I'm just spoiled by Ruby's implementation of Arrays and I need to be less lazy, but old habits die hard and one great reason to use Crystal is precisely because so many Ruby concepts carry over.

@asterite
Copy link
Member Author

However, how does Deque compare now to Array?

Good question. You can try doing some benchmarks on both data structures and see.

My guess is that Deque will be more memory efficient when used exclusively as a deque, compared to Array, but I don't know. For example rotate is more efficient in Deque. I wouldn't remove Deque from the standard library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Overdosing on GC Array#shift is slow
5 participants