Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ruby] Setting Array/Hash values is inconsistent/buggy #4969

Open
blowmage opened this issue Jul 26, 2018 · 23 comments
Open

[Ruby] Setting Array/Hash values is inconsistent/buggy #4969

blowmage opened this issue Jul 26, 2018 · 23 comments

Comments

@blowmage
Copy link
Contributor

Hi, the behavior in this issue describes has been consistent for all gem releases since 3.0.0.alpha, on all ruby versions, AFAIK. We have worked around it, but I think it can be better, especially as more and more gems are moving to being generated only instead of hand-written, and exposing the protobuf objects directly to users.

The ruby gem provides support for Ruby Array and Hash objects. It will automatically map an Array object to a protobuf RepeatedField object, and a Hash object to a protobuf Map object. Here are some examples using the TestMessage protobuf class used in the ruby tests:

msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg_by_array.repeated_int64.class #=> Google::Protobuf::RepeatedField
msg_by_array.repeated_int64.to_a #=> [1, 2, 3, 4]

msg_by_hash = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
msg_by_hash.map_string_string.class #=> Google::Protobuf::Map
msg_by_hash.map_string_string.to_h #=> { 'hello' => 'world' }

This is great! Except you can't set a value on an instantiated protobuf object using Array or Hash, because it raises TypeError:

msg_by_array.repeated_int64 = [1, 2, 3, 4] # TypeError
# TypeError: Expected repeated field array

msg_by_hash.map_string_string = { 'hello' => 'world' } # TypeError
# TypeError: Expected Map instance

You also can't set the value in the initializer with a protobuf object, or you get an ArgumentError:

msg_by_rf = A::B::C::TestMessage.new(repeated_int64: msg_by_array.repeated_int64) # ArgumentError
# ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.

msg_by_map = A::B::C::TestMessage.new(map_string_string: msg_by_hash.map_string_string) # ArgumentError
# ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.

This causes us to perform many workarounds, which for the most part is manageable. But, in general this doesn't seem to be very usable, as it requires users to run into errors more often than they expect.

I've created the following (failing) test file that demonstrates these concepts. It can be dropped into ruby/test and should run just fine.

#!/usr/bin/ruby

require 'generated_code_pb'
require 'test/unit'

class MismatchedSettersTest < Test::Unit::TestCase

  def test_setting_Array_on_initializer
    m = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
    assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
  end

  def test_setting_RepeatedField_on_initializer
    m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
    assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)

    # This fails with ArgumentError: Expected array as initializer value for repeated field 'repeated_int64'.
    # because it can't take a Google::Protobuf::RepeatedField
    m2 = A::B::C::TestMessage.new(repeated_int64: m1.repeated_int64)
    assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
    assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
  end

  def test_setting_Array_on_accessor
    m = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::RepeatedField, m.repeated_int64)
    assert_not_equal([1, 2, 3, 4], m.repeated_int64.to_a)
    # This fails with TypeError: Expected repeated field array
    # because it expects the value to be Google::Protobuf::RepeatedField
    m.repeated_int64 = [1, 2, 3, 4]
    assert_equal([1, 2, 3, 4], m.repeated_int64.to_a)
  end

  def test_setting_RepeatedField_on_accessor
    m1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
    assert_kind_of(Google::Protobuf::RepeatedField, m1.repeated_int64)
    assert_equal([1, 2, 3, 4], m1.repeated_int64.to_a)

    m2 = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::RepeatedField, m2.repeated_int64)
    assert_not_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
    m2.repeated_int64 = m1.repeated_int64
    assert_equal([1, 2, 3, 4], m2.repeated_int64.to_a)
  end

  def test_setting_Hash_on_initializer
    m = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m.map_string_string)
    assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
  end

  def test_setting_Map_on_initializer
    m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
    assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)

    # This fails with ArgumentError: Expected Hash object as initializer value for map field 'map_string_string'.
    # because it can't take a Google::Protobuf::Map
    m2 = A::B::C::TestMessage.new(map_string_string: m1.map_string_string)
    assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
    assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
  end

  def test_setting_Hash_on_accessor
    m = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::Map, m.map_string_string)
    assert_not_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
    # This fails with TypeError: Expected Map instance
    # because it expects the value to be Google::Protobuf::Map
    m.map_string_string = { 'hello' => 'world' }
    assert_equal({ 'hello' => 'world' }, m.map_string_string.to_h)
  end

  def test_setting_Map_on_accessor
    m1 = A::B::C::TestMessage.new(map_string_string: { 'hello' => 'world' })
    assert_kind_of(Google::Protobuf::Map, m1.map_string_string)
    assert_equal({ 'hello' => 'world' }, m1.map_string_string.to_h)

    m2 = A::B::C::TestMessage.new
    assert_kind_of(Google::Protobuf::Map, m2.map_string_string)
    assert_not_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
    m2.map_string_string = m1.map_string_string
    assert_equal({ 'hello' => 'world' }, m2.map_string_string.to_h)
  end

end
@TeBoring
Copy link
Contributor

What is this issue looking for? Setting repeated/map fields with ruby array/map?

@TeBoring
Copy link
Contributor

Or constructor with protobuf object?

@blowmage
Copy link
Contributor Author

Being able to assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer. Basically, make the failing tests pass.

@TeBoring
Copy link
Contributor

In your test, I also see you check the result of to_a and to_h.
Do you also need these features?
@haberman do we want to support assign an Array or Hash to an attribute, and assign a RepeatedField or Map in an initializer? And do we guarantee the result of to_a and to_h?

@haberman
Copy link
Member

This behavior is the result of some deliberate thinking/design. Basically we try to give predictable behavior, and avoid surprises. And we try to behave like Struct generally.

The tricky issue is that, to do proper type checking, we can't just take a reference to the existing array. Consider:

arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr

# Whoops, now our repeated int64 field has a string in it.
arr << "Broccoli"

Instead we need a separate array-like object, RepeatedField, that can enforce that all of the members have the correct type.

It would be possible to make field assignment on a protobuf do the equivalent of:

# msg_by_array.repeated_int64 = [1, 2, 3, 4]
msg_by_array.repeated_int64.replace([1, 2, 3, 4])

But this is un-Struct like, and would cause surprises if someone wrote:

arr = [1, 2, 3, 4]
msg_by_array.repeated_int64 = arr

# Whoops, won't affect msg_by_array.repeated_int64
# even though with Struct it would.
arr << 5 

To maintain consistency here, we decided not to allow direct assignment. Both of the following work as alternatives:

# Append
msg_by_array.repeated_int64 += [1, 2, 3, 4]

# Replace/assign.
msg_by_array.repeated_int64.replace([1, 2, 3, 4])

You could argue that we've already broken this model by allowing the following:

msg_by_array = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])

But I see that as more akin to how Array#new accepts an array parameter which it copies into the new array.

All that said, it's possible our efforts at consistency here are annoying people more than they are saving people from frustrating surprises. We just want to be cautious here, because "copy on assignment" is different than how almost everything else in Ruby works. You would never expect this to be doing any copying:

Struct.new("Foo", :a)
foo = Struct::Foo.new

# This will never copy a.
foo.a = a

@blowmage
Copy link
Contributor Author

I guess what I'm saying here is that if we can accept an Array in an initializer method, with all the appropriate type checking, then should be able to do the same on an attribute setter method. Similarly, if we can accept a ReleatedField on an attribute setter method, then should be able to accept it in an initializer method. Why is the behavior of initializer methods and attribute setter methods different?

We have seen developers both outside and inside Google bump into this.

@haberman
Copy link
Member

Why is the behavior of initializer methods and attribute setter methods different?

Because Array#new(arr) copies arr -- that's the prior art we're relying on for allowing it in the constructor. But foo.a = a rarely/never copies a, which is why we don't (currently) do that either.

@haberman
Copy link
Member

On the other hand you're not the first person to ask for this. Here is another:

#4385

Let me poll a few other people and see what they think.

@blowmage
Copy link
Contributor Author

If I were to prioritize, I would place supporting RepeatedField and Map objects in the initializer method above supporting Array and Hash objects on attribute setter methods. I find this behavior the most surprising and don't understand why this is not allowed.

Even though it is the lower priority in my mind, I would still like to see supporting Array and Hash objects on the attribute setter methods. Even with the understanding that the Array/Hash objects are not stored by the method, but mapped to the protobuf types in the class definition. I would offer that this edge case can be adequately explained in the attribute setter documentation so users would not be as confused by the behavior.

To be clear, mapping Array/Hash objects to RepeatedField/Map objects does challenge some ruby semantics. For example, Ruby parses and treats calls methods that end with = differently than other methods:

class Foo
  def bar
    @bar
  end

  def bar=(new_bar)
    @bar = new_bar.reverse
  end
end

foo = Foo.new
# Chain calls
left_hand_bar = foo.bar = "bar"
# The foo object has the "mapped" value
foo.bar #=> "rab"
# The left-hand object has the original right-hand value
left_hand_bar #=> "bar"

That said, I still think the convenience of mapping the values to the correct types is worth the properly documented tweak to Ruby idioms.

@TeBoring
Copy link
Contributor

TeBoring commented Aug 6, 2018

@haberman any update?

@haberman
Copy link
Member

haberman commented Aug 9, 2018

I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.

@ebenoist
Copy link
Contributor

Let me know if I can help, but I think #4385 would address this. I can help rebase/test if needed.

@blowmage
Copy link
Contributor Author

Here is another example of a user struggling with the unfriendliness of the current API: googleapis/google-cloud-ruby#2839

@blowmage
Copy link
Contributor Author

I agree that there is no reason not to accept RepeatedField/Map in the constructor. Let me write a change to do that part.

@haberman Are you still planning on authoring a change to allow this?

@bobbytables
Copy link

Can we help with this? This is a pretty unfriendly API.

@gbirchmeier
Copy link

gbirchmeier commented Sep 20, 2019

I came here from googleapis/google-cloud-ruby#1477, and still did not find an answer. Why is this so difficult?

How do I create a RepeatedField of a custom type? Just show me. Give me an example.

@gbirchmeier
Copy link

gbirchmeier commented Sep 20, 2019

Ugh. Turns out the answer is just to create your message, and then push new items to that message's RepeatedField.

Here's a janky example:

Your proto is:

message InstrumentAttributes {
    ...
    repeated TradingHours trading_schedule = 6;
    ...
}

You want to set trading_schedule in your message. Reading the docs might make you think you need to call RepeatedField.new, and if you were repeating a simple string or int32, you could do that. But you can't, and the docs' example code doesn't demonstrate anything else.

The docs do say that "The RepeatedField type supports all of the same methods as a regular Ruby Array", but I missed it (until revising this comment). I'm sure I can't be alone in that.

What you really want to do is this:

atts = MyNamespace::InstrumentAttributes.new
# atts.trading_hours is currently an empty RepeatedField

trading_hours1 = create_trading_hours() # returns a MyNamespace::TradingHours
trading_hours2 = create_trading_hours()
atts.trading_schedule.push(trading_hours1)
atts.trading_schedule.push(trading_hours2)

Now atts.trading_hours will be an array of your 2 elements.

I still don't actually know how to create a RepeatedField of MyNamespace::TradingHours, but I guess I don't need to.

@Nakilon
Copy link

Nakilon commented Sep 20, 2019

I came here from googleapis/google-cloud-ruby#1477, and still did not find an answer. Why is this so difficult?

How do I create a RepeatedField of a custom type? Just show me. Give me an example.

You mentioned me there. It was a year ago and I have probably already forgot some details there also the gem has probably been updated a lot. That old code with old gem dependencies "just works" since then for my monitoring needs that did not expand.

As far as I remember, the point of my code was that in the time_series.points you have to define only the time_series while points is being defined automatically and you can just use it immediately to push values into it.

Maybe that snippet is not compatible with current gems -- it has their versions mentioned in comments if you scroll the snippet horizontally. I remember I've spent a day or two greping through libs and specs to come up with it so it's ok to find it to be difficult.

thiagoss pushed a commit to thiagoss/protoc-gen-rbi that referenced this issue Oct 13, 2020
According to protocolbuffers/protobuf#4969 an
array can only be used at the constructor, not when setting and reading
from the field on the object.
thiagoss pushed a commit to thiagoss/protoc-gen-rbi that referenced this issue Oct 13, 2020
According to protocolbuffers/protobuf#4969 an
array cannot be used at the setter, it needs to be a repeatedfield.

Using Array instead of Enumerable at constructor and getter allows using
.replace() and += to add elements to a RepeatedField without having to
explicitly using a RepeatedField.new in your code.
thiagoss pushed a commit to thiagoss/protoc-gen-rbi that referenced this issue Oct 13, 2020
According to protocolbuffers/protobuf#4969 an
array cannot be used at the setter, it needs to be a repeatedfield.

Using Array instead of Enumerable at constructor and getter allows using
.replace() and += to add elements to a RepeatedField without having to
explicitly using a RepeatedField.new in your code.
@elharo elharo removed the question label Sep 22, 2021
@haberman
Copy link
Member

haberman commented Sep 1, 2022

Sorry for the long inactivity on this issue. I agree that we should fix it, in both cases (constructor and assignment).

There is one unresolved question in my mind. Suppose you write:

msg1 = A::B::C::TestMessage.new(repeated_int64: [1, 2, 3, 4])
msg2 = A::B::C::TestMessage.new(repeated_int64: msg1.repeated_int64) # ArgumentError
msg1.repeated_int64 << 5
print(msg2.repeated_int64.size)  # 4 or 5?

Should the last line print 4 or 5?

In other words, if you specify an existing RepeatedField object in the constructor, should we deep copy it, or point to it by reference?

@blowmage
Copy link
Contributor Author

In other words, if you specify an existing RepeatedField object in the constructor, should we deep copy it, or point to it by reference?

I would expect a deep copy.

Copy link

github-actions bot commented Jun 2, 2024

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

@github-actions github-actions bot added the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Jun 2, 2024
@benoist-folio
Copy link

benoist-folio commented Jun 4, 2024

This continues to be an issue, but I do think this implementation: #4385 fills the need here?

@github-actions github-actions bot removed the inactive Denotes the issue/PR has not seen activity in the last 90 days. label Jun 5, 2024
@haberman
Copy link
Member

I think the main things blocking this now are just:

  1. Deciding on a consistent set of semantics.
  2. Implementing those semantics.
  3. Waiting for a breaking change release (if the change ends up being breaking).

Point (1) mainly concerns the question of what is a deep copy vs a reference, ie. what I alluded to above in #4969 (comment). I think we want to avoid doing deep copies for every kind of assignment, because people frequently want to do things like build a message tree from the leaves up. If every msg.submsg = submsg assignment was a deep copy, leaves-up builds would be really unnecessarily expensive.

Probably the simplest resolution would be to allow assignments that are currently rejected, like msg.repeated_field = ruby_array, and to only deep copy when performing this kind of implicit type conversion. That could also avoid making this a breaking change, since we wouldn't be changing the semantics of any existing (working) code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

11 participants