Enforce hash key types at compile time #8893

jgaskins · 2020-03-10T02:17:03Z

I noticed in one of my apps, I changed the type of a hash key, assuming the compiler would point out all the places I missed. But then all my Hash#[]?(key) calls started to fail silently and my Hash#fetch(key, &) calls invoked their blocks when I didn't expect them to.

I looked into how much code it would take to get the compiler to enforce the key types. It was more than I expected at first because I forgot the compiler is also written in Crystal and uses these same hashes. 😂

I don't think the semantics of this PR are correct yet, especially Hash#==(other : Hash). I hardcoded false just to get that to compile, but before I spend too much time on trying to polish it all up, I'd like to get some conversation going about the pros and cons of this patch in case the current live functionality of this was intentional with full knowledge of the tradeoffs. 🙂

asterite · 2020-03-10T03:23:21Z

See #2996 and the many related issues and PRs.

This is very unlikely to change, it makes the language too rigid.

bcardiff · 2020-03-10T03:25:12Z

I was too slow to find #6154 🙃

jgaskins · 2020-03-10T18:30:14Z

Well, I didn’t want it to go too long before proposing it again. Gotta keep the tradition alive! 😂

I had a look through those PRs and I don’t think I agree that it makes it too rigid/pedantic. The extra type safety seems worth the very few extra type casts and nil checks you’d have to add for a very few hashes in your app — most hashes would not be affected. I use Crystal specifically for the type-safety guarantees, even if I’m using duck typing in most places and the type check is 10 stack frames away from where I’ve passed in the wrong object type.

I’m all about the safer code. I like that Crystal forces me to handle nil instead of letting it propagate through my app — it’s the thing that frustrates me every time I have to fix a NoMethodError in one of my Ruby apps. It seems to make more sense to apply those same guarantees to hash keys. It would’ve saved me hours of investigation yesterday if it had simply told me that my hash key was the wrong type at compile time.

It might also help compile times since each Hash type would be able to focus on key type K instead of compiling [], []?, fetch(key), fetch(key, &), key?, and delete for every type passed in to every hash. This is at best an educated guess, though. I haven’t benchmarked it.

asterite · 2020-03-10T19:08:00Z

I guess you can continue making changes to this PR until CI is green and then we can take a look at the diff and see if it's worth it.

bew · 2020-03-10T19:21:59Z

I remember during #6154 that there was a complex hierarchy in the compiler that involves modules in modules, abstract and probably some inheritance in the mix that made me give up because of a compiler bug or something like that..

But we can definitely try again, find it and eventually fix it (if not already fixed in the recent compiler versions!)

jgaskins · 2020-03-10T21:14:52Z

@bew This PR compiles the compiler and stdlib as-is. The only thing that seems to be broken based on the failing specs is hash equivalence.

I would love to see some realistic hash operations where the strong types don’t work, if anyone’s got one handy, so I can check it against this implementation and see what the application/shard code would look like in a before/after.

asterite · 2020-03-10T21:17:24Z

src/compiler/crystal/tools/playground/server.cr

@@ -470,7 +470,7 @@ module Crystal::Playground

      agent_ws = PathWebSocketHandler.new "/agent" do |ws, context|
        match_data = context.request.path.not_nil!.match(/\/(\d+)\/(\d+)$/).not_nil!
-        session_key = match_data[1]?.try(&.to_i)
+        session_key = match_data[1]?.try(&.to_i) || 0


Why is this needed?

session_key can't be nil because of this line. Because it can't store nil and it's using [] instead of []? below, this line seems like it can't return nil even without this PR or it would raise a KeyError below.

src/hash.cr

asterite · 2020-03-10T21:18:54Z

src/yaml/any.cr

+  # Returns `true` if both `self` and *other*'s raw object are equal.
+  def ==(other : YAML::Any)
+    raw == other.raw
+  end


Why was the order filpped?

I was experimenting with the error messages. Before, when I've done a catch-all and a more specific overload of a method, I've had to put the catch-all version first or the catch-all would override the more specific overload. I don't know if this was me misunderstanding the type system at the time (it was a few years ago) and misattributing the error, but either way that's apparently not how it works anymore, if it ever did.

def foo(bar : String) "strings only" end def foo(bar) "catch all" end def foo(bar : Int) 123 end foo 123 # => 123 foo "abc" # => "strings only" foo %w[abc] # => "catch all"

Some of this meant ensuring we add the YAML::Any wrapper around the hash keys. Honestly, it kinda made me wish we'd stuck with the recursive type alias. It's notable that JSON didn't have these issues because JSON is strictly string keys, so type checks on the keys didn't break it in any way.

jgaskins · 2020-03-11T06:25:21Z

Alrighty, I think this PR is ready for some conversation about how it affects code. A few of my own observations:

The majority of the changes in this PR are due to the YAML::Any wrapper trying to present as a hash
- It’s not that that was a bad idea, it was actually a fantastic idea from a code ergonomics perspective, but I feel like this PR does point out a few consequences of trying to hide that abstraction
- Surprisingly, I don’t think this would’ve been a problem if we’d stuck with the recursive YAML::Type alias from back in the day
- Equally surprisingly, JSON didn’t have any trouble with this, but I think it’s because its keys are guaranteed strings, not JSON::Any instances
Set required an explicit typecast in order to handle subtraction of an enumerable whose types were a superset of its own
- Admittedly, this feels wonky
- I don’t think it’s going to be a big deal in practice since most collection types are homogeneous
The compiler only needed 2 explicit is_a? checks
- This is way less than I expected, if I’m being completely honest
- All other checks were specifically against nil, which are extremely common in Crystal code — and even they weren’t required in too many places
The key type restriction didn’t seem to be that much of a problem in specs outside of YAML
- One thing to be aware of would be, for example, instead of Hash(Int32, X), you may want to do Hash(Int::Signed, X) if you want to be more permissive with key types
- Subclasses also work
  
  Screenshot of subclasses and union types working through the playground running in this PR’s version of Crystal
Ary’s suggestion of {1 => “a”}[1.0] surprised me that it worked in the current version
- This doesn’t work with arrays, though. You can’t do array[1.0] — it must be an Int
- I can’t bind to port 8080.0 or 8080_u64, so a simple == check seems like an exception to the rule
- This does require thinking about hash key dereference as a method call, not just in implementation, but also conceptually

I feel like this is overall closer to the rest of the Crystal type system than relying on simple equality. I’ve admittedly been looking at it through my own lens, though, and applying my own mental model of the type system. So I admit it’s probably subjective.

I’m really interested to hear what kinds of surprises this would create in real-world apps. If I have some time, I’ll try to run it against a couple of mine tomorrow and see what I come up with.

bcardiff · 2020-03-11T12:38:51Z

spec/std/set_spec.cr

@@ -179,7 +179,7 @@ describe "Set" do
  end

  it "does -" do
-    set1 = Set{1, 2, 3, 4, 5}
+    set1 = Set(Int32 | Char){1, 2, 3, 4, 5}


It bugs me how strict you need to be in this examples. I imagine that set1 - set2 no longer compiles when set1 : Set(Int32) and set2 : Enumerable(Int32 | Char). To me it falls into the same behaviour we want on equality: although we can infer it will always be false due to typing missmatch, we want to make it a valid that will return false instead of not compile.

It could happen that this set for example is generated from templates, macros and the explicit typing will be hard to inject. Hence, the type will be the infered from the elements, leading to Set(Int32).

I agree, this is a a pretty irritating tradeoff on paper, but let’s think about the overall impact of this consequence. How often is it that you have a homogeneous set that you subtract a heterogeneous set from?

Honestly, I can’t think of a single time I’ve ever used a heterogeneous set in application code, let alone in this particular scenario where I’m subtracting one from a homogenous set. To be clear, you don’t need to specify types for all sets. Just ones where the type inference doesn’t get you what you need.

I do think that if you’re going to compare values, it’s probably a good idea to know that the types you’re comparing against can be different and the type specifier helps ensure that not only do you recognize that now, but you’ll recognize that when you change the code later. My perspective is that it’s not the same as adding two sets of different types where the result is a union of all the types in both sets. If there’s a bug there it’ll likely be caught elsewhere in the app because of the new type of the resulting set. For example:

numbers = Set{1, 2, 3} another_set = Set{“1”, “2”} result = numbers - another_set # => Set(Int32){1, 2, 3} result = numbers + another_set # => Set(Int32 | String){1, 2, 3, “1”, “2”}

I don’t think the - version makes any sense at all and if I’m doing that, it’s probably a bug. When adding them, it changes the type and is more likely to catch that bug. With this PR, the subtraction would be a bug.

IMO, the type checker is there to help you. It’s why a lot of people still use static languages. Something that becomes a compile-time error that could be a runtime error is often a good thing. Being able to tell the compiler you know what you’re doing by simply adding a type annotation is also good.

All that said, I found a way to let me remove the explicit type specifier. I don’t think I agree with it, but it keeps the same behavior in this scenario.

So I guess we should disallow - between sets of different types? That is, you can only do A - B if A's element types are a superset of B.

That would need to apply to all collection types.

The problem is always equality and inclusion, specially regarding ==. Because 1 == 1.0, and because == can be generally defined between any two types, when you have Set{1, 2, 3} then subtracting Set{1.0} could be valid...

Interestingly, in Ruby:

Set[1, 2, 3] - Set[1.0, 2.0, 3.0] # => #<Set: {1, 2, 3}>

Also:

irb(main):004:0> h = {1 => 2} => {1=>2} irb(main):005:0> h[1.0] => nil

That's because Hash uses eql? and not == for comparing key values.

But then also in Ruby:

{a: 1} == {a: 1.0} # => true

but that's because it uses == for comparing elements, and 1 == 1.0 # => true.

So maybe we can disallow h[1.0] for Hash(Int32, V). However, if you have Hash(Int32 | Float64) and you have 1 as a key and you ask for 1.0 it will work and return the value even though with Hash(Int32, V) it wouldn't work... because we don't make the distinction of == and eql? like Ruby.

This is a hard problem... :-)

We'll need to decide if we want to introduce eql? to types, which might make sense in the context of union types and the way Ruby works.

Maybe we can also disallow == between any two types. For example comparing Int32 vs String wouldn't compile, which might make sense. But we can allow comparison between any two reference types, that being false by default.

Sorry for the wall of text... my point is that this PR can't be thought in isolation will all of these other things. And changing that is a huge breaking change which I'm not sure we can/should do right before 1.0.

So I guess we should disallow - between sets of different types? That is, you can only do A - B if A's element types are a superset of B.

In an ideal world, I think that makes sense.

That would need to apply to all collection types.

Yeah, that's the catch. 🙂 It's why, even though I disagree with @bcardiff I still made it work without the explicit type annotation.

Your focus on consistency is important and I'm glad you called it out. Consistency can often be more important than anyone's perspective of what's "correct" because it allows you to guess at how something works. That's a super power for a programming language. Having to consult documentation all the time gets old quick.

I definitely want to have the conversation about whether Array(Int32)#-(Array(String)) should compile, but that particular change is too far outside the scope of this PR and I apologize for hammering on it. All I wanted was for the compiler to tell me when I changed my hash key type that I'd missed some of my key checks. 😂

if you have Hash(Int32 | Float64) and you have 1 as a key and you ask for 1.0 it will work and return the value even though with Hash(Int32, V) it wouldn't work... because we don't make the distinction of == and eql? like Ruby.

I've been using Ruby for 16 years and, honestly, I never understood why Ruby used eql? for that. But this definitely sounds like it might be why.

I'm loving the discussion here. I also use Crystal in large part due to the type-safety and would prefer the A must be a superset of B. I've been bitten by things like this when developing the internals of Lucky where I'm working with data from outside the system (cookies, params, sql).

I personally would love a more strict equality with an "escape hatch" of some kind. Maybe that's with eql? or a different method. The key is that you have that escape hatch. So strict by default to catch bugs, with an escape hatch for if you want to get into the kinky stuff.

You could even do something similar to what we do in Lucky where there is a fallback method that generates a nice little error telling you why it happened and what you can do isntead:

Can't compare/set ekys/etc Objects of different types. Make sure the objects are the type you expect, or use `some_other_method` to allow setting/getting/comparing objects of different types. This way you get the safety, but you have the escape hatch along with a nice error. I think this will really make the lagnuage super powerful and easy for beginners.

This would require something like #6997 (comment):

struct Set(T) def -(other : Enumerable(U)) forall U <= T # ... end @[Deprecated("Ensure *other*'s element type is `T` or a subtype of it")] def -(other : Enumerable) dup end end

The story is more complex if we want to allow unions of Set instances as arguments (mainly due to #8973).

jgaskins · 2020-03-12T13:52:40Z

I ran one of my apps through this build and I only had two compile-time errors. One was the bug I was trying to squash with this PR itself and the other was in the nats_io/nats.cr shard, which also turned out to be a bug.

nats_io/nats.cr: Symbol key was being used with a JSON-parsed hash — it was impossible for this conditional to execute its block

Error output

$ ~/Code/crystal/bin/crystal src/scatternote.cr
Using compiled compiler at /Users/jamie/Code/crystal/.build/crystal
Showing last frame. Use --error-trace for full trace.

In lib/nats/src/nats/connection.cr:511:28

 511 | unless @server_info[:max_payload]?.nil?
                          ^
Error: no overload matches 'Hash(String, JSON::Any)#[]?' with type Symbol

Overloads are:
 - Hash(K, V)#[]?(key : K)

My own app code: I changed the hash to use an AppChannel (a concept within my app) as a key instead of a String property of that AppChannel, but accidentally left one place still using the String key. Once I fixed this compile-time error, my app was working. Before, it was silently failing.

Error output

$ ~/Code/crystal/bin/crystal src/scatternote.cr
Using compiled compiler at /Users/jamie/Code/crystal/.build/crystal
Showing last frame. Use --error-trace for full trace.

In src/scatternote.cr:215:41

 215 | if sockets = @channels[app_channel.subscription_key]?
                             ^
Error: no overload matches 'Hash(Scatternote::AppChannel, Set(HTTP::WebSocket))#[]?' with type String
Overloads are:
 - Hash(K, V)#[](key : K)

No other compilation errors, despite using hashes all over the place (it's a Pusher-like app, so it has to map channel subscriptions to sets of websockets, connection ids to websockets, AppChannels to NATS::Subscriptions, etc) and you can see from my error output that I'm also using Sets, which was one of one the point of contention in this PR.

straight-shoota · 2020-03-12T14:39:57Z

I've also tried this branch with a some shards. It helped identify a few bugs. In some cases explicitly handling nil values was needed, but that's not a huge annoyance and helps to clearly specify what happens in case of a nil value.

We should continue with these tests on more shards, but for now it really seems like this would be a great improvement to help writing better code! And the additional strictness doesn't seem too much of a burden.

asterite · 2020-03-12T15:09:58Z

Definitely! I'd like to try this out at least for a release or two to see how much of a breakage it causes, but also how many bugs are detected.

We should also restrict the type of Array#delete and others where it makes sense.

paulcsmith · 2020-03-18T15:47:51Z

Loving this change! I had a suggestion for adding helpful error messages to some methods if people have trouble with it, but just an idea. Would really love to have this 🎉

spec/compiler/crystal/tools/init_spec.cr

src/hash.cr

src/named_tuple.cr

Co-Authored-By: Johannes Müller <johannes.mueller@smj-fulda.org>

paulcsmith · 2020-04-28T14:43:33Z

Agree that this would be hugely helpful. IMO it's before 1.0 or it will never happen. Since this will catch a lot of bugs I think it should be in Crystal, but that's just my opinion :)

I also agree with the type restriction in other methods like #delete and friends! That would be ah-mazing!

bcardiff · 2020-05-19T18:34:34Z

I've been checking how this impact via test-ecosystem. Being a breaking change it is expected to break things. I don't fear that.
Some of this are already mentioned in #8893 (comment) , but I want to rephrase to share my thoughts.

So far, I've seen two main breaking things that impact too much IMO:

This PR makes it invalid to call Hash(K, V)#has_key?(K | O). The same applies for Hash#[].
Hash stops playing nice with key types that overrides #hash and #==.

Regarding (1) if the argument type, Hash(K, V)#has_key?(K2) would be such that there is no intersection with K, then I could consider it.
But since, being K2 = K | O can hold values in K I don't think this PR is in the right direction.

And then, even if we detect empty value intersection between K and K2, or allow any K2 where K < K2, there is still the (2) issue.

The (2) issue appears in wrappers like YAML::Any. They keep the #hash and #== of the raw values. For example String.
A Hash(YAML::Any, V) is no longer usable with String keys. That is uncomfortable. If we would consider going back to recursive alias instead of wrappers for YAML::Any
then this issue could be dismissed. But it's not the case.

So I've found that Hash#hash_key?, Hash#[], Hash#[]? seems to be used mostly with K? as argument type or with a wrapper type.

If nil is passed as an argument the behavior without this PR are safe.
If the wrapper implements correctly #hash and #== the behavior without this PR are safe.
With this PR I still think things end up been more uncomfortable for the user.

As a reference some of the issues I detected so far via test-ecosystem are:

In std-lib, [1,"a"] - ["a"] breaks since Hash#has_key? typing is strict in the key type. Hash(String, String)#has_key?(Int32 | String) is called. Of course this is solvable with another implementation, but the why Hash should be more terse and not Array?. This was mentioned in Enforce hash key types at compile time #8893 (comment)
In std-lib, before this PR Hash(YAML::Any, YAML::Any) could be indexed with plain String. This will no longer be the case even if String #hash and #== are in place. This is used in ameba and cli shards.
In crystal-db, since URI#scheme : String?, the line driver_class(uri.scheme) can fail. driver_class already handle missing keys via a custom exception. With this PR the logic needs to be duplicated to have nice error messages, or change the Hash to be indexable by String?.
In crystal-molinillo, since State#name is String? and that is used to access a Hash(String, ...).
In amber, Hash(String, Amber::WebSockets::Channel)#[]?' is used with JSON::Any`

Despite the temptation of more annotations and stricter interface in Hash I think this PR goes against some flexibility.
I acknowledge that some have found bugs, yet that was not my case so far.
I also agree with #8893 (comment). I still feel that hash indexing is strongly related to equality #8893 (comment) . So, using different types is allowed on both or none. And I feel that that might be way to strict.

I prefer closing or at least not consider it before 1.0

paulcsmith · 2020-05-19T20:18:00Z

Thanks for the thorough writeup and investigation. That all makes a lot of sense and helps us see why it may not be useful.

I prefer closing or at least not consider it before 1.0

I'm a bit nervous about this because if it doesn't land pre-1.0 I don't think it ever will for a very long time since it will certainly break programs in the future.

My thought would be that we have the "safer/stricter" version as the default, and have an overload for an "unsafe/less strict" version

Here's an example of what I mean:

hash = {"name" => "James"}
# By default it should not compile since it will not work
# unless you define a custom `==` or `hash` as mentioned in bcardiff's post
hash.has_key?(:name) 

# Instead let people wrap types in an `Hash::AnyKey` (or some other name)
hash.has_key?(Hash::AnyKey.new(:name))
hash[Hash::AnyKey.new(:name)]

# Or maybe shortcut for creating them
hash[Hash.any_key(:name)]

struct Hash::AnyKey(T)
  getter obj

  def initialize(@obj : T)
  end
end

So basically the methods have an overload for Hash::AnyKey so you can use any type you want. Yes, it is uglier, but it is much better for the common case where 90%+ of the time you want to use K and not something else. It has caught a few bugs in people's testing, and it would have caught some in Lucky that have since been fixed.

I think this is a great safe way to handle it, with an escape hatch (that is ugly), but works when you really need it.

Thoughts?

straight-shoota · 2020-05-19T21:40:42Z

@bcardiff I don't agree to the assessment of the breaking examples.

[1,"a"] - ["a"] should obviously work and it's only an implementation detail that this is based on Hash.
Hash(YAML::Any, YAML::Any) is not Hash(String, YAML::Any) for a reason: keys can be anything. There's however a useful API for dealing with that: YAML::Any#[]. This PR already makes sure that it continues to work with string keys by wrapping them in an Any.
In the crystal-db example, I'm sure the argument to driver_class should already be restricted to String. A driver for nil scheme is impossible to find. But the caller should handle the case of a schemaless URI because it's a part of URI parsing, not driver lookup. So I'd actually count that as identification of a flaw in the API.
In molinillo, one of the methods where name is used as a hash lookup is DependencyGraph::vertex_named!. That method's documentation even states that name is expected to be a String (it's the original Ruby annotation, not a Crystal type restriction). When the intention is that this method should only accept String, calling it with a nilable argument is wrong. Again, to me that's a flaw in the API (or its usage).
The amber example seems like it should take more care about converting JSON::Any to Crystal data types. What if that Any is a bool or int, or even an object? I'm sure that should result in a proper error message instead of just failing to look up a channel subscription under that key.

@paulcsmith I'm sure there could be alternatives for retaining the current behaviour. It could be as simple as having a separate type that inherits from Hash:

class AnyHash < Hash
  def [](any_key)
  end
  def []?(any_key)
  end
  def has_key?(any_key)
  end
end

But I doubt there are actually many use cases for that.

Instead I'd very much appreciate if the default Hash implementation had type safe keys because I'm convinced that this is more useful for catching bugs than it causes inconveniences.

oprypin · 2020-05-19T22:25:55Z

I don't support the push to get breaking changes in right before 1.0. If you find yourself thinking like that, it's already too late. It's the worst time to do it, as 1) newcomers excited about 1.0 come to the ecosystem of libraries in total disrepair and 2) you have no opportunity to course-correct in case of a mistake.

(Now, I also don't support the push for 1.0 itself at this time, but that's another topic)

oprypin · 2020-05-19T22:28:53Z

As for the change itself, I personally have never ever felt the need for this. I also think that the suggested behavior is less Crystal-like than the current one. The improvement is arguable at best and is a big enough breakage (thanks @bcardiff for the investigation) to definitely bring it below the bar of acceptance for me. Appreciate the effort, though.

jgaskins · 2020-05-20T04:18:10Z

@bcardiff Thank you for taking the time to look into this more. I can appreciate the balance you’re trying to strike and want to address some of the points you made.

This PR makes it invalid to call Hash(K, V)#has_key?(K | O). The same applies for Hash#[].

This is a really great point I thought I’d covered, but that was for Hash(K, V)#==(Hash(K | O, V)). Is there a way to make it work similarly?

Hash stops playing nice with key types that overrides #hash and #==.

I feel like this actually adds confusion. Data structures in Crystal are inherently about types but this point seems to pretend we’re duck-typing hashes.

I love that duck-typing is indeed a thing in Crystal, but object state is not duck-typable. Types are a hard requirement there — in a Hash(K, V), the key must be a K, not something that quacks like a K. Pretending otherwise is more confusing than helpful, IMO.

For example, I can’t think of a reason {1.0 => “a”}[1] == “a” evaluating to true would ever make sense and I didn’t know it did until @asterite pointed it out in this PR.

Regarding your bullet points, I sympathize with you on them because you’re trying to balance convenience, existing code, and type safety, but @straight-shoota made some great counterarguments that I not only agree with but would like to extend a couple of:

In std-lib, before this PR Hash(YAML::Any, YAML::Any) could be indexed with plain String. This will no longer be the case even if String #hash and #== are in place. This is used in ameba and cli shards.

This goes along with what I mentioned above about false duck typing. It seems convenient until you have to hunt down a bug.

That said, I haven’t had to work directly with YAML in Crystal. I’ve rarely had to do it in Ruby, which I’ve used professionally for 16 years and it uses YAML everywhere. I can’t imagine a significant number of people will be affected by this. And for those few who are, wrapping string keys in a YAML::Any (as @straight-shoota mentioned) is all that’s needed.

In crystal-db, since URI#scheme : String?, the line driver_class(uri.scheme) can fail. driver_class already handle missing keys via a custom exception. With this PR the logic needs to be duplicated to have nice error messages, or change the Hash to be indexable by String?.

I apologize if this comes across indelicately, but how this is different from everywhere else I have to handle nils in every Crystal program I write? I feel like this one of the biggest wins of Nil being its own type in Crystal, that you have to consider the case where a value can be nil. Letting a nil propagate here feels like it goes against that.

In amber, Hash(String, Amber::WebSockets::Channel)#[]? is used with JSON::Any

It seems Amber can simply call .as_s here, no? It’s something I’d do in my own apps because I didn’t realize JSON::Any overrode #hash and #== to allow this shorthand. And I’d honestly be confused if I saw a Hash(String, V) accessing keys with a JSON::Any in one of my own apps.

Again, I appreciate your thoughts and opinions here just like I appreciated discussing it with @asterite in other places in the thread, even (and maybe especially) when we were disagreeing. These are excellent things to discuss because the more perspectives we apply to this problem the better we’ll understand it.

However, I don’t think the problems you mentioned are solved by being permissive with Hash key types. Instead, I feel like that permissiveness allows solutions to be applied that are mildly convenient in the moment and problematic later on. And for cases when you just want the convenience, the permissive hash implementation described by @straight-shoota and @paulcsmith is easy enough to bolt on.

paulcsmith · 2020-05-20T14:35:13Z

I tend to agree with @straight-shoota and @jgaskins. I think the types are important. I know some people haven't run into this, but many more have. And I think the Amber example (as mentioned) should probably have a more strict type anyway and this would encourage that.

I think I'm having trouble seeing the long term downside. Short term downside is of course breakage of existing libs, but long term downside I think is not really a problem. The long-term upside is it can catch bugs that many have run into.

Plus, there are ways to fallback like @straight-shoota showed if people really need the escape hatch.

I don't support the push to get breaking changes in right before 1.0.

I think waiting will be even worse. If we don't do it now, it likely will never be moved in. It is to too fundamental a change. Adding it later will cause even bigger breakage for more programs (more people using Crystal) and you'll introduce fragmentation in the ecosystem. Pre-1.0 is likely the only time it can be added.

right before 1.0

This part I do agree with. It may mean pushing back 1.0 a bit to get the ecosystem up to date. I think making Crystal more type-safe (which is the goal of most of the Crystal APIs)

My thought here is that we absolute have to have overridable dependencies in shards before merging this in. So people can fork, update, and use shards without waiting for maintainers.

bcardiff · 2020-05-20T14:39:00Z

I like the intent of this PR. And I know there are ways to code things differently, in a less quacky fashion.

Between how numbers equality and std-lib JSON/YAML wrappers works, I think the current status is that a value can belong / have their equivalent in multiple types. Even beyond union types.

It is subtle, but it is something that has been around for a long time and I hardly believe it's the time for breaking that.

One scenario is that maybe wrappers would need to go back to recursive aliases to be more comfortable to use and allow the changes in this PR. That has its own challenges.

Regarding @paulcsmith proposal, I think something along refinements; where you can define in the lexical scope, how you intend to use a value and operations; would be a nicer approach. Otherwise, you will be stuck with wrapping/unwrapping or converting.

I think that everybody involved in the discussion agree with the facts exposed. What will break, what will not. Where we disagree is whether is a right choice to do it now. I think it breaks a lot of behavior.

The idea of 1.0 is to acknowledge more or less the current state of Crystal that has been used to built a lot of things. This PR is really a hard stopper for that idea.

Now, thinking out loud. Maybe another course of action is to enable this behind a preview flag: strict-hash. Code written for the strict-hash will also work without it. The opposite is not true clearly.

That could allow some iteration and discover more edge cases, which I believe they are. But this idea is post 1.0 IMO.

paulcsmith · 2020-05-20T14:43:03Z

@bcardiff I think you are right we agree on the facts 👍

Could you share how this would work post 1.0? I'm not sure I understand a path forward, but maybe I'm thinking with too narrow a vision! I'm all for doing it post 1.0 I just don't understand how it'd work without causing ecosystem fracturing

Edit: I love the idea of a flag, but could it potentially be applied per file like Rubys frozen strings? I ask because maybe your program works with strict-hash, but a shard you use doesn't. Does that make sense?

HankAnderson · 2020-05-20T15:02:11Z

@paulcsmith Very simple: the change is introduced in 2.0, where breaking changes will be allowed. Or that's my guess.

paulcsmith · 2020-05-20T15:31:34Z

@HankAnderson I think programming languages have a much harder time introducing backwards incompatible changes. Python made changes in 2 -> 3 but was not simple and to this day causes issues and a fractured ecosystem. So I think we need to be really careful since this change would change Hash and Array, 2 very frequently uses classes.

For example, Ruby 2.0 was almost 100% compatible with Ruby 1.0. https://www.ruby-lang.org/en/news/2013/02/24/ruby-2-0-0-p0-is-released/

We have also taken care with the 2.0.0 design to make it compatible with 1.9. It will be easier to migrate from 1.9 to 2.0 than it was from 1.8 to 1.9. (The notable incompatibilities are described later.)

The tiny changes that were breaking were fairly esoteric and quite easy to fix.

So I was hoping to learn more about how this could be a phased roll-out, or opt-in, or some other way to make it so it doesn't fracture the ecosystem. It sounds like some kind of opt-in compile time flag could be a way to slowly roll it out and require it in a future release

bcardiff · 2020-05-20T20:56:55Z

@paulcsmith exactly as you said, a compile time flag could help rolling out changes like this one IMO. The process can be:

on version A, a --preview-X is allowed. Ecosystem can choose to support it or no. What ever compiles with X, should compile and work without it.
on version A+n, we can choose to make X the default. But and introduce a --without-X for those that didn't jump into the X party
on version A+n+m, drop --without-X / --preview-X. And leave X as default.
somewhere between A and A+n we could decide that X is not worth it.

This process was done in with overflow actually. And it's not a guarantee that issues will not be discovered after X is made stable.

For whatever feature X that is introducing a stricter behaviour, I think the above process work. The language and std-lib can still evolve. For example, there was the idea to avoid mixing integer types, or at least signed and unsigned. The same path can be taken for that feature. The recent exhaustive case could have been done with this approach I think. Yet, it would have iterated more slowly maybe.

Of course having flags make things more difficult to diagnose when things are not working as expected. The alternative of breaking things in a harder way is not good IMO.

Either as a shard or tweaking the crystal_path, it is possible to check these feature your own applications and gather feedback. I explained before the issues I found in test-ecosystem, those are solvable but were not an exhaustive list and I timeout after descending deep in the dependency & forking shards that hasn't been updated in a while, but are used as dependencies here and there.

Finally, although we might go from 0.35 to 1.0, that does not mean that we will need to wait for 1.40 before jumping to 2.0. There is no timeline settle.

paulcsmith · 2020-05-20T21:59:09Z

@bcardiff thanks for the explanation! That makes a lot of sense 👍

HertzDevil · 2021-08-18T08:08:59Z

src/hash.cr

+      true
+    {% else %}
+      false
+    {% end %}
  end


Hash#== would then fail if the argument is a union of Hash instances, so one level of indirection is needed here:

class Hash(K, V) def ==(other : Hash) other.equals_impl(self) end protected def equals_impl(other : Hash(K2, V2)) forall K2, V2 # ... end end

Enforce hash key types at compile time

2d5b06c

asterite reviewed Mar 10, 2020

View reviewed changes

src/hash.cr Outdated Show resolved Hide resolved

asterite reviewed Mar 10, 2020

View reviewed changes

jgaskins added 7 commits March 10, 2020 19:43

Ensure self type doesn't get lost

af171e8

Fix hash specs

2a37670

Allow unions the other direction

7acd11a

Add unions to Hash#== specs

6184650

Fix compiler spec

599b1f9

Add Set substraction specs back in

1ab3ad4

bcardiff reviewed Mar 11, 2020

View reviewed changes

jgaskins added 2 commits March 11, 2020 09:36

Remove need for explicitly specifying set type

ede59aa

Remove one more type specifier in Set spec

51f66fb

straight-shoota mentioned this pull request Apr 7, 2020

Unexpected equality behavior between some types #9013

Closed

straight-shoota approved these changes Apr 8, 2020

View reviewed changes

straight-shoota reviewed Apr 8, 2020

View reviewed changes

spec/compiler/crystal/tools/init_spec.cr Show resolved Hide resolved

src/hash.cr Outdated Show resolved Hide resolved

src/named_tuple.cr Outdated Show resolved Hide resolved

jgaskins and others added 2 commits April 8, 2020 09:01

Remove unnecessary type check and cast

a02b625

Clarify error message on NamedTuple.from

f365d32

Co-Authored-By: Johannes Müller <johannes.mueller@smj-fulda.org>

straight-shoota requested a review from asterite May 19, 2020 12:39

bcardiff removed their request for review May 19, 2020 18:55

j8r mentioned this pull request Oct 31, 2020

Implement Array#delete_if #9856

Closed

straight-shoota added this to the post-1.0 milestone Dec 1, 2020

straight-shoota mentioned this pull request Dec 7, 2020

No type error when deleting from an array an invalid type #988

Open

docelic mentioned this pull request Jan 21, 2021

The issue with Value/Ref#== having a default implementation #10277

Open

HertzDevil reviewed Aug 18, 2021

View reviewed changes

straight-shoota added 3 commits May 5, 2022 21:23

Merge branch 'master' into enforce-hash-key-types

66d581b

Adapt Hash#subset_of?

4f7680e

Refactor NamedTuple.from

4bf7485

straight-shoota self-assigned this May 6, 2022

straight-shoota marked this pull request as draft May 6, 2022 20:16

straight-shoota mentioned this pull request May 8, 2022

Add spec for Array#- with different generic type arguments #12049

Merged

HertzDevil mentioned this pull request May 25, 2023

Defining #same? across structs with reference semantics #12297

Open

straight-shoota mentioned this pull request Apr 4, 2024

YAML::Any#dig does not accept a string key #14305

Closed

Blacksmoke16 mentioned this pull request Oct 3, 2024

Better handling of Hash with default block but of the wrong type #15057

Open

Enforce hash key types at compile time #8893

Are you sure you want to change the base?

Enforce hash key types at compile time #8893

Conversation

jgaskins commented Mar 10, 2020

asterite commented Mar 10, 2020

bcardiff commented Mar 10, 2020

jgaskins commented Mar 10, 2020

asterite commented Mar 10, 2020

bew commented Mar 10, 2020

jgaskins commented Mar 10, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jgaskins commented Mar 11, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

HertzDevil Aug 20, 2021 • edited Loading

Choose a reason for hiding this comment

jgaskins commented Mar 12, 2020

straight-shoota commented Mar 12, 2020 • edited Loading

asterite commented Mar 12, 2020

paulcsmith commented Mar 18, 2020

paulcsmith commented Apr 28, 2020

bcardiff commented May 19, 2020

paulcsmith commented May 19, 2020 • edited Loading

straight-shoota commented May 19, 2020 • edited Loading

oprypin commented May 19, 2020

oprypin commented May 19, 2020

jgaskins commented May 20, 2020

paulcsmith commented May 20, 2020

bcardiff commented May 20, 2020

paulcsmith commented May 20, 2020 • edited Loading

HankAnderson commented May 20, 2020

paulcsmith commented May 20, 2020

bcardiff commented May 20, 2020

paulcsmith commented May 20, 2020

Choose a reason for hiding this comment

jgaskins commented Mar 11, 2020 •

edited

Loading

HertzDevil Aug 20, 2021 •

edited

Loading

straight-shoota commented Mar 12, 2020 •

edited

Loading

paulcsmith commented May 19, 2020 •

edited

Loading

straight-shoota commented May 19, 2020 •

edited

Loading

paulcsmith commented May 20, 2020 •

edited

Loading