-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce hash key types at compile time #8893
base: master
Are you sure you want to change the base?
Conversation
See #2996 and the many related issues and PRs. This is very unlikely to change, it makes the language too rigid. |
I was too slow to find #6154 🙃 |
Well, I didn’t want it to go too long before proposing it again. Gotta keep the tradition alive! 😂 I had a look through those PRs and I don’t think I agree that it makes it too rigid/pedantic. The extra type safety seems worth the very few extra type casts and nil checks you’d have to add for a very few hashes in your app — most hashes would not be affected. I use Crystal specifically for the type-safety guarantees, even if I’m using duck typing in most places and the type check is 10 stack frames away from where I’ve passed in the wrong object type. I’m all about the safer code. I like that Crystal forces me to handle It might also help compile times since each |
I guess you can continue making changes to this PR until CI is green and then we can take a look at the diff and see if it's worth it. |
I remember during #6154 that there was a complex hierarchy in the compiler that involves modules in modules, abstract and probably some inheritance in the mix that made me give up because of a compiler bug or something like that.. But we can definitely try again, find it and eventually fix it (if not already fixed in the recent compiler versions!) |
@bew This PR compiles the compiler and stdlib as-is. The only thing that seems to be broken based on the failing specs is hash equivalence. I would love to see some realistic hash operations where the strong types don’t work, if anyone’s got one handy, so I can check it against this implementation and see what the application/shard code would look like in a before/after. |
@@ -470,7 +470,7 @@ module Crystal::Playground | |||
|
|||
agent_ws = PathWebSocketHandler.new "/agent" do |ws, context| | |||
match_data = context.request.path.not_nil!.match(/\/(\d+)\/(\d+)$/).not_nil! | |||
session_key = match_data[1]?.try(&.to_i) | |||
session_key = match_data[1]?.try(&.to_i) || 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
session_key
can't be nil
because of this line. Because it can't store nil
and it's using []
instead of []?
below, this line seems like it can't return nil even without this PR or it would raise a KeyError
below.
# Returns `true` if both `self` and *other*'s raw object are equal. | ||
def ==(other : YAML::Any) | ||
raw == other.raw | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the order filpped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was experimenting with the error messages. Before, when I've done a catch-all and a more specific overload of a method, I've had to put the catch-all version first or the catch-all would override the more specific overload. I don't know if this was me misunderstanding the type system at the time (it was a few years ago) and misattributing the error, but either way that's apparently not how it works anymore, if it ever did.
def foo(bar : String)
"strings only"
end
def foo(bar)
"catch all"
end
def foo(bar : Int)
123
end
foo 123 # => 123
foo "abc" # => "strings only"
foo %w[abc] # => "catch all"
Some of this meant ensuring we add the YAML::Any wrapper around the hash keys. Honestly, it kinda made me wish we'd stuck with the recursive type alias. It's notable that JSON didn't have these issues because JSON is strictly string keys, so type checks on the keys didn't break it in any way.
Alrighty, I think this PR is ready for some conversation about how it affects code. A few of my own observations:
I feel like this is overall closer to the rest of the Crystal type system than relying on simple equality. I’ve admittedly been looking at it through my own lens, though, and applying my own mental model of the type system. So I admit it’s probably subjective. I’m really interested to hear what kinds of surprises this would create in real-world apps. If I have some time, I’ll try to run it against a couple of mine tomorrow and see what I come up with. |
spec/std/set_spec.cr
Outdated
@@ -179,7 +179,7 @@ describe "Set" do | |||
end | |||
|
|||
it "does -" do | |||
set1 = Set{1, 2, 3, 4, 5} | |||
set1 = Set(Int32 | Char){1, 2, 3, 4, 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It bugs me how strict you need to be in this examples. I imagine that set1 - set2
no longer compiles when set1 : Set(Int32)
and set2 : Enumerable(Int32 | Char)
. To me it falls into the same behaviour we want on equality: although we can infer it will always be false due to typing missmatch, we want to make it a valid that will return false instead of not compile.
It could happen that this set for example is generated from templates, macros and the explicit typing will be hard to inject. Hence, the type will be the infered from the elements, leading to Set(Int32)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is a a pretty irritating tradeoff on paper, but let’s think about the overall impact of this consequence. How often is it that you have a homogeneous set that you subtract a heterogeneous set from?
Honestly, I can’t think of a single time I’ve ever used a heterogeneous set in application code, let alone in this particular scenario where I’m subtracting one from a homogenous set. To be clear, you don’t need to specify types for all sets. Just ones where the type inference doesn’t get you what you need.
I do think that if you’re going to compare values, it’s probably a good idea to know that the types you’re comparing against can be different and the type specifier helps ensure that not only do you recognize that now, but you’ll recognize that when you change the code later. My perspective is that it’s not the same as adding two sets of different types where the result is a union of all the types in both sets. If there’s a bug there it’ll likely be caught elsewhere in the app because of the new type of the resulting set. For example:
numbers = Set{1, 2, 3}
another_set = Set{“1”, “2”}
result = numbers - another_set # => Set(Int32){1, 2, 3}
result = numbers + another_set # => Set(Int32 | String){1, 2, 3, “1”, “2”}
I don’t think the -
version makes any sense at all and if I’m doing that, it’s probably a bug. When adding them, it changes the type and is more likely to catch that bug. With this PR, the subtraction would be a bug.
IMO, the type checker is there to help you. It’s why a lot of people still use static languages. Something that becomes a compile-time error that could be a runtime error is often a good thing. Being able to tell the compiler you know what you’re doing by simply adding a type annotation is also good.
All that said, I found a way to let me remove the explicit type specifier. I don’t think I agree with it, but it keeps the same behavior in this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess we should disallow -
between sets of different types? That is, you can only do A - B if A's element types are a superset of B.
That would need to apply to all collection types.
The problem is always equality and inclusion, specially regarding ==
. Because 1
== 1.0
, and because ==
can be generally defined between any two types, when you have Set{1, 2, 3}
then subtracting Set{1.0}
could be valid...
Interestingly, in Ruby:
Set[1, 2, 3] - Set[1.0, 2.0, 3.0] # => #<Set: {1, 2, 3}>
Also:
irb(main):004:0> h = {1 => 2}
=> {1=>2}
irb(main):005:0> h[1.0]
=> nil
That's because Hash uses eql?
and not ==
for comparing key values.
But then also in Ruby:
{a: 1} == {a: 1.0} # => true
but that's because it uses ==
for comparing elements, and 1 == 1.0 # => true
.
So maybe we can disallow h[1.0]
for Hash(Int32, V)
. However, if you have Hash(Int32 | Float64)
and you have 1
as a key and you ask for 1.0
it will work and return the value even though with Hash(Int32, V)
it wouldn't work... because we don't make the distinction of ==
and eql?
like Ruby.
This is a hard problem... :-)
We'll need to decide if we want to introduce eql?
to types, which might make sense in the context of union types and the way Ruby works.
Maybe we can also disallow ==
between any two types. For example comparing Int32
vs String
wouldn't compile, which might make sense. But we can allow comparison between any two reference types, that being false
by default.
Sorry for the wall of text... my point is that this PR can't be thought in isolation will all of these other things. And changing that is a huge breaking change which I'm not sure we can/should do right before 1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess we should disallow - between sets of different types? That is, you can only do A - B if A's element types are a superset of B.
In an ideal world, I think that makes sense.
That would need to apply to all collection types.
Yeah, that's the catch. 🙂 It's why, even though I disagree with @bcardiff I still made it work without the explicit type annotation.
Your focus on consistency is important and I'm glad you called it out. Consistency can often be more important than anyone's perspective of what's "correct" because it allows you to guess at how something works. That's a super power for a programming language. Having to consult documentation all the time gets old quick.
I definitely want to have the conversation about whether Array(Int32)#-(Array(String))
should compile, but that particular change is too far outside the scope of this PR and I apologize for hammering on it. All I wanted was for the compiler to tell me when I changed my hash key type that I'd missed some of my key checks. 😂
if you have Hash(Int32 | Float64) and you have 1 as a key and you ask for 1.0 it will work and return the value even though with Hash(Int32, V) it wouldn't work... because we don't make the distinction of == and eql? like Ruby.
I've been using Ruby for 16 years and, honestly, I never understood why Ruby used eql?
for that. But this definitely sounds like it might be why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm loving the discussion here. I also use Crystal in large part due to the type-safety and would prefer the A must be a superset of B. I've been bitten by things like this when developing the internals of Lucky where I'm working with data from outside the system (cookies, params, sql).
I personally would love a more strict equality with an "escape hatch" of some kind. Maybe that's with eql?
or a different method. The key is that you have that escape hatch. So strict by default to catch bugs, with an escape hatch for if you want to get into the kinky stuff.
You could even do something similar to what we do in Lucky where there is a fallback method that generates a nice little error telling you why it happened and what you can do isntead:
Can't compare/set ekys/etc Objects of different types.
Make sure the objects are the type you expect, or use `some_other_method` to allow setting/getting/comparing objects of different types.
This way you get the safety, but you have the escape hatch along with a nice error. I think this will really make the lagnuage super powerful and easy for beginners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require something like #6997 (comment):
struct Set(T)
def -(other : Enumerable(U)) forall U <= T
# ...
end
@[Deprecated("Ensure *other*'s element type is `T` or a subtype of it")]
def -(other : Enumerable)
dup
end
end
The story is more complex if we want to allow unions of Set
instances as arguments (mainly due to #8973).
I ran one of my apps through this build and I only had two compile-time errors. One was the bug I was trying to squash with this PR itself and the other was in the
No other compilation errors, despite using hashes all over the place (it's a Pusher-like app, so it has to map channel subscriptions to sets of websockets, connection ids to websockets, |
I've also tried this branch with a some shards. It helped identify a few bugs. In some cases explicitly handling We should continue with these tests on more shards, but for now it really seems like this would be a great improvement to help writing better code! And the additional strictness doesn't seem too much of a burden. |
Definitely! I'd like to try this out at least for a release or two to see how much of a breakage it causes, but also how many bugs are detected. We should also restrict the type of |
Loving this change! I had a suggestion for adding helpful error messages to some methods if people have trouble with it, but just an idea. Would really love to have this 🎉 |
Co-Authored-By: Johannes Müller <johannes.mueller@smj-fulda.org>
Agree that this would be hugely helpful. IMO it's before 1.0 or it will never happen. Since this will catch a lot of bugs I think it should be in Crystal, but that's just my opinion :) I also agree with the type restriction in other methods like |
I've been checking how this impact via test-ecosystem. Being a breaking change it is expected to break things. I don't fear that. So far, I've seen two main breaking things that impact too much IMO:
Regarding (1) if the argument type, And then, even if we detect empty value intersection between The (2) issue appears in wrappers like So I've found that
As a reference some of the issues I detected so far via test-ecosystem are:
Despite the temptation of more annotations and stricter interface in I prefer closing or at least not consider it before 1.0 |
Thanks for the thorough writeup and investigation. That all makes a lot of sense and helps us see why it may not be useful.
I'm a bit nervous about this because if it doesn't land pre-1.0 I don't think it ever will for a very long time since it will certainly break programs in the future. My thought would be that we have the "safer/stricter" version as the default, and have an overload for an "unsafe/less strict" version Here's an example of what I mean: hash = {"name" => "James"}
# By default it should not compile since it will not work
# unless you define a custom `==` or `hash` as mentioned in bcardiff's post
hash.has_key?(:name)
# Instead let people wrap types in an `Hash::AnyKey` (or some other name)
hash.has_key?(Hash::AnyKey.new(:name))
hash[Hash::AnyKey.new(:name)]
# Or maybe shortcut for creating them
hash[Hash.any_key(:name)]
struct Hash::AnyKey(T)
getter obj
def initialize(@obj : T)
end
end So basically the methods have an overload for I think this is a great safe way to handle it, with an escape hatch (that is ugly), but works when you really need it. Thoughts? |
@bcardiff I don't agree to the assessment of the breaking examples.
@paulcsmith I'm sure there could be alternatives for retaining the current behaviour. It could be as simple as having a separate type that inherits from class AnyHash < Hash
def [](any_key)
end
def []?(any_key)
end
def has_key?(any_key)
end
end But I doubt there are actually many use cases for that. Instead I'd very much appreciate if the default |
I don't support the push to get breaking changes in right before 1.0. If you find yourself thinking like that, it's already too late. It's the worst time to do it, as 1) newcomers excited about 1.0 come to the ecosystem of libraries in total disrepair and 2) you have no opportunity to course-correct in case of a mistake. (Now, I also don't support the push for 1.0 itself at this time, but that's another topic) |
As for the change itself, I personally have never ever felt the need for this. I also think that the suggested behavior is less Crystal-like than the current one. The improvement is arguable at best and is a big enough breakage (thanks @bcardiff for the investigation) to definitely bring it below the bar of acceptance for me. Appreciate the effort, though. |
@bcardiff Thank you for taking the time to look into this more. I can appreciate the balance you’re trying to strike and want to address some of the points you made.
This is a really great point I thought I’d covered, but that was for
I feel like this actually adds confusion. Data structures in Crystal are inherently about types but this point seems to pretend we’re duck-typing hashes. I love that duck-typing is indeed a thing in Crystal, but object state is not duck-typable. Types are a hard requirement there — in a For example, I can’t think of a reason Regarding your bullet points, I sympathize with you on them because you’re trying to balance convenience, existing code, and type safety, but @straight-shoota made some great counterarguments that I not only agree with but would like to extend a couple of:
This goes along with what I mentioned above about false duck typing. It seems convenient until you have to hunt down a bug. That said, I haven’t had to work directly with YAML in Crystal. I’ve rarely had to do it in Ruby, which I’ve used professionally for 16 years and it uses YAML everywhere. I can’t imagine a significant number of people will be affected by this. And for those few who are, wrapping string keys in a
I apologize if this comes across indelicately, but how this is different from everywhere else I have to handle nils in every Crystal program I write? I feel like this one of the biggest wins of
It seems Amber can simply call Again, I appreciate your thoughts and opinions here just like I appreciated discussing it with @asterite in other places in the thread, even (and maybe especially) when we were disagreeing. These are excellent things to discuss because the more perspectives we apply to this problem the better we’ll understand it. However, I don’t think the problems you mentioned are solved by being permissive with Hash key types. Instead, I feel like that permissiveness allows solutions to be applied that are mildly convenient in the moment and problematic later on. And for cases when you just want the convenience, the permissive hash implementation described by @straight-shoota and @paulcsmith is easy enough to bolt on. |
I tend to agree with @straight-shoota and @jgaskins. I think the types are important. I know some people haven't run into this, but many more have. And I think the Amber example (as mentioned) should probably have a more strict type anyway and this would encourage that. I think I'm having trouble seeing the long term downside. Short term downside is of course breakage of existing libs, but long term downside I think is not really a problem. The long-term upside is it can catch bugs that many have run into. Plus, there are ways to fallback like @straight-shoota showed if people really need the escape hatch.
I think waiting will be even worse. If we don't do it now, it likely will never be moved in. It is to too fundamental a change. Adding it later will cause even bigger breakage for more programs (more people using Crystal) and you'll introduce fragmentation in the ecosystem. Pre-1.0 is likely the only time it can be added.
This part I do agree with. It may mean pushing back 1.0 a bit to get the ecosystem up to date. I think making Crystal more type-safe (which is the goal of most of the Crystal APIs) My thought here is that we absolute have to have overridable dependencies in shards before merging this in. So people can fork, update, and use shards without waiting for maintainers. |
I like the intent of this PR. And I know there are ways to code things differently, in a less quacky fashion. Between how numbers equality and std-lib JSON/YAML wrappers works, I think the current status is that a value can belong / have their equivalent in multiple types. Even beyond union types. It is subtle, but it is something that has been around for a long time and I hardly believe it's the time for breaking that. One scenario is that maybe wrappers would need to go back to recursive aliases to be more comfortable to use and allow the changes in this PR. That has its own challenges. Regarding @paulcsmith proposal, I think something along refinements; where you can define in the lexical scope, how you intend to use a value and operations; would be a nicer approach. Otherwise, you will be stuck with wrapping/unwrapping or converting. I think that everybody involved in the discussion agree with the facts exposed. What will break, what will not. Where we disagree is whether is a right choice to do it now. I think it breaks a lot of behavior. The idea of 1.0 is to acknowledge more or less the current state of Crystal that has been used to built a lot of things. This PR is really a hard stopper for that idea. Now, thinking out loud. Maybe another course of action is to enable this behind a preview flag: That could allow some iteration and discover more edge cases, which I believe they are. But this idea is post 1.0 IMO. |
@bcardiff I think you are right we agree on the facts 👍 Could you share how this would work post 1.0? I'm not sure I understand a path forward, but maybe I'm thinking with too narrow a vision! I'm all for doing it post 1.0 I just don't understand how it'd work without causing ecosystem fracturing Edit: I love the idea of a flag, but could it potentially be applied per file like Rubys frozen strings? I ask because maybe your program works with strict-hash, but a shard you use doesn't. Does that make sense? |
@paulcsmith Very simple: the change is introduced in 2.0, where breaking changes will be allowed. Or that's my guess. |
@HankAnderson I think programming languages have a much harder time introducing backwards incompatible changes. Python made changes in 2 -> 3 but was not simple and to this day causes issues and a fractured ecosystem. So I think we need to be really careful since this change would change Hash and Array, 2 very frequently uses classes. For example, Ruby 2.0 was almost 100% compatible with Ruby 1.0. https://www.ruby-lang.org/en/news/2013/02/24/ruby-2-0-0-p0-is-released/
The tiny changes that were breaking were fairly esoteric and quite easy to fix. So I was hoping to learn more about how this could be a phased roll-out, or opt-in, or some other way to make it so it doesn't fracture the ecosystem. It sounds like some kind of opt-in compile time flag could be a way to slowly roll it out and require it in a future release |
@paulcsmith exactly as you said, a compile time flag could help rolling out changes like this one IMO. The process can be:
This process was done in with overflow actually. And it's not a guarantee that issues will not be discovered after X is made stable. For whatever feature X that is introducing a stricter behaviour, I think the above process work. The language and std-lib can still evolve. For example, there was the idea to avoid mixing integer types, or at least signed and unsigned. The same path can be taken for that feature. The recent exhaustive case could have been done with this approach I think. Yet, it would have iterated more slowly maybe. Of course having flags make things more difficult to diagnose when things are not working as expected. The alternative of breaking things in a harder way is not good IMO. Either as a shard or tweaking the crystal_path, it is possible to check these feature your own applications and gather feedback. I explained before the issues I found in test-ecosystem, those are solvable but were not an exhaustive list and I timeout after descending deep in the dependency & forking shards that hasn't been updated in a while, but are used as dependencies here and there. Finally, although we might go from 0.35 to 1.0, that does not mean that we will need to wait for 1.40 before jumping to 2.0. There is no timeline settle. |
@bcardiff thanks for the explanation! That makes a lot of sense 👍 |
true | ||
{% else %} | ||
false | ||
{% end %} | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hash#==
would then fail if the argument is a union of Hash
instances, so one level of indirection is needed here:
class Hash(K, V)
def ==(other : Hash)
other.equals_impl(self)
end
protected def equals_impl(other : Hash(K2, V2)) forall K2, V2
# ...
end
end
I noticed in one of my apps, I changed the type of a hash key, assuming the compiler would point out all the places I missed. But then all my
Hash#[]?(key)
calls started to fail silently and myHash#fetch(key, &)
calls invoked their blocks when I didn't expect them to.I looked into how much code it would take to get the compiler to enforce the key types. It was more than I expected at first because I forgot the compiler is also written in Crystal and uses these same hashes. 😂
I don't think the semantics of this PR are correct yet, especially
Hash#==(other : Hash)
. I hardcodedfalse
just to get that to compile, but before I spend too much time on trying to polish it all up, I'd like to get some conversation going about the pros and cons of this patch in case the current live functionality of this was intentional with full knowledge of the tradeoffs. 🙂