Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiler: fixes and improvements to closured variables #9986

Merged
merged 9 commits into from
Dec 3, 2020

Conversation

asterite
Copy link
Member

@asterite asterite commented Nov 27, 2020

Fixes #5609
Fixes #3093

My previous attempts at this were almost fine but they had a flaw: I was reusing an existing property of MetaVar when I shouldn't have done that. I had this realization this morning (without really thinking about it, who knows how the brain works) and it's in this PR and it's working!

Please ignore this section

This PR is great! No, it's AWESOME! I know, I know, who do I think I am, but after my four previously failed attempts I just want to celebrate and consider this the best PR in the history (not just in this repo, mind you).

What's the bug

So, #5609 is this bug:

def capture(&block)
  block
end

x = 1
3.times do
  p x
  capture do
    x = "hello"
  end.call
end

The output is:

1
194613232
194613232

It works in a wrong way, yet the code compiles fine. These are the worst bug to have in a compiler. It would be better if the code didn't compile at all. Just don't produce invalid code, compiler!

How does the compiler detect and implement closured variables?

The way we detect a variable is closured is if we see it being used inside a captured block. Then, every subsequent occurrence of the variable will get the type of all types previously assigned to it (because you never know, or, well, the compiler never knows, when the block is going to be called). Additionally, things like if x or if x.is_a?(T) don't filter types because the code might be called after that check, so there's no guarantee the type will still be the same after it (remember this: it's important for something I will explain later on).

The problem was that variables declared or used before the closure was detected didn't also got bound to all types assigned to it. That's usually not a problem because if a variable is declared and used before a block captures it, why would it then get all types assigned to it later on? That code was already executed! Well... not always. If the variable is used in a loop then that can happen. And that's the bug.

How does this PR fix the bug?

The idea is simple: for every occurrence of a variable (like x above) we keep a list of all the locations (well, nodes) it was used in. Then, if we detect it was closured, we can go ahead and bind those nodes to all types assigned to it, even if they came before we detect the closure ("all types assigned to it" is dynamic, the type is recomputed if the set of types or assignments changes, as usual throughout the compiler).

However, doing that fix alone introduces an annoyance. For example this code:

def capture(&block)
  block
end

def foo(x)
  puts x.to_i if x.is_a?(String)
  capture do
    puts x
  end
end

foo(1 || "2")

Here x is of type Int32 | String. Now, x is closured in capture so every occurrence of x must have the type Int32 | String and, as I said before, things like if x.is_a?(String) don't filter types. So this code, which used to compile, wouldn't compile with just the fix I mentioned above. And that's annoying.

So, this PR also tracks whether a variable is readonly. This means whether a value was assigned to it just once. And it also takes loops (while) into consideration, because something assigned inside a while gets a value assigned to it potentially more than once.

Then, if a variable is closured but it's readonly, there's no need for all occurrences of it to get all types. It will never get assigned a value again, no problem. And that fixes #3093 . That's actually a long standing feature that I wanted to have in the language because closured readonly vars are very common, and it's nice to be able to filter their types if needed, without needing an extra local variable.

Ugh, tracking all those local variables will surely make the compiler slower

It turns out, not that much. Compiling the compiler went from 28.59s and 1173.23MB to 29.2s and 1240.18MB without any cache. Subsequent compilations went from 18.06s to 18.37s, so not a big difference. I think the reason is that usually you don't assign to a variable multiple times, so in most cases a variable will have a list of just one element attached to it.

And, as I always like to think about this, I prefer things working well but maybe a bit slower, or consuming a bit more memory, than silently working wrong. There will be time in the future to optimize all of this, or to clean it up, and when that time comes we'll have a solid spec suite to make sure things continue to work as expected.

Summary

This PR fixes one of the worst bugs a language could have, and also introduces a really neat feature, making coding simpler. What's not to like about it?

@asterite asterite changed the title Third's the charm Compiler: fixes and improvements to closured variables Nov 27, 2020
@asterite asterite added kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:feature topic:compiler:semantic labels Nov 27, 2020
@asterite
Copy link
Member Author

Ready for review!

It actually doesn't handle #3093 (comment) but overall it's still an improvement. I'll think about how to make that comment work, but it can come in a separate PR.

@asterite
Copy link
Member Author

Well, what do you know. If CI agrees, it now also fixes #3093 (comment)

That means that this will compile just fine:

x = 1
x = "hello"

capture do
  x.upcase # here x is known to always be a String, never an Int32
end

So the logic is changed a bit: instead of tracking whether a variable is readonly I now call it "mutably closured". That means it's a closure and it gets reassigned a value after it formed a closure. In the case above the variable was assigned a value multiple times, yes, but always before the closure was formed. And this is not happening inside a loop, so it's fine. If the variable is later reassigned a value after the block is captured it will fail to compile.

Copy link
Member

@jhass jhass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aww, I wanted to run this against Brian's ecosystem thingy but no dice due to crystal-lang/shards#413

Well, CI is green, should be good enough for testing!

❤️

@asterite
Copy link
Member Author

asterite commented Dec 3, 2020

Let's go with this. Tests pass, and at most we can revert it if things go wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. kind:feature topic:compiler:semantic
Projects
None yet
2 participants