Refactor constrain_type_jkind #3037

goldfirere · 2024-09-14T16:16:21Z

In #2879 (unboxed tuples), I wondered if there wasn't a simplification available for constrain_type_jkind. @ccasin rightly told me (in person) that blithely wondering that aloud is more annoying than helpful. So I took a stab at the refactor. This PR is the result.

I don't think I would have embarked on this during the workday. But it did make some fun hacking during holiday. I recognize I'm now asking @ccasin to spend workday time on this, but I think the result is enough of an improvement to motivate doing so.

The starting point was to change estimate_type_jkind to just return a jkind instead of also noting down whether the jkind was in a Tvar or Tunboxed_tuple. The rest seemed to follow from there. A high-level explanation is that the new algorithm is a little more akin to unification, where descs are checked multiple times at multiple different stages of trying. The new implementation is all in one function instead of the stratification into unify1 through unify3, but there is a similar spirit between unification and my new approach here.

You might worry that the new approach has asymptotic complexity worries in the unboxed-tuple case: because estimate_type_jkind is now deep, it would be easy to call the function at every step while descending through nested unboxed tuples. This patch is careful to avoid this pitfall: the recursive descent carries with it the jkind of the type it's descending into. Most of the time, this jkind is produced by a fresh call to estimate_type_jkind, but in the unboxed-tuple case, the jkind comes straight from the previous call, thus avoiding repeating work.

Here are some benefits of the new approach:

The new code is, I believe, simpler than the old. It is definitely shorter.
The new approach does less expansion. Previously, constrain_type_jkind expanded the type whenever the jkind wasn't any. Now it checks before doing any expansion, skipping expansion if expanding has no hope of improving the situation. (Hope happens when the jkind of the type and the desired jkind have an intersection. There is no hope when there is no intersection.)
The new approach does not allocate a node for the return value from estimate_type_jkind (though the node is quickly matched on and so probably doesn't matter much).
The old approach unified more type variables in external declarations. I'm not sure why. But this new approach accepts a declaration that was previously rejected (but seems fine to me). I did not investigate this change in behavior; didn't seem a good use of time.
There is no longer a special case for type variables at generic_level. I'm not sure why this need disappeared, but I'm not sorry to see it go.
The bug in type inference noted in typing-layouts-products/basics.ml seems to have disappeared. Yay! But the new error message is bad -- this might need some discussion with @ccasin.

Here are some drawbacks of the new approach:

Because there is less expansion, some error messages are less detailed than they were previously. For example, if we're checking whether Float_u.t < value, we never learn that Float_u.t is really float#, and so we can't say so in the error message. We could, in theory, do some expansion just to improve error messages (preferably in some mode that won't load new cmi files), but that expansion could be done in error message generation instead of during the jkind check.
Because there is less expansion, we sometimes don't know that two jkinds simply need an intersection rather than a subjkind relationship. For example, if we have type ('a : float64) t = 'a and we're checking whether 'a t : value, we'll report that float64 must be a subjkind of value for this to work. Actually, it would be sufficient for the two to have an intersection. Yet discovering this requires expansion (which is otherwise fruitless). Again, this could probably be improved by doing more expansion on the way to generating error messages, but it would be harder than the above case. I think this subtle distinction will wash over users and isn't worth pursuing, but I could be wrong.
The new algorithm triggers the "I don't know how to do type inference" error in typedecl more often. But the cases where it does actually seem sensible. And the old error message starts with a blank line, which has been (in other cases) associated with some confusion in error message generation.
The new algorithm calls get_desc more. Yet get_desc does path compression, and e.g. unify already calls get_desc a lot, so this seems OK.

Other notes:

This patch refactors the old unbox_once not to expand, because sometimes expansion has already been done.
This patch adds a new parameter to type_sort saying whether it should consider type variables fixed. This is used when type-checking externals, because we compute the sort of the arguments after type-checking is done, and thus after the check that any generalized variables in the type can actually be generalized. This is a bit unfortunate, but it seems the easiest way forward. (The principled approach would be to compute the sorts in transl_type_scheme, but that seems like it would require installation of a lot of plumbing.)
I added some support in Misc for treating result as the monad it is. This is overkill. But I'm very tired of not being able to use the effective idioms I grew up with. The code is adapted from base. I predict that if we leave this in, more people will eventually use it.

ccasin

I agree this is a clear improvement, thanks. Some things to discuss before moving forward in comments.

One observation. As is probably clear, these three bullets from your PR description are related:

There is no longer a special case for type variables at generic_level. I'm not sure why this need disappeared, but I'm not sorry to see it go.

The bug in type inference noted in typing-layouts-products/basics.ml seems to have disappeared. Yay! But the new error message is bad -- this might need some discussion with @ccasin.

This patch adds a new parameter to type_sort saying whether it should consider type variables fixed. This is used when type-checking externals, because we compute the sort of the arguments after type-checking is done, and thus after the check that any generalized variables in the type can actually be generalized. This is a bit unfortunate, but it seems the easiest way forward. (The principled approach would be to compute the sorts in transl_type_scheme, but that seems like it would require installation of a lot of plumbing.)

You may recall that a week ago at ICFP I wanted to chat about refining jkinds at generic level, but we didn't have a chance.

The history here is that the restriction on refining jkinds at generic level was added in #2269 as part of the work on layout poly externals. This seems like a sensible restriction - if something is at generic level, doesn't that mean we're done doing inference on it? But, no, actually, it doesn't - see for example Typecore.type_tuple where we create a bunch of type variables at generic level for the explicit purpose of unifying with the expected type. This is done... as an optimization, maybe? I don't know. But after discussion with Leo, I think it's safe to say making the typechecker respect the invariant that inference is "done" for things with generic level would be hard.

So, #2269 is a mistake that just hasn't caused problems until now, and we need a better solution. Yours seems fine - I was going to propose something similar if we had time to chat about it at ICFP. But it results in the surprising and maybe scary change to externals in one test, and we need to understand what is going on there, and perhaps put in something better for the sad error in the products test. As this change is basically orthogonal to this PR and needed anyway, I suggest pulling it out for separate discussion/review.

ocaml/typing/ctype.ml

ocaml/testsuite/tests/typing-layouts-products/basics.ml

ocaml/typing/ctype.ml

ocaml/testsuite/tests/typing-layouts-arrays/basics.ml

ocaml/testsuite/tests/typing-layouts/basics.ml

ocaml/typing/ctype.mli

goldfirere · 2024-09-16T19:47:18Z

OK. Comments addressed. Back to you @ccasin

This commit does not make other changes.

Also modify ./Makefile to be able to work the debugger on mac, as well as our usual setting. (with conflicts to be resolved in next commit)

goldfirere requested a review from ccasin September 14, 2024 16:16

goldfirere changed the title ~~Rae/simplify constrain type jkind~~ Refactor constrain_type_jkind Sep 14, 2024

ccasin reviewed Sep 16, 2024

View reviewed changes

mshinwell added the typing label Sep 17, 2024

ccasin approved these changes Sep 20, 2024

View reviewed changes

goldfirere and others added 3 commits September 20, 2024 16:23

Omit printing "because of imported definition"

ff6bfc3

This commit does not make other changes.

Simplify constrain_type_jkind.

05a8009

Also modify ./Makefile to be able to work the debugger on mac, as well as our usual setting. (with conflicts to be resolved in next commit)

Fix conflicts; update tests. All boring.

9a9223b

goldfirere force-pushed the rae/simplify-constrain-type-jkind branch from 53016e6 to 9a9223b Compare September 20, 2024 20:34

goldfirere merged commit 5d19796 into main Sep 20, 2024
16 checks passed

goldfirere deleted the rae/simplify-constrain-type-jkind branch September 20, 2024 21:01

ccasin mentioned this pull request Sep 22, 2024

Monads are hard #3058

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor constrain_type_jkind #3037

Refactor constrain_type_jkind #3037

goldfirere commented Sep 14, 2024

ccasin left a comment

goldfirere commented Sep 16, 2024

Refactor constrain_type_jkind #3037

Refactor constrain_type_jkind #3037

Conversation

goldfirere commented Sep 14, 2024

ccasin left a comment

Choose a reason for hiding this comment

goldfirere commented Sep 16, 2024