Text format: type more operations #408

Closed
jfbastien opened this issue Oct 13, 2015 · 24 comments

@jfbastien
Member

Why isn't return typed, e.g. i32.return?

I get that the type is implicit because of the function's signature, but where are we drawing the line on consistency?

i32.add is typed because that allows us to ignore the operand subtree's expression.

Should other operations be typed? get_local? break?

What about if type=expressions? Shouldn't all typed operations have an explicit type?

@jfbastien jfbastien added this to the MVP milestone Oct 13, 2015
@lukewagner
Member

One subtle difference is that return and get_local's types are fully determined by the static function context, not by the types of subexpressions. I expect we'll have an easier time making non-subjective arguments in favor of different points on this type-specificity spectrum once we start defining the binary encoding and writing real decoders.

@jfbastien
Member Author

I agree that it's known from the function's context, but it's not clear that's particularly useful. When types were added the reasoning was that it made things easier to understand for developers, and I think the same applies for return and get_local.

@lukewagner
Member

The original motivations had more to do with the predicted simplicity of implementation and interactions with binary format which I think are the higher priority. There are still some big pending choices to be made about the binary format (e.g., type-segregated opcode spaces, module-local opcode tables) so it's hard to really evaluate this atm. What is "easiest to understand for developers" sounds like a bikeshedfest since I don't think any of these options will significantly move the dial on overall understandability; for that you want source info.

@jfbastien
Member Author

I'm only concerned with the textual format on this. It seems easier to read IMO, since it's consistent.

@rossberg
Member

rossberg commented Oct 14, 2015 via email

@sunfishcode
Member

Ignoring control constructs, call, and return for a moment, should we put a type on get_local, etc.? Conceptual abstractions aside,

  • advantage: simple S-expression consumers could determine the types of such nodes without context
  • disadvantage: it makes the S-expression format more verbose

It's subjective, but it seems like a win to me. The purpose of the S-expression syntax is to be explicit and easy to process with simple tools. Thoughts?
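To make the "without context" advantage concrete, here is a minimal Python sketch of a consumer that reads a type off the operator name alone. The helper name `type_from_name` and the typed spelling `i32.get_local` are assumptions for illustration only; note also that for comparison operators the prefix would name the operand type rather than the result type.

```python
# Hypothetical helper: reads the type prefix off an operator name.
VALUE_TYPES = {"i32", "i64", "f32", "f64"}

def type_from_name(op):
    """Return the type prefix of a name like 'i32.add', or None if the name is untyped."""
    prefix, dot, _ = op.partition(".")
    return prefix if dot and prefix in VALUE_TYPES else None

print(type_from_name("i32.add"))        # i32
print(type_from_name("get_local"))      # None: needs the function context
print(type_from_name("i32.get_local"))  # i32 (hypothetical typed spelling)
```

With untyped `get_local`, the same consumer has to fall back to the enclosing function's declarations, which is the cost being weighed against verbosity here.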

@rossberg
Member

I suppose my question would be the same: why single out that one? How are, say, calls any different? How does it help a simple consumer to have annotations on some but not the others?

@sunfishcode
Member

I temporarily excluded calls, returns, and other control constructs because I'm wondering if we shouldn't address those at a higher level. I've been thinking about a stack abstraction rather than having values as operands of returns and so on. We'd of course restrict the number and types of things on the stack, and we still have structured control flow, so it'd still be easily statically type-checkable and verifiable, but it'd simplify several things.

For example, the current thinking for multiple result values is call_multiple, which is better than some alternatives but isn't great for, e.g., forwarding the result values from one call to the result of another. A stack mechanism for passing values around would be a pretty simple mechanism with some nice properties. And it would separate concerns; control instructions could focus on being control instructions and not have operands for things they don't inspect themselves.

It's just an idea at this point, but it is something I'm thinking about.

@rossberg
Member

Yeah, I think that call_multiple is the wrong approach to multiple return values, for the reasons you mention and some others. But of course we wouldn't need an explicit stack to deal with that.

Anyway, probably getting off-topic.

@sunfishcode
Member

Ok, so putting control constructs aside for the moment, should we add types to get_local and set_local for the reasoning above?

@jfbastien
Member Author

I'm singling out return as an example, and am asking about all operations in general.

@lukewagner
Member

A parser of the s-expr language is already going to need to maintain a static function context (to resolve $foo local names to their associated index) so I don't see that simplifying the s-expr parser's job but, rather, just adding extra work to catch type mismatches.
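As a rough illustration of the static function context described above, the following Python sketch resolves `$foo`-style names to an index and a type, and derives the type of `get_local` and `return` from that context alone. `FunctionContext`, `type_of`, and the nested-tuple expression encoding are hypothetical, chosen only for this sketch.

```python
class FunctionContext:
    """Static context for one function (hypothetical structure for this sketch)."""

    def __init__(self, params_and_locals, result=None):
        # name -> (index, type); indices follow declaration order
        self.locals = {
            name: (index, ty)
            for index, (name, ty) in enumerate(params_and_locals)
        }
        self.result = result

def type_of(expr, ctx):
    """Type of a nested-tuple expression such as ('get_local', '$x')."""
    op = expr[0]
    if op == "get_local":
        return ctx.locals[expr[1]][1]   # determined by the declaration
    if op == "return":
        return ctx.result               # determined by the signature
    prefix, dot, _ = op.partition(".")  # 'i32.add' -> 'i32'
    return prefix if dot else None      # the type spelled in the operator name

ctx = FunctionContext([("$x", "i32"), ("$y", "f64")], result="f64")
print(ctx.locals["$y"])                   # (1, 'f64')
print(type_of(("get_local", "$y"), ctx))  # f64
print(type_of(("i32.add", ("get_local", "$x"), ("i32.const", 1)), ctx))  # i32
```

Since the name-to-index table is needed anyway, carrying the type alongside the index costs little, which is why an explicit annotation on `get_local` would only add a consistency check.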

@jfbastien
Member Author

Catching type mistakes was one of the justifications to type operations in the first place. Can't use the same reason both ways ;-)

I agree that for a parser this doesn't do much. I'm approaching this from the POV of someone who would read or write this wasm assembly, the consistency seems nicer to me (having just written some examples by hand).

@rossberg
Member

As I argued above, the fundamental difference is that (1) the types don't affect the operational meaning of these operators, so they are redundant information, and (2) they (or some of them) are not conceptually limited to a (finite) set of primitive types, but may eventually evolve to handle user-defined types or multiple values, which don't fit the limited format.

@lukewagner
Member

@jfbastien Someone may have used that argument, but I actually don't think "catching (static) errors" should be a design goal (and it may actively work against other goals). Simplicity of spec, impl and codegen; this is what I think we should design for and I don't see i32.return helping any of those. Everything else being equal, @rossberg-chromium's conceptual argument also makes sense to me.

@titzer

titzer commented Oct 14, 2015

I agree with @lukewagner regarding locals; verifying the AST requires a function context that maps locals to their types, so typing get_local and set_local would be redundant. Same reasoning for return. Calls always reference either an explicit function or a signature, so that signature determines their type.


@jfbastien
Member Author

I understand that the type isn't strictly required, that's not the reasoning I offered: I'm approaching this purely from the POV of reading / writing the textual format.

I agree that multi-returns or UDTs would require expanding the signature, but that's true of the semantics of the entire text format. It's pretty easy to spec properly.

The argument on "arithmetic needs the disambiguation, return doesn't" is one of aesthetics. I agree for the binary format (because not having it affects decoding speed), but the textual format doesn't require types at all (it's redundant here too). It makes the text easier to read to have the type. The same thing is true for get_local and return and other operations.

In fact, type does affect the semantics of multi-value return and UDT return.

@lukewagner
Member

If the text format is the extent of your concerns, it seems like we should table this discussion until we start defining the text format.

@qwertie

qwertie commented Oct 15, 2015

IMO the text format should be made to contain the same information as the binary format does, unless there is a really good reason not to, so that the text format gives users some intuition about how the binary format works. Also, if I'm writing Wasm code, I don't want to write information that the assembler will immediately throw away.

@jfbastien
Member Author

It sounds like the consensus is: redundant type checks aren't a design goal of the text format?

@rossberg
Member

Yes, I think so. I'd say the design goal is for the text format to contain the same amount of information as the binary format will (modulo naming things).

@rossberg
Member

On 14 October 2015 at 22:53, JF Bastien notifications@github.com wrote:

> The argument on "arithmetic needs the disambiguation, return doesn't" is one of aesthetics. I agree for the binary format (because not having it affects decoding speed), but the textual format doesn't require types at all (it's redundant here too). It makes the text easier to read to have the type. The same thing is true for get_local and return and other operations.

Maybe the difference becomes clearer if you don't think of the type in i32.add vs f32.add as a type annotation. It's part of the operator name, and different types imply different operators. If you didn't have the type in there you'd essentially introduce operator overloading into Wasm. Contrast that with return, which is one operator, agnostic to types, and always "doing the same".

> In fact, type does affect the semantics of multi-value return and UDT return.

Only if we screw it up. It may very well be that implementations choose to handle them differently, but that should not be semantically observable.
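The distinction can be sketched in Python as an operator table, assuming a toy evaluator that is not part of any proposal in this thread. The names `OPERATORS`, `i32_add`, `f32_add`, and `do_return` are hypothetical; i32 values are modeled as unsigned 32-bit patterns, and the f32 operands are assumed to be already representable in single precision.

```python
import struct

def i32_add(a, b):
    # 32-bit integer addition with wraparound
    return (a + b) & 0xFFFFFFFF

def f32_add(a, b):
    # Add, then round to single precision via a 32-bit float round trip
    return struct.unpack("<f", struct.pack("<f", a + b))[0]

# The type is part of the operator name: two names, two distinct operators.
OPERATORS = {"i32.add": i32_add, "f32.add": f32_add}

def do_return(value):
    # One operator that passes its value through, whatever its type
    return value

print(OPERATORS["i32.add"](0xFFFFFFFF, 1))  # 0
print(OPERATORS["f32.add"](1.5, 2.25))      # 3.75
print(do_return(7), do_return(2.5))         # 7 2.5
```

Dropping the prefix from `add` would force the table lookup to depend on operand types, which is the overloading mentioned above; `do_return` never needs such a lookup.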

@sunfishcode
Member

I agree that there's inconsistency here, but right now it's just in the S-expression format. We can talk about whether the text format should contain the same amount of information as the binary format when we're designing a real text and binary format :-).

@lukewagner
Member

Just a note: I realized that another specific example of what Andreas was mentioning above is the "opaque reference type" idea mentioned in GC.md: the only operations you'd be able to perform on these opaque types are precisely the ops like get_local which don't have type annotations since they just shuffle their values around like a black box.


6 participants