-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
'retval' proposal re: calls and multiple return values #280
Comments
C++ does allow you to return multiple values with structs. You can even I think the main issue you're discussing is: with just one return value calls only generate temporaries. With multiple, these temporaries become awkward.
Versus:
What if I want to use the two temporaries? Tough luck! I'm not sure I like the proposed solution, but I can't think of any alternative for now... |
For an outside datapoint, Virgil supports arbitrary tuples [ Restricting multiple-value nodes in some sense would be good here, I think. That said, the implicit dataflow from the last call to the next retval On Thu, Jul 23, 2015 at 8:26 PM, JF Bastien notifications@github.com
|
One mitigation for some of your concerns about the dataflow would be for But if it just makes things harder on the runtime author, that's no good. Maybe there's a way to retain most of the size wins without doing that. |
@jfbastien what? why me? |
I think what I want to avoid is creating first-class tuple types, and tuple types that can exist as locals. I feel like the I'm leaning towards liking |
Well, retval has a few advantages that led me to propose it (smaller ast, more-compressible values, ability to discard return values) but |
On Thu, Jul 23, 2015 at 10:44 PM, JF Bastien notifications@github.com
Maybe we could have a multi-call bytecode that has the tie built into it?
|
@jfbastien Why do you want to avoid creating first-class tuple types? |
First-class tuples are the same as structs. The backend needs to start being smarter, and we need to define a bunch more than what we have now. |
I'm concerned that I do like the idea of combining the destructuring of a returned tuple with a
One issue is that One idea that I didn't see brought up that I've been assuming for multi-return:
then the AST could be:
which is pretty nice, I think. To be clear what's happening: as a binary-return op, Addressing some specific points of the OP:
It is definitely the plan for LLVM to attempt to maximally reuse local slots. This is good for a number of reasons:
Thus, I think we are already assuming that locals were going to be aggressively reused and this was actually seen as a strength of our psuedo-register approach compared to mandating SSA form.
I think most of this analysis would be done in LLVM IR, before locals/wasm enter the picture. In general, any wasm compiler will invariable need the ability to decompose an expression using temporaries. A natural impl would be to maintain a logical stack of temporaries to maximize local reuse.
I still think it involves some small amount of analysis which means that a dumb compiler won't benefit from it (assuming the dumb compiler is doing single pass generate-code-as-you-go). OTOH, a beefy compiler won't have any problems with locals.
It's important to qualify "easier". I think there is limited value to trying to make wasm more ergonomic to write by hand (that's definitely not a tier-1 priority compared to simplicity, speed of decoding and size). OTOH, "easier" could mean "requires less bytes", but I think there are multiple ways to achieve that goal in terms of tuples (think
Maybe, but needs measurement and again I expect we could introduce operations on tuples to a similar effect if we saw there were significant savings from multi-return. |
Good points on all of this, Luke. Re: deduplication the advantage isn't from tuple operations, though, it's that it normalizes more expressions so that they all contain It's not clear to me how my proposal requires control flow analysis. The requirement that retval not persist across branches was designed to eliminate that. What did I overlook? The point about nesting two function calls is a good one. There are a number of scenarios where you'd still need temporaries, and maybe that alone makes it problematic enough to not be worthwhile. |
On the subject of deduplication: a compiler could take deduplication into account when assigning local variable indices. In the limit, it could effectively simulate You're right that the "control flow analysis" is super-simple given that it's intra-basic-block. I guess my spider senses were going off because it is a categorically different kind of validation criteria than the rest of our type-based validation which is only sensitive to the signature of ops and the types of child expressions. |
How would y'all feel about a variant of
After some thought I think in practice |
IIUC typed |
If we set aside the size/compression argument, then I think it's simpler and more regular not to introduce a new special kind of local with special semantics. If the only argument is size/compression, it seems like we need some motivating data and arguments why layer 1 (viz. macros) can't achieve the same effect. In particular, I think we could simulate |
It's clear that multiple return values need some addition to the AST (GeneralTuples) On one end is general tuples that are flattened by the (ScalarsOnly) On the extreme other end is no support at all for tuples, (OnlyLocalTuples) In the middle of the continuum, we could allow tuples (RetVal) is another solution in the middle. I see at as basically a I think we want to avoid (GeneralTuples). As I discussed above, it's a nice So without (GeneralTuples), we're left with (ScalarsOnly), I'd argue that (OnlyLocalTuples) has most of the same problems as That leaves (ScalarsOnly) and (RetVal), which is basically where this Luke and I seem to be leaning towards (ScalarsOnly) with the assumption StrawMan1: call_multiple %0 %1...%K [func] [arg0] [arg1]...[argN] The semantics would be to call the function [func] either directly or StrawMan2: call_multiple %1...%K [func] [arg0] [arg1]...[argN] The semantics would be to call the function [func] either directly or On Wed, Jul 29, 2015 at 9:46 AM, Luke Wagner notifications@github.com
|
I like StrawMan2. While I see the appeal of StrawMan1 in allowing all return values to be handled consistently as named variables, this can be achieved with StrawMan2 by adding a set_local: set_local %0, (call_multiple %1...%K [func] [arg0] [arg1]...[argN]) which is slightly less pretty but achieves the same effect. And, it avoids the awkwardness in StrawMan1 of producing a value into two output locations. |
Agreed |
FWIW, the ML prototype currently implements a simpler variation of (OnlyLocalTuples) that I would call (SecondClassTuples), and which doesn't have most of the problems @titzer mentions. In that approach, values or variables cannot have tuple type, only expressions can. In general, any expression produces a sequence of values, most of them unary or empty. Calls can produce n-ary expression type. Such types can be consumed by either a Destruct operator (an n-ary SetLocal), by Return, or if you want, by other Calls (as their argument). It is not possible to get a first-class handle on the tuple itself. Nor can tuples be nested, because their components are values. That's more modular in several ways: (1) it allows multiple ways of consuming tuples; (2) it also allows introducing additional expressions that produce n-ary results; (3) it allows instructions that are generic in arity. In particular, it doesn't require introducing a _multiple variant of each Call instruction (with the strawmen above, you'd seem to need to add at least both call_multiple and call_indirect_multiple?). |
On Fri, Aug 14, 2015 at 9:37 AM, rossberg-chromium <notifications@github.com
If expressions can produce tuples, it means that decoders have to track the I don't think we gain (or need) much expressive power by allowing tuples to
|
A couple of things you could gain in the future are e.g. primitive operators with multiple results (e.g. divmod), or potential forms of control transfer with multiple values (e.g. exceptions or a generalised Break). None of these are relevant for the MVP, but it would still be nice to have a factorisation that is somewhat future-proof (and avoids duplication of opcodes). As for decoding complications: true. Not sure that is too terrible, though. Moreover, it could be avoided (for most part) by disallowing multiple values to flow through other expressions (such as conditionals), in which case only the top of the stack could be n-ary. That leaves the option to generalise later. |
Definitely interested in multiple-return primitive values, but it seems like these ops could just do the same thing as The more interesting case as you point out is if we had generalized tuple return from control flow primitives. @sunfishcode and I were talking about this a long time ago in the context of reducing the high % of bytes spent on |
In #278, Dan suggested some good refinements for our thinking about calls and return values. Implied in this is the potential for multiple return values to show up in the future. I think it's relatively inevitable that multiple return values will creep their way in eventually, even if C/C++ never use them. In the event that they do, this has some nasty implications for our current model:
We need a way to represent the idea of an expression yielding multiple distinct values of different types.
We need a way to handle those value tuples: Assign them out into locals, etc.
A function needs a way to construct those value tuples so it can return them.
My suggestion for tackling this and some other issues is that we should define a distinct concept separate from local variables that I'll refer to as 'retval'. Effectively, retval is a tiny set of registers that contain the return value(s) of the most recent function call, with various restrictions to make it simple to reason about.
A rough sketch of how retval would work:
void
(no retvals) or has a single retval of typeT
.retval N
is valid so long as N < the number of return values from the previous function. The type of this expression depends on the signature of the most recent function call, and it evaluates to the contents of that numbered return value register.retval N
can only be used once per return value per invocation; it consumes the result. To use a return value multiple times, store it into a local.(call @a ...), (retval 0)
- essentially this is a simple transform that can happen in the compiler, decoder, or native runtime. I'd probably put it in the decoder.So, (complete strawman), existing code that looks like this:
would look something like this:
(EDIT) or this:
So, why? Various reasons:
call @a (call @b ...)
->call @b ...; call @a (retval 0)
).call @a (call @b)
is pretty easy to expand in the case that@b
returns say, 3 values:call @b; call @a (retval 0) (retval 1) (retval 2)
.I think this is especially important to think about, because many future languages that we (probably) want to target wasm have multiple return values or something equivalent. Go extensively uses multiple return values, and C# somewhat-extensively uses something roughly equivalent (
out
parameters).As far as returning multiple values from a function at once goes - I think we punt on that entirely until after the MVP, but this model probably makes sense there too (opcodes for storing values into your result registers, or something similar).
If people like this proposal I can try to draft up a PR for it later.
The text was updated successfully, but these errors were encountered: