Flesh out calls, indirect calls, and function pointers. #278

Closed
wants to merge 1 commit into from
91 changes: 60 additions & 31 deletions AstSemantics.md
@@ -34,9 +34,9 @@ a trap occurs.

## Types

### Local Types
### Basic Types

The following types are called the *local types*:
The following types are called the *basic types*:

* `int32`: 32-bit integer
* `int64`: 64-bit integer
@@ -47,15 +47,21 @@ Note that the local types `int32` and `int64` are not inherently signed or
unsigned. The interpretation of these types is determined by individual
operations.

Parameters and local variables use local types.
Also note that there is no need for a `void` type; function signatures use
[sequences of types](Calls.md) to describe their return values, so a `void`
return type is represented as an empty sequence.

### Local Types

### Expression Types
*Local types* are a superset of the basic types, adding the following:
Review comment (Member):

I'd like this document to explain why there's no void type. I'd also like to understand what a function does return if it doesn't return anything in source form. Maybe this can be in the "calls" section, when you explain that a function can return 0 elements?


*Expression types* include all the local types, and also:
* `funcid`: a function identifier for use in `call_indirect`

* `void`: no value
The zero value of `funcid` is the identifier for the first function in the
function table. (C/C++ compilers may wish to put a placeholder function at
this point in the table to implement a null pointer concept.)

AST expression nodes use expression types.
Parameters and local variables use local types.

### Memory Types

@@ -293,39 +299,53 @@ may be added in the future.

## Calls

Direct calls to a function specify the callee by index into a function table.
Each function has a *signature*, which consists of:

* `call_direct`: call function directly
* Return types, which are a sequence of local types
* Argument types, which are a sequence of local types

Each function has a signature in terms of expression types, and calls must match
the function signature
exactly. [Imported functions](MVP.md#code-loading-and-imports) also have
signatures and are added to the same function table and are thus also callable
via `call_direct`.
Note that WebAssembly itself does not support variable-length argument lists
(aka varargs). C and C++ compilers are expected to implement this functionality
by storing arguments in a buffer in linear memory and passing a pointer to the
buffer.
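
The varargs lowering described above can be modeled roughly as follows. This is a Python sketch rather than WebAssembly code: `memory`, `push_varargs`, and `sum_ints` are hypothetical names, a `bytearray` stands in for linear memory, and every variadic argument is assumed to be a 32-bit integer stored in little-endian order.

```python
import struct

# Hypothetical stand-in for a module's linear memory.
memory = bytearray(64)

def push_varargs(base, values):
    """Store each int32 argument in a buffer starting at `base`; return the pointer."""
    for i, v in enumerate(values):
        struct.pack_into("<i", memory, base + 4 * i, v)
    return base

def sum_ints(count, args_ptr):
    """A fixed-signature callee: (int32 count, int32 pointer) -> int32."""
    total = 0
    for i in range(count):
        (v,) = struct.unpack_from("<i", memory, args_ptr + 4 * i)
        total += v
    return total

# The caller writes the variadic arguments to memory and passes only a pointer.
ptr = push_varargs(16, [1, 2, 3])
result = sum_ints(3, ptr)
```

The callee keeps a fixed signature, so the call itself needs no special support; how the callee learns the argument count and types is left to the compiler's ABI.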

Indirect calls may be made to a value of function-pointer type. A
function-pointer value may be obtained for a given function as specified by its index
in the function table.
In the MVP, the length of the return types vector may only be 0 or 1. This
restriction may be lifted in the future with the addition of support for
[multiple return values](FutureFeatures.md#multiple-return-values).

There are two forms of calls:

* `call_direct`: call function directly
* `call_indirect`: call function indirectly
* `addressof`: obtain a function pointer value for a given function

Function-pointer values are comparable for equality and the `addressof` operator
is monomorphic. Function-pointer values can be explicitly coerced to and from
integers (which, in particular, is necessary when loading/storing to memory
since memory only provides integer types). For security and safety reasons,
the integer value of a coerced function-pointer value is an abstract index and
does not reveal the actual machine code address of the target function.
Direct calls identify their function statically. Indirect calls have a
`funcid` operand which identifies the function at runtime.

In the MVP, function pointer values are local to a single module. The
Calls have a signature, which is the expected return types and argument types
(ignoring the `funcid` operand, in the case of `call_indirect`) of the
AST node. Call operations trap if the signature of the call differs from the
signature of the called function.
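
One way to picture the trapping rule above is the following Python model. It is only a sketch under stated assumptions: `Trap`, `function_table`, `define`, and `call_indirect` are hypothetical names, a `funcid` is modeled as a plain table index, and a signature is modeled as a pair of tuples of type names.

```python
class Trap(Exception):
    """Models a WebAssembly trap."""

# Each table entry pairs a signature with a callable. A signature here is
# (return types, argument types), both tuples of type names.
function_table = []

def define(returns, args, body):
    """Append a function to the table and return its index (the modeled funcid)."""
    function_table.append(((tuple(returns), tuple(args)), body))
    return len(function_table) - 1

def call_indirect(funcid, returns, args, *operands):
    """Call through the table, trapping if the call's signature differs."""
    signature, body = function_table[funcid]
    if signature != (tuple(returns), tuple(args)):
        raise Trap("signature mismatch")
    return body(*operands)

add = define(["int32"], ["int32", "int32"], lambda a, b: (a + b) & 0xFFFFFFFF)
neg = define(["int32"], ["int32"], lambda a: (-a) & 0xFFFFFFFF)
```

In this model the check happens on every indirect call, which is the dynamic signature check discussed in the review comments on this change.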

### Function pointers
Review comment (Member):

Should we also discuss pointer-to-member-function? Itanium has done weird things that I think we want to avoid?

Reply (Member Author):

That's an ABI concern too.


Function pointer values are obtained through the use of a special operator:

* `addressof`: obtain a `funcid` value for a given statically-identified function

and are comparable for equality:

* `funcid.eq`: function identifier compare equal

Note that it is not possible to directly observe the bits of a `funcid`
value. They may be [converted into integers][], but the integers only hold an
index into the *function table*, a table with an entry for each function
appended to the table in the order that they are loaded into the program.
Review comment (Member):

32 or 64 bit integers? Later text says 32.

Are these deterministic? e.g. Emscripten has one table per signature IIRC? LLVM has some experimental CFI that does this too, I'd like to leave the door open to performance / security diversity here.

Does this have any bearing on dynamic linking (cc @dschuff)?

Reply (Member Author):

Per the discussions in #89, there's one table, not one table per signature, and:

Call operations trap if the signature of the call differs from the signature of the called function

The order is deterministic if the application loads functions into the program in a deterministic fashion:

appended to the table in the order that they are loaded into the program

Also, I don't expect this PR is the last word on function pointers. This is just trying to update the text to where the current discussions have led and clean up the text to facilitate the next rounds of discussion.

Review comment:

On Thu, Jul 23, 2015 at 8:43 PM, Dan Gohman notifications@github.com
wrote:

> In AstSemantics.md
> #278 (comment):
>
> -In the MVP, function pointer values are local to a single module. The
> +### Function pointers
> +
> +Function pointer values are obtained through the use of a special operator:
> +
> +  * addressof: obtain a funcptr value for a given statically-identified function
> +
> +and are comparable for equality:
> +
> +  * funcptr.eq: funcptr compare equal
> +
> +Note that it is not possible to directly observe the bits of a funcptr
> +value. They may be [converted into integers][], but the integers only hold an
> +index into the function table, a table with an entry for each function
> +appended to the table in the order that they are loaded into the program.

There's a basic design decision to be made somewhere about whether we
accept a dynamic signature check at calls/decodes or whether we try to
design a static mechanism. The addition of local types (that include
signatures) for function pointers only makes sense if we can remove the
need for most, if not all dynamic checks.

> Per the discussions in #89, there's one table, not
> one table per signature, and:
>
> Call operations trap if the signature of the call differs from the
> signature of the called function

That implies a dynamic signature check, which I thought this proposal was
trying to avoid.

> The order is deterministic if the application loads functions into the
> program in a deterministic fashion:
>
> appended to the table in the order that they are loaded into the program
>
> Also, I don't expect this PR is the last word on function pointers. This
> is just trying to update the text to where the current discussions have
> led and clean up the text to facilitate the next rounds of discussion.

The idea of using multiple function tables, with one per signature, is to
avoid the need for a dynamic signature check and also the need for function
types to be local types, since the program only ever manipulates indices.
To be clear, I am not sure if that's a better solution, since it will
complicate dynamic linking, but it does have those advantages.

A key issue really is how C++ will encode vtables into wasm. Per-signature
function tables can work in an mtable scenario, where the vtable is
represented as a class number and the class number is used to index into
per-virtual-method tables. That requires numbering the classes in a
specific manner, but is dense and single-load efficient.




In the MVP, `funcid` values are local to a single module. The
[dynamic linking](FutureFeatures.md#dynamic-linking) feature is necessary for
two modules to pass function pointers back and forth.
two modules to pass `funcid` values back and forth.

Multiple return value calls will be possible, though possibly not in the
MVP. The details of multiple-return-value calls needs clarification. Calling a
function that returns multiple values will likely have to be a statement that
specifies multiple local variables to which to assign the corresponding return
values.
[converted into integers]: AstSemantics.md#datatype-conversions-truncations-reinterpretations-promotions-and-demotions

## Literals

@@ -507,6 +527,8 @@ is NaN, and *ordered* otherwise.
* `float64.cvt_unsigned[int32]`: convert an unsigned 32-bit integer to a 64-bit float
* `float64.cvt_unsigned[int64]`: convert an unsigned 64-bit integer to a 64-bit float
* `float64.reinterpret[int64]`: reinterpret the bits of a 64-bit integer as a 64-bit float
* `funcid.decode[int32]` : convert an unsigned 32-bit integer to a function identifier
* `int32.encode` : convert a function identifier to an unsigned 32-bit integer

Wrapping and extension of integer values always succeed.
Promotion and demotion of floating point values always succeed.
@@ -523,3 +545,10 @@ round-to-nearest ties-to-even rounding.
Truncation from floating point to integer where IEEE-754 would specify an
invalid operation exception (e.g. when the floating point value is NaN or
outside the range which rounds to an integer in range) traps.

Encoding a `funcid` returns the index into the function table. If the index of
the function is too great to fit in the result type, encoding traps. Decoding
returns the `funcid` from an encoded function index. If the index is out of
bounds in the function table, decoding traps. In the MVP, `funcid` values may
only be converted to and from 32-bit integers. Support for 64-bit funcid may be
added in the future.
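
The encode and decode rules above can be sketched as a small Python model. The names `Trap`, `encode`, and `decode` are hypothetical, a `funcid` is again modeled as a plain table index, and the table size is passed in explicitly as an assumption of the sketch.

```python
class Trap(Exception):
    """Models a WebAssembly trap."""

UINT32_MAX = 2**32 - 1

def encode(funcid_index):
    """Model of int32.encode: trap if the table index cannot fit in 32 bits."""
    if funcid_index > UINT32_MAX:
        raise Trap("function index does not fit in int32")
    return funcid_index

def decode(value, table_size):
    """Model of funcid.decode[int32]: trap if the index is outside the table."""
    if not 0 <= value < table_size:
        raise Trap("function index out of bounds")
    return value
```

With a three-entry table, `decode(encode(2), 3)` round-trips to the same identifier, while `decode(3, 3)` traps because index 3 is out of bounds.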
34 changes: 34 additions & 0 deletions FutureFeatures.md
@@ -393,3 +393,37 @@ reason we haven't added these already is that they're not efficient for
general-purpose use on several of today's popular hardware architectures.

[handle trap specially]: FutureFeatures.md#trapping-or-non-trapping-strategies

## Multiple Return Values

In the MVP, functions and operators are limited to having at most one return
value. It is desirable to lift this restriction in the future.

Currently, when the result of an operator is used more than once, it must be
stored in a local variable. Multiple return values could generalize this rule
by requiring that, when an operator has multiple result values, the result
values must all be stored in local variables too. This suggests a new form of `set_local`
which can directly assign the results of a multiple-result operator into
multiple local variables, and a new form of `return` which can return multiple
values from a function. An advantage of this approach is avoiding the need
for tuple values to be live anywhere.

One interesting question is whether it should be possible to have a function
call another function which returns multiple result values, and then forward
all of those result values to its own result, without copying them all into
local variables first. It gets even more interesting with the discussion of
[statements versus expressions](https://github.com/WebAssembly/design/issues/223).

## C++-style vtable optimizations

Storing function pointers in memory requires converting them into function table
indices. Since the traditional implementation of C++ virtual functions is to
have vtables stored as arrays in the address space, a traditional implementation
of C++ virtual functions in WebAssembly would work, but would require an extra
table lookup.

However, because vtable objects are not exposed at the C++ level, it isn't
actually necessary to represent them inside the address space of the program. If
WebAssembly allowed modules to define alternate function tables, C++ compilers
could lower vtables into these alternate tables, so their contents could be
actual function pointers instead of function table indices.
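
The difference between the two lowerings can be illustrated with a Python model. Everything here is a hypothetical sketch: `memory` stands in for linear memory, `function_table` for the module's function table, and `alternate_vtable_derived` for an alternate table of the kind this section proposes; none of these names come from the design documents.

```python
import struct

memory = bytearray(32)  # stand-in for linear memory
function_table = [lambda: "Base.speak", lambda: "Derived.speak"]

# Traditional lowering: the vtable lives in memory and holds table indices.
VTABLE_DERIVED = 8
struct.pack_into("<I", memory, VTABLE_DERIVED, 1)

def virtual_call_via_memory(vtable_ptr, slot):
    # First lookup: load the function table index from the in-memory vtable.
    (index,) = struct.unpack_from("<I", memory, vtable_ptr + 4 * slot)
    # Second lookup: resolve the index through the function table.
    return function_table[index]()

# Hypothetical alternate table: the vtable itself holds the functions.
alternate_vtable_derived = [function_table[1]]

def virtual_call_via_alternate_table(vtable, slot):
    # Single lookup: the table entry is directly callable.
    return vtable[slot]()
```

Both paths reach the same function; the alternate table removes the step that converts a stored integer back into a callable entry.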
2 changes: 1 addition & 1 deletion PostMVP.md
@@ -38,7 +38,7 @@ variables are not aliasable and thus allow more aggressive optimization.
Support fixed-width SIMD vectors, initially only for 128-bit wide vectors as
demonstrated in [PNaCl's SIMD][] and [SIMD.js][].

SIMD adds new primitive variable and expression types (e.g., `float32x4`) so it
SIMD adds new local types (e.g., `float32x4`) so it
has to be part of the core semantics. SIMD operations (e.g., `float32x4.add`)
could be either builtin operations (no different from `int32.add`) or exports of
a builtin SIMD module.