Refine function pointers #358
What is the benefit of allowing a table to contain functions with different signatures? That seems like it puts a lot of burden on runtime validation.
Implementing C++ vtables.
On Thu, Sep 17, 2015 at 8:29 PM, Andrew Scheidecker wrote:
That can be done with one homogeneous function table for each function in the vtable, and that can be statically validated.
Compilers can choose to do that and, as described in this PR, engines should have no trouble detecting that pattern and compiling optimized code using just the module header info. Another problem with homogeneous function tables is that some C++ code depends on function pointers with different signatures having unique indices, which forces Emscripten to (optionally) insert dummies into the function tables so that a given index is valid in only one table. This isn't necessarily a bad thing, but it can use a lot of memory, which is why this was mentioned in this PR as a possible engine-internal optimization. Even when a dynamic signature check is used, signatures can be canonicalized to a single word that is checked with a single branch in the prologue. Thinking about how this would be implemented in Odin, I don't see any significant implementation challenge.
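To make the "single word" idea concrete, here is a minimal Python sketch. It assumes signatures are modeled as tuples of type names; `SignatureRegistry` and `signatures_match` are hypothetical names for illustration, not part of any engine. Structurally equal signatures are interned to the same small integer, so the dynamic check reduces to one integer comparison.

```python
from typing import Dict, Tuple

# Hypothetical model: a signature is (result types, parameter types).
Signature = Tuple[Tuple[str, ...], Tuple[str, ...]]


class SignatureRegistry:
    """Interns structurally equal signatures to one dense integer id (the "word")."""

    def __init__(self) -> None:
        self._ids: Dict[Signature, int] = {}

    def canonicalize(self, results: Tuple[str, ...], params: Tuple[str, ...]) -> int:
        key = (tuple(results), tuple(params))
        # setdefault assigns the next dense id only the first time a signature is seen.
        return self._ids.setdefault(key, len(self._ids))


def signatures_match(expected_id: int, actual_id: int) -> bool:
    # After canonicalization, the dynamic check is a single integer comparison.
    return expected_id == actual_id
```

An engine would do the interning once at module load time, so the per-call cost is only the comparison.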
Thanks, it helps to know the motivation.
To check for equality by comparing untyped bits, right? You could accomplish that with a per-table index base; each function has a distinct index, and each function table defines a value for a dense range of indices that doesn't necessarily start at zero. I'm satisfied if I'll be able to generate homogeneous function tables and expect it to be fast, I just can't imagine an application that would choose dynamically checked indirect calls to save a few hundred KB of memory.
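The per-table index base suggested above can be sketched as follows. This is a hypothetical Python model (the names `HomogeneousTable` and `lay_out` are illustrative): each signature gets its own homogeneous table, and each table covers its own dense slice of one shared index space, so no index is valid in more than one table and no dummy padding is needed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class HomogeneousTable:
    """One signature per table; entries occupy global indices [base, base + len(entries))."""
    signature: str
    base: int
    entries: List[Callable]

    def lookup(self, index: int) -> Optional[Callable]:
        offset = index - self.base
        if 0 <= offset < len(self.entries):
            return self.entries[offset]
        return None  # an engine would trap here instead


def lay_out(funcs_by_sig: Dict[str, List[Callable]]) -> List[HomogeneousTable]:
    """Give each signature's table its own dense range of a shared index space."""
    tables: List[HomogeneousTable] = []
    next_base = 0
    for sig, funcs in funcs_by_sig.items():
        tables.append(HomogeneousTable(sig, next_base, list(funcs)))
        next_base += len(funcs)
    return tables
```

Because indices are globally distinct, comparing two function pointers of different types still gives the answer C++ expects.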
You're right for the statically-linked case, but once you throw in dynamic linking (where you have to dynamically grow these function pointer tables), you lose all the density of the index spaces (in both the generic func ptr and MTable optimization schemes). Not only that, but both of these schemes need to do sophisticated things for dynamic linking to work: a dylib's func tables can't simply be appended; they need to first inflate the original table (up to the total number of classes or address-taken functions) with throwing stubs and then append the new dylib tables. In contrast, when heterogeneous tables are used, you can simply append and you're always dense. Now maybe we should just take this as an argument that we need to offer a lot more control over function tables (during dynamic linking or, in general, allowing arbitrary resizing/mutation)... but that seems like a lot more complexity, so we'd need to justify it. From the measurements I've seen, an extra super-predictable branch in the prologue won't have measurable overhead, but I would certainly be open to seeing newer measurements to the contrary (once our experimental implementations are farther along).
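The "simply append" property of heterogeneous tables can be sketched like this. It is a hypothetical Python model, assuming each entry stores a canonical signature id next to the function; `HeterogeneousTable`, `append_module`, and `Trap` are illustrative names. Loading a dylib is a plain append that returns the dylib's base index, and the table stays dense.

```python
from typing import Callable, List, Tuple

Entry = Tuple[int, Callable]  # (canonical signature id, function)


class Trap(Exception):
    """Raised where an engine would trap."""


class HeterogeneousTable:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def append_module(self, funcs: List[Entry]) -> int:
        """Append a (dynamically loaded) module's functions; return its base index."""
        base = len(self.entries)
        self.entries.extend(funcs)  # no inflation or throwing stubs needed
        return base

    def call_indirect(self, expected_sig: int, index: int, *args):
        if not 0 <= index < len(self.entries):
            raise Trap("table index out of bounds")
        sig, func = self.entries[index]
        if sig != expected_sig:  # the extra, highly predictable branch
            raise Trap("signature mismatch")
        return func(*args)
```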
I'm concerned that this design is much too constraining on implementations and prevents us from trying out different approaches. I understand the desire to reduce nondeterminism, but I'm not sure it's worth the cost here. I'm also not sure how this proposal will interact with dynamic linking.
@jfbastien That's a pretty vague statement; do you have any specific designs in mind?
@jfbastien Perhaps you want function pointers not to be specified as an index into a dense table, so that you have the flexibility to use real function pointers with some other sandboxing approach, and so that the tables and the indirection through them could be optimized away? Can you really allow arbitrary jumps into the address space and still adequately sandbox execution? And how would this work for wasm32 on a 64-bit VM, without 64-bit function pointers? It sounds like there is a lot of flexibility in this area, but that implementation experience needs to be shared.
I think linking vtables is the only thing that prevents homogeneous function tables from working with dynamic linking (assuming an addressof operator instead of static function indices).
You're probably right in general. I wrote a specialized benchmark in WAVM and measured a 1-22% cost for that branch. A large application is probably closer to 1%, but here's the 22% case: You can see how I changed the code generation here: WAVM/WAVM@a64dc27#diff-d8a76028ed4037dbcd7a4be4958af768R342
A minimalist design for indirect calls would omit explicit function tables. You have some operation that gives you a "pointer" to a statically indexed function as an I32, and you have call_indirect as it is now. The runtime is free to use whatever mapping between integers and functions it likes, within some constraints. That works with dynamic linking, and can be implemented with homogeneous function tables, heterogeneous function tables, or even NaCl-style CFI. Then, if you want fast vtable calls, provide a way to declare vtables as immutable global values. Provide an operation to get a "pointer" to a statically indexed vtable as an I32, and add a call_vtable that calls a static index of a dynamic vtable "pointer". It would be straightforward for a runtime to implement that along with dynamic linking through heterogeneous function tables, and with some additional work a runtime could limit the runtime checks to just the bounds check with some kind of MTable scheme.
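A rough Python model of the proposed call_vtable operation is below; the design is only described informally above, so `VTableRegistry`, `declare`, and the trap behavior are all assumptions. The vtable "pointer" is an I32 index, vtables are frozen once declared, and the slot is a static immediate. The assumption is that validation could know each slot's signature statically; this sketch performs no signature check, only the bounds check on the dynamic pointer.

```python
from typing import Callable, List, Tuple


class VTableRegistry:
    """Hypothetical model of immutable, declared vtables addressed by an I32 "pointer"."""

    def __init__(self) -> None:
        self._vtables: List[Tuple[Callable, ...]] = []

    def declare(self, slots: List[Callable]) -> int:
        # Frozen into a tuple: vtables are immutable once declared.
        self._vtables.append(tuple(slots))
        return len(self._vtables) - 1  # the I32 "pointer" to this vtable

    def call_vtable(self, vtable_ptr: int, slot: int, *args):
        # Only the dynamic vtable pointer is checked at run time in this sketch.
        if not 0 <= vtable_ptr < len(self._vtables):
            raise IndexError("vtable pointer out of bounds")
        return self._vtables[vtable_ptr][slot](*args)
```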
Thanks for experimenting! If I'm reading WithFunctionPrefixCheck.asm correctly, then you're doing the check before the call. What @titzer and I had discussed is having the caller pass a word (in a register) representing the caller's signature (index into a canonicalized array of signatures) and the callee branching on that word being equal to an immediate (the callee's signature's index) in the prologue. This should be significantly faster.
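The caller-passes-a-word scheme can be modeled in Python as below. Everything here is a hypothetical illustration: `caller_sig` stands in for the register the caller fills, `own_sig` for the immediate baked into the callee's prologue, and `make_checked_entry`/`SignatureTrap` are invented names. The checked entry continues into the same body that direct calls use, so direct calls pay nothing.

```python
from typing import Callable, List


class SignatureTrap(Exception):
    """Raised where an engine would trap on a signature mismatch."""


def make_checked_entry(own_sig: int, body: Callable) -> Callable:
    """Wrap `body` (the direct-call entry) with a signature-checking prologue."""

    def checked_entry(caller_sig: int, *args):
        # caller_sig models the word passed in a register;
        # own_sig models the immediate compiled into the callee's prologue.
        if caller_sig != own_sig:
            raise SignatureTrap(f"expected signature {own_sig}, got {caller_sig}")
        return body(*args)  # continue into the same body direct calls use

    return checked_entry


def call_indirect(table: List[Callable], caller_sig: int, index: int, *args):
    # The call site only bounds-checks and forwards its expected signature id.
    if not 0 <= index < len(table):
        raise IndexError("table index out of bounds")
    return table[index](caller_sig, *args)
```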
Even if we go the minimalist design route (roughly what's in AstSemantics.md#calls now, except removing this vague intermediate "function pointer value" and only dealing with ints), I think we should avoid leaving the integral values unspecified without evidence of significant speedup from the nondeterminism (just like in every other situation). In particular, I share @JSStats's concern that there isn't an efficient way to represent function pointers as raw function addresses without breaking wasm's agreed-upon goal of limited nondeterminism (which means, e.g., that you can't jump into the middle of a function). The best strategy I've heard proposed is maintaining a big mask that could be probed before jumping (to ensure jumps to raw addresses only land on actual function entry points), but I'd be surprised if this was any faster than the simple table dispatch (esp. when that table was in cache). OTOH, to assist implementations in using homogeneous tables under the hood, the spec could assign indices to functions in a signature-sorted order.
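Assigning indices in signature-sorted order can be sketched in a few lines of Python. This is an illustration under assumptions: signatures are strings, the sort key is simply lexicographic, and `assign_indices` is a hypothetical helper. Each signature ends up owning a contiguous index range, which an engine could back with a homogeneous table internally.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def assign_indices(funcs: List[Tuple[str, str]]) -> Tuple[Dict[str, int], Dict[str, range]]:
    """funcs: (function name, signature) pairs in declaration order.

    Returns each function's index and, per signature, its contiguous index range.
    """
    ordered = sorted(funcs, key=lambda f: f[1])  # stable: ties keep declaration order
    index_of = {name: i for i, (name, _) in enumerate(ordered)}
    ranges: Dict[str, range] = {}
    start = 0
    for sig, group in groupby(ordered, key=lambda f: f[1]):
        count = len(list(group))
        ranges[sig] = range(start, start + count)
        start += count
    return index_of, ranges
```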
This is a good point, though, that we could consider "optimize vtable uses of function pointers" as a separate feature and that ops like
It's not that simple. If table types are no longer homogeneous, then you In fact, even the dynamic semantics will become ambiguous. Consider:
You can no longer tell what the intermediate type is (the result of the So, to make this idea fly,
@rossberg-chromium That's a good point, but
Correct, I forgot to call that out. I did it that way because it's a bit more work to do it with a check in the prologue. I tried it, though, and it does perform better in the test with predictable indirect calls. However, for my original test with unpredictable indirect calls, it is 11% slower than both the unchecked version and the version that does the callsite check. I'm not familiar with the details of how my CPU's branch predictor works (it's a Sandy Bridge, if that tells you anything), but it makes sense to me that putting the easy-to-predict branch before the hard-to-predict branch would be faster than the other way around.
Yeah, I haven't thought of a better way than some variation of function tables. But you can do some interesting stuff with function tables if you have control over the indexing. I think you can probably minimize the memory cost of sparse function tables with careful indexing and sparsely committed virtual pages, for example.
That would be true of my proposal to make function tables implicit as well. In a binary encoding, you would have a signature table, and call_indirect would be parameterized by an index into that, so the text syntax could mimic that:
That doesn't surprise me for the noop function body being tested in WithPrologueCheck.asm, but you'd expect those and similarly trivial functions to have been inlined. Incidentally, my experience here is that we have a dynamic stack-overflow check in Odin asm.js function prologues that I keep wondering whether we should get rid of (using signal handler tricks). I can see speedups in noop microbenchmark loops, but never in any sort of realistic workload.
On 18 September 2015 at 15:29, Luke Wagner notifications@github.com wrote:
But that's not enough; you need the argument types as well, in order to
@rossberg-chromium Oh, I was just thinking of bottom-up typing like asm.js. But yes, you're right: if we want to maintain the top-down property, we'd need the signature index, as @AndrewScheidecker already stated. The main problem I have with homogeneous tables is the unclear dynamic linking story, where it seems like you have to inflate tables. For the minimalist (no explicit tables) or untyped table designs, I can see a fairly clear dynamic linking story.
@lukewagner said:
As @AndrewScheidecker said, there's NaCl-style CFI, or any other CFI (see below for clang's sanitizer approach). The entire point is: over-constraining means we can't do anything else! I'm also not clear on what you mean by vtables, since they usually just live in the heap (so linear memory for wasm). Are you proposing abstracting them away into non-addressable memory? That's a significant change that needs to be seriously spec'd! There's more than just virtual function pointers that goes into the type information. @lukewagner said:
The performance overhead for this is less than 1% on Chromium: http://clang.llvm.org/docs/ControlFlowIntegrity.html This proposal constrains things too much to use a similar approach, IIUC. I'd like us to be able to try out a bunch of different approaches, and to be able to split up our jump tables differently.
It seems that in order to support "generic" function pointers in C/C++, the module must provide a single big table (or a series of them) with all functions in it whose address is taken, right? If a language has some form of reflection, perhaps any function's address might be taken, so the table(s) may include all functions, right? For C++ vtables, let me see if I understand the plan:
Right? But beyond simple C++ scenarios, I don't understand how vtables will work.
This argument does not generalize; we are completely specific throughout the entire wasm specification on purpose except when compelled by specific reasons. There is so far no evidence that anything other than table-based dispatch can be made faster while adhering to wasm's agreed-upon goals of limited, local nondeterminism (in particular, I don't think NaCl provides CFI since it allows jumping into the middle of a function; it's SFI and distinctly nonlimited, nonlocal nondeterminism).
Assuming you're referring to "vtables" in the PR (in the FAQ), no, this is something the user app would do, simply by choosing to place the vtable in the function table array. The spec doesn't know or care about "vtables".
That's not surprising, but the burden of proof for weakening determinism isn't "doesn't make it worse"; it has to be "makes it distinctly better". That is, if we weaken the semantics, we need to get a measurable win; allowing different impl schemes is not a goal in and of itself.
I feel the same way. I think restricting ourselves to one particular approach (with obvious downsides) is risky since we can end up paying the performance cost for years even if it turns out it was unnecessary. Can't we route around this for now with an
This is definitely something worth thinking about. Also, in dynamic-linking scenarios, someone who imported you can take the address of any of your exported functions. If we absolutely must compromise our design to support people comparing C++ function pointers of different types, I'd rather we do it with bitmasks or something, not by locking ourselves to a single heterogeneous table and needing to insert type checks at every call site or at the top of every function being called. (If the checks are at the top of functions, does that mean any function that can possibly have its address taken needs different codegen? Or are we generating a trampoline that does the check, and paying the cost of two jumps/calls?)
It's not opaque; it implies a table scheme unless we weaken the semantics to say it returns "any integer". If we do specify the integer, that scheme is strictly more limiting than what is described in this PR. I think we need a compelling argument to not specify the integer.
That same dylib would add the export to its locally-declared func table, so it's really no different than the normal case.
That's not the motivation for having explicit function tables; that would work just fine in the minimalist design that's sorta in AstSemantics.md now. That's only a consideration when the wasm-generator uses multiple tables and thus gives different functions in different tables the same index.
The signature-checking prologue can simply fall through into the direct-call prologue.
@titzer @pizlonator Any comment on this question of whether to allow nondeterministic function pointer indices (such that real function addresses (or some efficient transformation thereof) could be used as indices)? That appears to be the root of the disagreement here. The secondary issue is whether we want to attempt to optimize the vtable pattern by allowing explicit function tables, or just keep it simple by having a single, implicit table and requiring
I think this PR qualifies for a migration to an issue, given the extent of the discussion. I'd like a clearer proposal w.r.t. vtables, and better justification of why this specific approach is the one we want (not just "nondeterminism"; there are other ways to address nondeterminism, such as clang's CFI approach, which has actual performance numbers).
I second the move to an issue. I find the proposals somewhat difficult to follow without more detailed examples.
On Fri, Sep 18, 2015 at 9:04 PM, Luke Wagner notifications@github.com
IMO we should state more clearly what the design parameters are here, and
@lukewagner Could the patch be updated to take into account some of the problems noted above? The main problem seems to be that the signature would be needed for each call; is the only option to reference it at the call operation? Would it be possible to also support explicit homogeneous function tables (known at compile time, rather than at runtime), with a dense index? Then a function would have both a heterogeneous and a homogeneous index. I can see uses for homogeneous function tables where the application knows it is calling a particular class of function and does not need them to have a unique index with respect to functions with different signatures. In this case the signature at the call operation could be omitted, and some memory might be saved. For example, if the function index is encapsulated in an object, the app can compare object pointers to test for equality. @titzer The function pointers need to be integers to be storable in memory, so any pointer value could be created. If a 'direct function pointer' is allowed, then how is it verified? Do people have concrete schemes, or are they just asking for more time to explore the space?
On Mon, Sep 21, 2015 at 2:32 PM, JSStats notifications@github.com wrote:
The problem seems to be that comparison of C++ function pointers needs to
@titzer C++ might just need to use the heterogeneous function table index then, emit signatures at each call operation, and have an extra signature check at the callee. But other languages might not have this restriction, so it need not be a core limitation of wasm; hence the request to also support homogeneous function tables. I guess 'direct' function pointers might have some use as a local type; thanks for explaining this. I guess this would include function arguments and results, which might make them even more useful. So code might obtain a 'direct' function pointer by indexing into a function table, pass it on to other functions or return it, and possibly efficiently map it back to its index. I just can't think of any hot code that would need this; all the hot code that comes to mind obtains functions from memory, from objects or tables.
On Mon, Sep 21, 2015 at 3:48 PM, JSStats notifications@github.com wrote:
Something I'm increasingly confused about: When we say function tables, do we mean each module has a table containing all its functions, with an indirect call stub for each (that does the type check &c)? So then dynamic linking would imply gluing all those tables together. And then scenarios where new function pointers are introduced would be runtime dynamic linking and JIT, right? So in those cases there would be extra space allocated at the end of the table with more stubs inserted in. When we're talking about vtables or other forms of compiler-driven/compiler-assisted indirection, those tables contain pointers to known functions within the module. If those are indices, we're implicitly promising that function indices at runtime will always be the same. We need to do relocation on all of the function pointer values in a module at runtime in dynamic linking scenarios, then, because it's not possible for every module to get the indices it was promised. Right now we don't have If you can take the address of an imported function (if not, why not?), then that introduces another interaction: you need some sort of mechanism to statically (at load time/link time) go from import id to the appropriate index into the imported module's function table. I'm not sure how this would work, but applications will definitely do it. If I can take the address of an imported function, can I take the address of an imported host function? If not, how do I determine whether or not a given function can have its address taken?
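The relocation step described above could look roughly like this. It is a hypothetical Python model: linear memory is a list of i32 words, and `reloc_offsets` stands for a relocation list (not an existing wasm section) naming the words that hold module-relative function indices, such as vtable slots. When the module's functions land at `base` in the shared table, each recorded word is shifted by `base`.

```python
from typing import List


def relocate_function_pointers(memory: List[int], reloc_offsets: List[int], table_base: int) -> None:
    """Shift every stored module-relative function index by the module's table base."""
    for off in reloc_offsets:
        memory[off] += table_base


def link_module(table: List[object], module_funcs: List[object],
                memory: List[int], reloc_offsets: List[int]) -> int:
    """Append a module's functions to the shared table, then patch its stored indices."""
    base = len(table)
    table.extend(module_funcs)
    relocate_function_pointers(memory, reloc_offsets, base)
    return base
```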
@kg The text looks rather vague. It looks like it wants to retain the ability to define 'any number of indirectly-callable function tables and these tables can contain an arbitrary sequence of functions, identified by their index in the global function table.' Not sure what this means. How can these have a dense index while also being 'identified by their index in the global function table', which is heterogeneous? It looks like an attempt to handle dynamic linking, where each new function would get a new (next) index in an instance's heterogeneous function table. Perhaps a new operation will be added, such as lookup_heterogeneous_function_index(string), that will return the index, and heterogeneous_function(index) to reference the function as an argument to a call operation. As long as each heterogeneous function index is unique, tables that store the index will not need to change these indexes when dynamic linking. Perhaps just one core table should be defined for the heterogeneous functions per instance, rather than requiring the wasm core to support any number of function tables, and wasm code should allocate its own vtables or caches etc. and take responsibility for maintaining them.
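The single-table-per-instance idea can be modeled in Python as follows. The two operation names come from the comment above, but they are speculative and not part of any spec; the `Instance` class and its `link` method are hypothetical. Each newly linked function gets the next index and existing indices never move, so indices an application has already cached in its own vtables stay valid.

```python
from typing import Callable, Dict, List


class Instance:
    """Hypothetical model: one core heterogeneous function table per instance."""

    def __init__(self) -> None:
        self._table: List[Callable] = []
        self._by_name: Dict[str, int] = {}

    def link(self, exports: Dict[str, Callable]) -> None:
        # Each newly linked function is appended; earlier indices are untouched.
        for name, func in exports.items():
            self._by_name[name] = len(self._table)
            self._table.append(func)

    def lookup_heterogeneous_function_index(self, name: str) -> int:
        return self._by_name[name]

    def heterogeneous_function(self, index: int) -> Callable:
        return self._table[index]
```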
(Sorry for the slow response; I've been away at a conference.) It seems like, before we can even discuss the different explicit function table strategies, we need to get to the bottom of the fundamental issue of whether we're trying to nondeterministically abstract over whether the integer function pointers are dense table indices or (some efficient transformation of) function addresses. If 'yes', that fundamentally prevents us from specifying explicit function tables. I'd be happy to discuss that question in a new, focused issue and to put this discussion on hold until it is resolved.
Subsumed by and moved to #392.
This PR captures a trail of discussions and recent converging consensus (#278, #89, IRL) about function pointers in the MVP and, after, with dynamic linking and GC (without attempting to fully define the latter two). Incidentally, this change almost matches what's in ml-proto (which is based on what's in v8-native-proto) if you drop the same-signature requirement on tables.