Indirect calls: how do they work? #89
Comments
Excellent questions! For now I have a thought on the first of them, one that you made me realize. It seems dangerous to let implementations pick their own values. For one thing, they could just pick the actual pointer value as the simplest solution, which is unsafe. But probably no serious implementation would do that. However, a second concern is that it leaves more room for differences between implementations. Of course, no serious program should depend on such differences, but this seems a particularly risky area: we already have semantics that differ from many native platforms, namely that we don't allow calling a method (directly or not) with the wrong types. In practice, such calls are not as rare in the real world as we would hope. This makes for some fun bugs, and having function pointer values that change between implementations, or that change between runs on a single implementation, would compound the confusion. I would therefore suggest specifying the pointer values. Perhaps one of these:
The current proposal actually differs from normal Emscripten output (which will bake in integers literals directly instead of symbolically via |
On Sat, May 30, 2015 at 12:04 AM, Alon Zakai notifications@github.com
If we go with allowing function pointers (rather than separate
We should think carefully about how JIT would interact with this, since applications JITting functions are going to end up having a considerable number of function pointers. They might also want to jump into the middle of a JITted function. If it's a module/fn index pair that would potentially mean JITting code would require creating a new module for it (or perhaps it could work like LCG in .NET where you're appending the functions onto the end of an existing module.) |
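A rough sketch of the "append functions onto the end of an existing module" idea, combined with the rule mentioned earlier in the thread that calls with the wrong types are not allowed. Everything here is hypothetical and for illustration only: the `FunctionTable` class, the `Signature` encoding, and the use of `RuntimeError` to stand in for a trap are assumptions, not part of any proposal in this issue.

```python
from typing import Callable, List, Tuple

# Hypothetical signature encoding: (parameter types, return type).
Signature = Tuple[Tuple[str, ...], str]


class FunctionTable:
    """Toy model of an indirect-call table that can grow at run time."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Signature, Callable]] = []

    def append(self, sig: Signature, fn: Callable) -> int:
        """Add a function and return its index, which acts as the 'function pointer'."""
        self._entries.append((sig, fn))
        return len(self._entries) - 1

    def call_indirect(self, index: int, expected: Signature, *args):
        """Call through the table; fail on a bad index or a signature mismatch."""
        if not 0 <= index < len(self._entries):
            raise RuntimeError("trap: function index out of bounds")
        sig, fn = self._entries[index]
        if sig != expected:
            raise RuntimeError("trap: indirect call signature mismatch")
        return fn(*args)


# Usage: a function "JITted" later simply receives the next index.
table = FunctionTable()
I32_TO_I32: Signature = (("i32",), "i32")
first = table.append(I32_TO_I32, lambda x: x + 1)
jitted = table.append(I32_TO_I32, lambda x: x * 2)
result = table.call_indirect(jitted, I32_TO_I32, 21)
```

In this model a function pointer is only ever an index, so the raw code address is never exposed to the program. Jumping into the middle of a JITted function, which the comment also raises, is not covered by the sketch.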
@titzer: yes, an automatic numbering would need to take into account the module too. Perhaps it could be just a deterministic running counter that continues through shared modules as they are linked? The more I think about this, the more the startup overhead seems like a large issue, as it's common to have function pointers in the data segment. In shared modules it seems unavoidable to have to adjust those function pointers at runtime, but for a singleton non-shared module, it seems like a shame to require that. An alternative might be to allow marking function pointers in the data segment somehow - "there is an AddressOf of function X here", which is what LLVM IR and PNaCl have, but it still leaves work for the VM at runtime. |
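To make the running-counter idea and its startup cost concrete, here is a minimal sketch. The `Linker` class, its `link` method, and the relocation list are all hypothetical names invented for illustration; the thread does not define any such interface.

```python
from typing import List, Tuple


class Linker:
    """Toy linker that numbers functions with one deterministic running counter."""

    def __init__(self) -> None:
        self.next_index = 0
        # Global function index -> (module name, function name).
        self.table: List[Tuple[str, str]] = []

    def link(self, module: str, functions: List[str],
             data: List[int], relocs: List[int]) -> Tuple[int, List[int]]:
        """Link one module and return (base index, patched data segment).

        `data` holds module-local values. `relocs` lists the positions in `data`
        that contain a module-local function index and must be rebased.
        """
        base = self.next_index
        for name in functions:
            self.table.append((module, name))
        self.next_index += len(functions)

        patched = list(data)
        for pos in relocs:
            # One adjustment per function pointer in the data segment:
            # this is the runtime work the comment is worried about.
            patched[pos] += base
        return base, patched


# Usage: module "a" is linked first, so its base is 0 and nothing changes;
# module "b" is linked second, so its function pointers are shifted by 2.
linker = Linker()
base_a, data_a = linker.link("a", ["f", "g"], [1, 99, 0], [0, 2])
base_b, data_b = linker.link("b", ["h"], [0, 7], [0])
```

In this sketch the first module linked needs no adjustment because its base is zero, which matches the observation that a single non-shared module could in principle skip the fix-up pass.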
@titzer That's a nice idea but, assuming we allow coercion to/from @kg I think it's important that we don't scope-creep dynamic linking to support the full fine-grained code compilation/loading required by JITing. JIT support is already a separate feature on the list that we should address directly as it will likely require specific support anyway to be able to do normal things like ICs and patching. |
@jfbastien What are you thinking exactly? Maybe I misunderstand, but I don't see how a control flow sanitizer can be implemented anywhere other than in the Web Assembly runtime. (To make sure we're on the same page, my reference point is our control flow guard, but maybe that is completely different than what you are talking about). |
There are two uses for sanitizers in Web Assembly:
NaCl is an example of 1. in that it enforces control-flow integrity and data-flow integrity, but only to protect web users from untrusted code. PNaCl makes it possible to translate code to NaCl, MinSFI or an OS-sandbox only (bare-metal mode) by leaving unspecified what happens when a function call is invalid. We use these three approaches in different circumstances. I want to ensure that we can do the same in Web Assembly. I also want to allow developers to use sanitizers on their own code for 2., which is similar to turning on control flow guard. Doing this is basically running a sandbox within a sandbox (sanitizer within Web Assembly), and making the inner one efficient is a bit tricky: LLVM's control-flow sanitizer plays with function tables, and LLVM's address sanitizer uses shadow memory which requires |
On Thu, Jun 4, 2015 at 7:42 PM, JF Bastien notifications@github.com wrote:
The inner sandboxing is something developers choose to use because they want to protect against bugs in their own code, not in wasm, the same way applications currently use e.g. stack canaries: it's not to protect against compiler or OS bugs but to protect against bugs in their own code. |
On Thu, Jun 4, 2015 at 10:21 PM, JF Bastien notifications@github.com
The example in my mind of what I guessed @jfbastien is talking about is heap guards that a *San tool might insert. I agree we shouldn't need much to help us on the stack since it's trusted. |
UB in a developer's original C++ code isn't all detectable by wasm. @lukewagner has it exactly right: developers should be able to turn on the *san tools when they compile. wasm needs to help them a bit for this to work well ( In the specific case of indirect calls (what this issue is about): does wasm make it possible to efficiently implement a control-flow sanitizer? The LLVM approach claims almost no overhead on Chromium, and I hope the same holds when targeting wasm. |
So what specifically are you interested in a control-flow sanitizer catching, given a trusted stack? All I can think of is some guards for the heap-stack. |
I only see the value of NaCl-style sandboxing as a second, added safeguard
On Thu, Jun 4, 2015 at 10:51 PM, Luke Wagner notifications@github.com
A trusted stack in wasm doesn't protect an application from vtable clobber attacks which lead to a call to a function with the same signature but for the wrong object type. That's one example of where user-mode CFI, such as provided by pcc's work, is desirable. |
@jfbastien That's a good example. If our docs say anything about CFI, it'd be nice to explain what we mean above and beyond a trusted stack. |
Well, vtable pointers are part of the heap as part of the C++ object
On Thu, Jun 4, 2015 at 10:57 PM, JF Bastien notifications@github.com
I assumed that the *San mode involves some extra type checking to make sure the callee's assumed object type matches |
@titzer: what @lukewagner said is exactly right, the cfi sanitizer adds extra code to LLVM bitcode (which then would become extra wasm code) to make sure vtable types match. |
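The vtable-clobber case and the extra type check can be sketched roughly as follows. This is only a model of the idea: `VTable`, `checked_virtual_call`, and `CfiViolation` are hypothetical names, and the set-membership test is an assumption standing in for whatever encoding LLVM's sanitizer actually emits.

```python
from typing import Callable, Dict, Iterable


class CfiViolation(Exception):
    """Raised when an object's vtable does not match the static type at the call site."""


class VTable:
    def __init__(self, type_name: str, bases: Iterable[str],
                 methods: Dict[str, Callable]) -> None:
        self.type_name = type_name
        # Every type this vtable may legitimately be used as.
        self.compatible = frozenset(bases) | {type_name}
        self.methods = methods


def checked_virtual_call(vtable: VTable, static_type: str, slot: str, *args):
    """Extra check inserted before the indirect call, then the call itself."""
    if static_type not in vtable.compatible:
        raise CfiViolation(f"{vtable.type_name} used as {static_type}")
    return vtable.methods[slot](*args)


# Two methods with the same signature on unrelated types: a signature check
# alone cannot tell them apart, but the vtable type check can.
circle = VTable("Circle", ["Shape"], {"area": lambda r: 3 * r * r})
account = VTable("Account", [], {"area": lambda r: -r})

ok = checked_virtual_call(circle, "Shape", "area", 2)
try:
    checked_virtual_call(account, "Shape", "area", 2)  # clobbered vtable pointer
    caught = False
except CfiViolation:
    caught = True
```

The point of the example is that both `area` entries would pass a signature-only check on the indirect call, so the additional check has to come from code the compiler adds to the application itself.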
#278 is a pull request which is my attempt to capture the present conclusions of this conversation. |
This has been resolved by #392. |
The current AST semantics document states:
IIUC this basically is what Emscripten does.
I'd like us to discuss this a bit more to make sure we consider alternatives before choosing a specific approach:
Anything else?