Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fault on heap accesses around 0 #204

Open
jfbastien opened this issue Jun 19, 2015 · 15 comments
Open

Fault on heap accesses around 0 #204

jfbastien opened this issue Jun 19, 2015 · 15 comments

Comments

@jfbastien
Copy link
Member

@titzer suggested that "a module may optionally define that out-of-bounds includes small indices close to 0". I like the idea, and I think we should figure out the details of how this works. C/C++ certainly care about this features, but other languages may not so it may be interesting to let the .wasm file control this.

  1. Should WebAssembly implementations be allowed to do something, and be required to document what they do?
  2. Should it be a property in the .wasm file?
  3. Something we specify through ELF?
  4. Should it be done manually through mprotect, by the developer? Note that toolchains could do this automatically when generating _start.

In all cases, should we mandate that implementations provide information on smallest page size, so that this can be efficiently implemented through memory protection?

@jfbastien jfbastien added this to the MVP milestone Jun 19, 2015
@lukewagner
Copy link
Member

This is something C++ devs (myself included) really want. I agree that many uses may not want this so I've been assuming it would be a module-level opt-in flag.

I think the ideal is we make this part of the semantics, but currently I don't see a strategy that has full performance if the engine isn't able to use memory-protect+signal-handling tricks to guard the low memory region. (The best I can think of requires an extra sub before the bounds check.) Assuming full performance w/o signal handlers is a tier-1 use case, then I think this feature belongs better as a developer tool. It would still be nice to have the module flag, though, so that a browser's devtools could have a persistent "fault on low-memory access" checkbox that a developer would leave on and so they would always be catching these bugs, not just when they chose to open the Web console.

@jfbastien
Copy link
Member Author

I like the mprotect approach :-)

@lukewagner
Copy link
Member

Do you mean in the engine impl or exposing mprotect as a primitive to wasm? If the latter and the engine didn't have signal handler support, it seems like we'd need the engine to maintain a shadow memory (1 bit or byte per page) that was probed before every access.

@jfbastien
Copy link
Member Author

I mean 4. that I wrote above. I don't think we need shadow memory for it: we can limit what parameters we allow to mprotect for now, and define accesses to non read or write memory to trap. When we add signal handling we can do more, in this case raise the segfault.

@lukewagner
Copy link
Member

@jfbastien Right, but how would that be implemented in the engine without either signal-handler support or some shadow memory bookkeeping?

That being said, most platforms do seem to have pretty good signal support; I'm mostly recalling a concern of @pizlonator with making JSC embeddable. Filip: have you considered catching faults at the Mach exception layer? That's what we had to do to catch SIGSEGVs before breakpad killed the process; specifically, listening to the thread-level exception ports lets you get first dibs before all the Unix signals run and my impression from the general lack of documentation is that few devs use Mach exceptions directly.

@jfbastien
Copy link
Member Author

Hmm, the polyfill may be an issue!

I also hadn't thought about engines which can't just use signals... That'll bite them once wasm does support signals, so maybe it's fine to address this issue now?

@titzer
Copy link

titzer commented Jun 19, 2015

On Fri, Jun 19, 2015 at 2:29 AM, JF Bastien notifications@github.com
wrote:

@titzer https://github.com/titzer suggested that "a module may
optionally define that out-of-bounds includes small indices close to 0".
I like the idea, and I think we should figure out the details of how this
works. C/C++ certainly care about this features, but other languages may
not so it may be interesting to let the .wasm file control this.

To be fair, I just transferred a sentence from below to this paragraph for
readability.

  1. Should WebAssembly implementations be allowed to do something, and
    be required to document what they do?
  2. Should it be a property in the .wasm file?
  3. Something we specify through ELF?
  4. Should it be done manually through mprotect, by the developer? Note
    that toolchains could do this automatically when generating _start.

Another possibility: what if we just check the base address for 0 (null)
first?

We could combine all memory accesses into one kind of addressing mode that
takes 3 integer parameters like so:

load/store base, offset, disp

Where "base" is an int32/int64 expression, "offset" is a int32 literal, and
"disp" is an int32/int64 expression. The effective address would be
calculated as "intW_add(base, intW_add(offset, disp))", while the bounds
check would first check base != 0 and then "integer_add(intW_add(base,
disp), offset) < heap_size", with the latter potentially achieving the
bounds check hoisting Luke wanted.

I didn't think through the details of whether the two checks could be
combined into one machine instruction, but clearly the "null check" of
"base != 0" will often be loop invariant and thus hoistable.

This requires that all heap accesses have three operands, which is OK if we
have macro encoding for common cases of 0 offset or displacement.

In all cases, should we mandate that implementations provide information
on smallest page size, so that this can be efficiently implemented through
memory protection?


Reply to this email directly or view it on GitHub
#204.

@pizlonator
Copy link
Contributor

On Jun 19, 2015, at 8:04 AM, Luke Wagner notifications@github.com wrote:

@jfbastien https://github.com/jfbastien Right, but how would that be implemented without either signal-handler support or some shadow memory bookkeeping?

That being said, most platforms do seem to have pretty good signal support; I'm mostly recalling a concern of @pizlonator https://github.com/pizlonator with making JSC embeddable. Filip: have you considered catching faults at the Mach exception layer?

Yes. But I also care about this not being too awful in embedding scenarios on Linux and Windows, since WebKit and JSC do run on those.

It’s also a philosophical concern. I prefer for webasm to be specified in such a way that you can create a performant implementation without relying on the OS.

That's what we had to do http://hg.mozilla.org/mozilla-central/file/2694ff2ace6a/js/src/asmjs/AsmJSSignalHandlers.cpp#l1040 to catch SIGSEGVs before breakpad killed the process; specifically, listening to the thread-level exception ports lets you get first dibs before all the Unix signals run and my impression from the general lack of documentation is that few devs use Mach exceptions directly.

Yeah, the way to program with Mach is to try to find sample code and then just sort of do what that does. ;-) It’s true that this is what we would do, if webasm required us to do it.

-Filip

@pizlonator
Copy link
Contributor

On Jun 19, 2015, at 7:26 AM, Luke Wagner notifications@github.com wrote:

This is something C++ devs (myself included) really want. I agree that many uses may not want this so I've been assuming it would be a module-level opt-in flag.

I think the ideal is we make this part of the semantics, but currently I don't see a strategy that has full performance if the engine isn't able to use memory-protect+signal-handling tricks to guard the low memory region. (The best I can think of requires an extra sub before the bounds check.)

Right. And, interestingly, if you have to do:

if (unsigned(ptr - C1) >= C2 - C1)

Then the performance ends up being roughly the same as doing:

if (ptr < C1 || ptr >= C2)

Because the subtract is one cycle, but so is a second compare/branch that the CPU can fuse. Either way, adding one cycle is probably not great.

Assuming full performance w/o signal handlers is a tier-1 use case, then I think this feature belongs better as a developer tool.

I agree.
It would still be nice to have the module flag, though, so that a browser's devtools could have a persistent "fault on low-memory access" checkbox that a developer would leave on and so they would always be catching these bugs, not just when they chose to open the Web console.


Reply to this email directly or view it on GitHub #204 (comment).

@pizlonator
Copy link
Contributor

On Jun 19, 2015, at 12:00 PM, titzer notifications@github.com wrote:

On Fri, Jun 19, 2015 at 2:29 AM, JF Bastien notifications@github.com
wrote:

@titzer https://github.com/titzer suggested that "a module may
optionally define that out-of-bounds includes small indices close to 0".
I like the idea, and I think we should figure out the details of how this
works. C/C++ certainly care about this features, but other languages may
not so it may be interesting to let the .wasm file control this.

To be fair, I just transferred a sentence from below to this paragraph for
readability.

  1. Should WebAssembly implementations be allowed to do something, and
    be required to document what they do?
  2. Should it be a property in the .wasm file?
  3. Something we specify through ELF?
  4. Should it be done manually through mprotect, by the developer? Note
    that toolchains could do this automatically when generating _start.

Another possibility: what if we just check the base address for 0 (null)
first?

We could combine all memory accesses into one kind of addressing mode that
takes 3 integer parameters like so:

load/store base, offset, disp

Where "base" is an int32/int64 expression, "offset" is a int32 literal, and
"disp" is an int32/int64 expression. The effective address would be
calculated as "intW_add(base, intW_add(offset, disp))", while the bounds
check would first check base != 0 and then "integer_add(intW_add(base,
disp), offset) < heap_size", with the latter potentially achieving the
bounds check hoisting Luke wanted.

I didn't think through the details of whether the two checks could be
combined into one machine instruction, but clearly the "null check" of
"base != 0" will often be loop invariant and thus hoistable.

That’s true, but this significantly complicates things just for faulting on zero. Faulting on zero feels too much like a developer feature.

This requires that all heap accesses have three operands, which is OK if we
have macro encoding for common cases of 0 offset or displacement.

But then we risk some wasm generators abusing the 0-offset encoding and using normal math to compute offsets.

In all cases, should we mandate that implementations provide information
on smallest page size, so that this can be efficiently implemented through
memory protection?


Reply to this email directly or view it on GitHub
#204.


Reply to this email directly or view it on GitHub #204 (comment).

@sunfishcode
Copy link
Member

Since we're planning on having mprotect, that functionality should suffice for this feature.

@jfbastien
Copy link
Member Author

So the consensus is that MVP will not fault on nullptr deref?

@ghost
Copy link

ghost commented Oct 22, 2015

The difference between this issue and mprotect might be that this issue deals with a restriction enforced by the runtime, not protection setup by the wasm code itself. Operation systems often impose similar restrictions on applications. This issue might also help support placing the linear memory at absolute zero in the address space which has been shown to have significant performance benefits.

@ghost
Copy link

ghost commented Oct 22, 2015

I'd like to see the runtime being able to pass a start_of_usable_memory and for any access below this to optionally fault on MVP. If only to future proof MVP targeted applications. I believe this would address this issue too.

@sunfishcode
Copy link
Member

@jfbastien Once we have mprotect, that'll be able to do exactly what we want here, so it's desirable to avoid adding a separate mechanism that will then become obsolete.

@JSStats - Moving linear memory to absolute zero in the virtual address space can be discussed in #334.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants