RFC: CFI Improvements with PAuth and BTI #17

akirilov-arm · 2021-10-26T13:07:54Z

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification.

Rendered

cfallin

@akirilov-arm I'm really excited about the prospect of having working pointer auth and CFI -- thank you for this effort in that direction!

A few thoughts:

It might be good to add a little more detail here to describe proposed changes to the generated code. In particular more detail on the pointer-auth would be helpful, just to document our approach if nothing else.
Do either of these features (especially pointer-auth) affect the ABI (I don't actually know)? If so, do we need to somehow declare a new ABI variant? I guess this also intersects with the question of how CFI-enabled and non-CFI-enabled code interact within a single process; describing how this looks (JIT code is enhanced as described here but runtime is not built with these features) would be helpful too.

accepted/cfi-improvements-with-pauth-and-bti.md

akirilov-arm · 2021-10-26T18:01:53Z

@cfallin Thanks for the feedback! Concerning your general questions:

* It might be good to add a little more detail here to describe proposed changes to the generated code. In particular more detail on the pointer-auth would be helpful, just to document our approach if nothing else.

Sure, I can use a simple CLIF function that just calls another subroutine and returns immediately after that as an example, and present the generated code before and after the CFI enhancements.

In the meantime, the article that is mentioned in the text actually provides that level of detail, to help with the discussion before I update the proposal.

* Do either of these features (especially pointer-auth) affect the ABI (I don't actually know)?

The scheme that I have described in the proposal should be mostly backward compatible, that is any part of the program should be able to remain oblivious of whether some code it interacts with uses PAuth and/or BTI or not. The major exceptions are unwinders, but at least the system ones should be prepared to deal with PAuth (assuming that the software environment the application runs in is relatively recent and that the suggested DWARF changes are implemented), as bytecodealliance/wasmtime#3183 has demonstrated in an unpleasant way. Of course, I can add a short paragraph that summarizes the backward compatibility aspect to the proposal.

However, it is possible to devise more sophisticated CFI schemes that require the introduction of specialized ABIs and that have a potentially higher performance overhead - for example, Apple's own variant. IMHO we should start with something simple that will allow us to work out the interactions with the rest of the system, while providing a reasonable improvement to CFI.

fitzgen · 2021-11-01T17:40:39Z

It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?

The proposed implementation will add the PACIASP instruction to the beginning of every function compiled by Cranelift and would replace the final return with the RETAA instruction.

Two questions:

Should we have support for both the nop versions of these instructions and the non-nop versions? (RETA{A,B} are non-nop IIUC.) And should we expose whether to use the non-nop versions as a cranelift settings option?
Similarly, should we support both the a key and b key? Does userspace always use a by convention? If so, then we can probably get away without the option to generate b.

These steps can be skipped for simple leaf functions that do not construct frame records on the stack.

LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?

akirilov-arm · 2021-12-01T10:58:53Z

I think @alexcrichton asked how the Rust compiler planned to use PAuth and BTI during one of the Cranelift meetings - details are in rust-lang/rust#88354; the changes seem to follow what GCC and LLVM are already doing.

As for @fitzgen's questions:

It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?

Yes, it is, but the main reason I have posted a RFC is to discuss the higher-level points such as whether the proposed hardening makes sense and what the extent of the threat model should be, e.g. we may decide that a CFI mechanism that requires ABI changes, while being beyond the scope of the current proposal, is something that we should definitely consider in the future. The discussion should also inform a potential x86 implementation using CET.

Should we have support for both the nop versions of these instructions and the non-nop versions?

Keeping in mind that PAuth can't be enabled in a granular way (e.g. with IFUNC or a similar mechanism), but it has to be used globally in all functions in order to be effective, the main utility of the NOP versions of the instructions is to ease deployment and in particular to support the Linux distribution use case, where the OS vendors would like to ship a single set of binaries that would still be hardened on compatible hardware. I believe that this is not really relevant to Wasmtime, so the current proposal minimizes the size of the generated code, but I might be wrong, hence I added the open question - please, chime in if that's indeed the case.

However, if we consider the cg_clif use case, then we might in fact need to support both versions of the instructions - the NOP variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime. @bjorn3, since you are driving the former use case, I would appreciate any comments.

Similarly, should we support both the a key and b key? Does userspace always use a by convention?

AFAIK there isn't a convention per se and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.

Also, the same interoperability considerations with the Rust compiler apply here.

LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?

I suppose you mean all leaf functions as opposed to those that are not affected by the optimization introduced by bytecodealliance/wasmtime#2960, which is what the text currently proposes (does LLVM make that distinction)? Again, that would mostly be about being able to accommodate whatever the Rust compiler decides to do.

tschneidereit · 2021-12-01T11:52:36Z

However, if we consider the cg_clif use case, then we might in fact need to support both versions of the instructions - the NOP variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime.

I wonder if this might also be relevant for scenarios where wasmtime compile is used, and the resulting .cwasm file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the .cwasm file to run on a broader range of hardware.

bnjbvr · 2021-12-01T13:34:34Z

Similarly, should we support both the a key and b key? Does userspace always use a by convention?

AFAIK there isn't a convention per se and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.

Some relevant documentation about the choice of using the A or B keys for signing pointers, according to LLVM's (maybe Apple's LLVM fork in particular) documentation of pointer authentication: https://github.com/apple/llvm-project/blob/next/clang/docs/PointerAuthentication.rst#key-assignments

In particular, I do confirm seeing B key be used for signing return addresses on Mac M1.

If I understand correctly, for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 (when creating a backtrace for an error, for instance), the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate? If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?

akirilov-arm · 2021-12-02T20:09:27Z

... according to LLVM's (maybe Apple's LLVM fork in particular) documentation of pointer authentication...

To be precise - this concerns arm64e, Apple's pointer authentication ABI variant, which is incompatible with its predecessor, while the proposal here discusses an approach that interoperates mostly transparently with vanilla AAPCS64.

However, thanks for pointing that out because it means that we may need to do something different when targetting macOS. I have to admit that my previous answer (and, indeed, the proposal itself) has been written mostly with Linux in mind.

... for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 [...], the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate?

Yes, it makes sense.

If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?

Well, considering the Linux/DWARF case, the key could actually be specified in the CIE via a CIE augmentation string.

akirilov-arm · 2021-12-16T17:15:18Z

@tschneidereit

I wonder if this might also be relevant for scenarios where wasmtime compile is used, and the resulting .cwasm file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the .cwasm file to run on a broader range of hardware.

While I was writing the proposal, my assumption was that even if Wasmtime compiled a Wasm module ahead-of-time, the machine on which the generated code executed would usually have the same (or better) capabilities than those enabled during compilation. How important is the opposite use case? @cfallin also expressed some reservations about the overhead of the additional instructions, so machines without PAuth support would probably rather avoid them.

Also, this discussion might be relevant in the context of forward-edge CFI using Intel CET, I believe.

P.S. What we have been talking about so far is essentially a forward compatibility issue with PAuth, but after experimenting with BTI I have realized that there is another, backward compatibility problem - imagine that a Wasm module is compiled ahead-of-time without any BTI support, so the generated code will not contain any BTI instructions, and then a machine that does support BTI tries to execute the code. The latter will mark the executable memory pages as containing BTI instructions, which will cause crashes because all indirect branches will fault due to the missing instructions. The solution to this issue for static compilers involves a couple of ELF extensions, so we could use something similar for the .cwasm files generated by Wasmtime. Alternatively, if we agree that we do not support the case in which the hardware capabilities in the compilation environment differ from those in the execution environment, then we would not need to do anything special.

akirilov-arm · 2022-01-20T16:38:52Z

During the last Cranelift project meeting @alexcrichton mentioned that Wasmtime would error out if it tried to load an AOT compiled module that was built with different settings than the ones used by the current environment, which means that supporting the NOP versions of the instructions is not that important, at least for the Wasmtime use case.

Improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification. Copyright (c) 2021, Arm Limited.

akirilov-arm · 2022-03-08T16:48:04Z

I have updated the proposal text to align it with the equivalent work on the Rust compiler; also, the changes should address the comments from the discussion above.

There is a suggestion that @abrown made elsewhere - since BTI and the ENDBR instruction introduced by Intel CET are quite similar, it makes sense to consider a generic setting to control their usage. Of course, the question is what would targets that don't have special hardware support for an equivalent mechanism (such as s390x, I believe) do - one option is to just return an error if the generic setting is enabled, while another is to implement a software-only protection such as the one I discuss briefly in the proposal text. Currently, I have chosen to keep the setting controlling BTI usage to be AArch64-specific.

akirilov-arm · 2022-04-06T11:05:44Z

akirilov-arm · 2022-04-07T21:23:01Z

Entering Final Comment Period

Now that we have sign-offs from multiple separate groups, as per the process this RFC will move into a final comment period (FCP).

The FCP will end on Monday, April 18.

akirilov-arm · 2022-04-20T14:46:48Z

The FCP has ended without any objections raised and without further discussion, so it is time to merge this RFC proposal. Thanks everyone for the discussion!

cfallin reviewed Oct 26, 2021

View reviewed changes

accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved

accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved

accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved

akirilov-arm force-pushed the cfi_improvements branch from 749d230 to 0d5eca3 Compare November 29, 2021 15:59

akirilov-arm mentioned this pull request Dec 16, 2021

Initial back-edge CFI implementation bytecodealliance/wasmtime#3606

Merged

akirilov-arm mentioned this pull request Jan 14, 2022

Initial forward-edge CFI implementation bytecodealliance/wasmtime#3693

Merged

akirilov-arm mentioned this pull request Mar 8, 2022

Compiled Module loading should check isa flags against host cpu bytecodealliance/wasmtime#3897

Closed

RFC: CFI Improvements with PAuth and BTI

ba9ad9c

Improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification. Copyright (c) 2021, Arm Limited.

akirilov-arm force-pushed the cfi_improvements branch from 0d5eca3 to ba9ad9c Compare March 8, 2022 16:20

akirilov-arm added project-cranelift project-wasmtime labels Mar 8, 2022

akirilov-arm added motion-to-finalize disposition-merge labels Apr 6, 2022

bjorn3 approved these changes Apr 6, 2022

View reviewed changes

PiotrSikora approved these changes Apr 6, 2022

View reviewed changes

akirilov-arm added the final-comment-period label Apr 7, 2022

radu-matei approved these changes Apr 8, 2022

View reviewed changes

akirilov-arm merged commit 2fdeb0a into bytecodealliance:main Apr 20, 2022

akirilov-arm deleted the cfi_improvements branch April 20, 2022 14:48

cfallin mentioned this pull request May 4, 2022

Mitigating Spectre attacks bytecodealliance/wasmtime#1032

Open

akirilov-arm mentioned this pull request Jun 1, 2022

CFI improvements to the AArch64 fiber implementation bytecodealliance/wasmtime#4195

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: CFI Improvements with PAuth and BTI #17

RFC: CFI Improvements with PAuth and BTI #17

akirilov-arm commented Oct 26, 2021

cfallin left a comment

akirilov-arm commented Oct 26, 2021

fitzgen commented Nov 1, 2021

akirilov-arm commented Dec 1, 2021

tschneidereit commented Dec 1, 2021

bnjbvr commented Dec 1, 2021

akirilov-arm commented Dec 2, 2021

akirilov-arm commented Dec 16, 2021 •

edited

Loading

akirilov-arm commented Jan 20, 2022

akirilov-arm commented Mar 8, 2022

akirilov-arm commented Apr 6, 2022 •

edited

Loading

akirilov-arm commented Apr 7, 2022 •

edited

Loading

akirilov-arm commented Apr 20, 2022

RFC: CFI Improvements with PAuth and BTI #17

RFC: CFI Improvements with PAuth and BTI #17

Conversation

akirilov-arm commented Oct 26, 2021

cfallin left a comment

Choose a reason for hiding this comment

akirilov-arm commented Oct 26, 2021

fitzgen commented Nov 1, 2021

akirilov-arm commented Dec 1, 2021

tschneidereit commented Dec 1, 2021

bnjbvr commented Dec 1, 2021

akirilov-arm commented Dec 2, 2021

akirilov-arm commented Dec 16, 2021 • edited Loading

akirilov-arm commented Jan 20, 2022

akirilov-arm commented Mar 8, 2022

akirilov-arm commented Apr 6, 2022 • edited Loading

Motion to finalize with a disposition to merge

Stakeholders sign-off

Arm

DFINITY

Embark Studios

Fastly

Fermyon

Google / Envoy

IBM

Intel

Microsoft

Mozilla

wasmCloud

Unaffiliated

akirilov-arm commented Apr 7, 2022 • edited Loading

Entering Final Comment Period

akirilov-arm commented Apr 20, 2022

akirilov-arm commented Dec 16, 2021 •

edited

Loading

akirilov-arm commented Apr 6, 2022 •

edited

Loading

akirilov-arm commented Apr 7, 2022 •

edited

Loading