Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: CFI Improvements with PAuth and BTI #17

Merged
merged 1 commit into from
Apr 20, 2022

Conversation

akirilov-arm
Copy link
Contributor

This RFC proposes to improve control flow integrity for compiled WebAssembly code by utilizing two technologies from the Arm instruction set architecture - Pointer Authentication and Branch Target Identification.

Rendered

Copy link
Member

@cfallin cfallin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akirilov-arm I'm really excited about the prospect of having working pointer auth and CFI -- thank you for this effort in that direction!

A few thoughts:

  • It might be good to add a little more detail here to describe proposed changes to the generated code. In particular more detail on the pointer-auth would be helpful, just to document our approach if nothing else.

  • Do either of these features (especially pointer-auth) affect the ABI (I don't actually know)? If so, do we need to somehow declare a new ABI variant? I guess this also intersects with the question of how CFI-enabled and non-CFI-enabled code interact within a single process; describing how this looks (JIT code is enhanced as described here but runtime is not built with these features) would be helpful too.

accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved
accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved
accepted/cfi-improvements-with-pauth-and-bti.md Outdated Show resolved Hide resolved
@akirilov-arm
Copy link
Contributor Author

@cfallin Thanks for the feedback! Concerning your general questions:

* It might be good to add a little more detail here to describe proposed changes to the generated code. In particular more detail on the pointer-auth would be helpful, just to document our approach if nothing else.

Sure, I can use a simple CLIF function that just calls another subroutine and returns immediately after that as an example, and present the generated code before and after the CFI enhancements.

In the meantime, the article that is mentioned in the text actually provides that level of detail, to help with the discussion before I update the proposal.

* Do either of these features (especially pointer-auth) affect the ABI (I don't actually know)?

The scheme that I have described in the proposal should be mostly backward compatible, that is any part of the program should be able to remain oblivious of whether some code it interacts with uses PAuth and/or BTI or not. The major exceptions are unwinders, but at least the system ones should be prepared to deal with PAuth (assuming that the software environment the application runs in is relatively recent and that the suggested DWARF changes are implemented), as bytecodealliance/wasmtime#3183 has demonstrated in an unpleasant way. Of course, I can add a short paragraph that summarizes the backward compatibility aspect to the proposal.

However, it is possible to devise more sophisticated CFI schemes that require the introduction of specialized ABIs and that have a potentially higher performance overhead - for example, Apple's own variant. IMHO we should start with something simple that will allow us to work out the interactions with the rest of the system, while providing a reasonable improvement to CFI.

@fitzgen
Copy link
Member

fitzgen commented Nov 1, 2021

It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?

The proposed implementation will add the PACIASP instruction to the beginning of every function compiled by Cranelift and would replace the final return with the RETAA instruction.

Two questions:

  • Should we have support for both the nop versions of these instructions and the non-nop versions? (RETA{A,B} are non-nop IIUC.) And should we expose whether to use the non-nop versions as a cranelift settings option?

  • Similarly, should we support both the a key and b key? Does userspace always use a by convention? If so, then we can probably get away without the option to generate b.

These steps can be skipped for simple leaf functions that do not construct frame records on the stack.

LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?

@akirilov-arm
Copy link
Contributor Author

I think @alexcrichton asked how the Rust compiler planned to use PAuth and BTI during one of the Cranelift meetings - details are in rust-lang/rust#88354; the changes seem to follow what GCC and LLVM are already doing.

As for @fitzgen's questions:

It seems like supporting pointer authentication and BTI is largely an implementation detail of the aarch64 backend, and doesn't require new clif instructions or larger compiler/runtime changes (other than support for configuring aarch64-specific codegen options). Is my understanding correct here?

Yes, it is, but the main reason I have posted a RFC is to discuss the higher-level points such as whether the proposed hardening makes sense and what the extent of the threat model should be, e.g. we may decide that a CFI mechanism that requires ABI changes, while being beyond the scope of the current proposal, is something that we should definitely consider in the future. The discussion should also inform a potential x86 implementation using CET.

Should we have support for both the nop versions of these instructions and the non-nop versions?

Keeping in mind that PAuth can't be enabled in a granular way (e.g. with IFUNC or a similar mechanism), but it has to be used globally in all functions in order to be effective, the main utility of the NOP versions of the instructions is to ease deployment and in particular to support the Linux distribution use case, where the OS vendors would like to ship a single set of binaries that would still be hardened on compatible hardware. I believe that this is not really relevant to Wasmtime, so the current proposal minimizes the size of the generated code, but I might be wrong, hence I added the open question - please, chime in if that's indeed the case.

However, if we consider the cg_clif use case, then we might in fact need to support both versions of the instructions - the NOP variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime. @bjorn3, since you are driving the former use case, I would appreciate any comments.

Similarly, should we support both the a key and b key? Does userspace always use a by convention?

AFAIK there isn't a convention per se and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.

Also, the same interoperability considerations with the Rust compiler apply here.

LLVM has an option to generate the pointer auth for leaf functions, do we want to as well?

I suppose you mean all leaf functions as opposed to those that are not affected by the optimization introduced by bytecodealliance/wasmtime#2960, which is what the text currently proposes (does LLVM make that distinction)? Again, that would mostly be about being able to accommodate whatever the Rust compiler decides to do.

@tschneidereit
Copy link
Member

However, if we consider the cg_clif use case, then we might in fact need to support both versions of the instructions - the NOP variants that would be used by Cranelift as a Rust compiler backend and the rest for Wasmtime.

I wonder if this might also be relevant for scenarios where wasmtime compile is used, and the resulting .cwasm file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the .cwasm file to run on a broader range of hardware.

@bnjbvr
Copy link
Member

bnjbvr commented Dec 1, 2021

Similarly, should we support both the a key and b key? Does userspace always use a by convention?

AFAIK there isn't a convention per se and applications are free to use both keys, but most software is probably going to use the A key because it is the default for static compilers such as GCC. However, one option that has been discussed in the context of programs that use JIT compilation (e.g. dynamic language runtimes) is to use the B key for generated functions and the A key for statically compiled code.

Some relevant documentation about the choice of using the A or B keys for signing pointers, according to LLVM's (maybe Apple's LLVM fork in particular) documentation of pointer authentication: https://github.com/apple/llvm-project/blob/next/clang/docs/PointerAuthentication.rst#key-assignments

In particular, I do confirm seeing B key be used for signing return addresses on Mac M1.

If I understand correctly, for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 (when creating a backtrace for an error, for instance), the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate? If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?

@akirilov-arm
Copy link
Contributor Author

... according to LLVM's (maybe Apple's LLVM fork in particular) documentation of pointer authentication...

To be precise - this concerns arm64e, Apple's pointer authentication ABI variant, which is incompatible with its predecessor, while the proposal here discusses an approach that interoperates mostly transparently with vanilla AAPCS64.

However, thanks for pointing that out because it means that we may need to do something different when targetting macOS. I have to admit that my previous answer (and, indeed, the proposal itself) has been written mostly with Linux in mind.

... for the particular case of the return address at least, the selected key has to match what the host system has chosen too. When unwinding the stack on Mac M1 [...], the system's libunwind tried to authenticate all return addresses with the B key, including return addresses that had been stored in registers/stack by Cranelift-generated code. Does that sound accurate?

Yes, it makes sense.

If so, would that require having low-level settings that controlled which key is used in which particular context (e.g. "use the B key for signing return addresses"), or is it something that can be configured during unwinding through the use of CFI directives?

Well, considering the Linux/DWARF case, the key could actually be specified in the CIE via a CIE augmentation string.

@akirilov-arm
Copy link
Contributor Author

akirilov-arm commented Dec 16, 2021

@tschneidereit

I wonder if this might also be relevant for scenarios where wasmtime compile is used, and the resulting .cwasm file deployed to other machines. While it seems highly advisable to ensure those deployment scenarios involve pretty tight control over the target configuration, it still seems like there might be value in enabling the .cwasm file to run on a broader range of hardware.

While I was writing the proposal, my assumption was that even if Wasmtime compiled a Wasm module ahead-of-time, the machine on which the generated code executed would usually have the same (or better) capabilities than those enabled during compilation. How important is the opposite use case? @cfallin also expressed some reservations about the overhead of the additional instructions, so machines without PAuth support would probably rather avoid them.

Also, this discussion might be relevant in the context of forward-edge CFI using Intel CET, I believe.

P.S. What we have been talking about so far is essentially a forward compatibility issue with PAuth, but after experimenting with BTI I have realized that there is another, backward compatibility problem - imagine that a Wasm module is compiled ahead-of-time without any BTI support, so the generated code will not contain any BTI instructions, and then a machine that does support BTI tries to execute the code. The latter will mark the executable memory pages as containing BTI instructions, which will cause crashes because all indirect branches will fault due to the missing instructions. The solution to this issue for static compilers involves a couple of ELF extensions, so we could use something similar for the .cwasm files generated by Wasmtime. Alternatively, if we agree that we do not support the case in which the hardware capabilities in the compilation environment differ from those in the execution environment, then we would not need to do anything special.

@akirilov-arm
Copy link
Contributor Author

During the last Cranelift project meeting @alexcrichton mentioned that Wasmtime would error out if it tried to load an AOT compiled module that was built with different settings than the ones used by the current environment, which means that supporting the NOP versions of the instructions is not that important, at least for the Wasmtime use case.

Improve control flow integrity for compiled WebAssembly code
by utilizing two technologies from the Arm instruction set
architecture - Pointer Authentication and Branch Target
Identification.

Copyright (c) 2021, Arm Limited.
@akirilov-arm
Copy link
Contributor Author

I have updated the proposal text to align it with the equivalent work on the Rust compiler; also, the changes should address the comments from the discussion above.

There is a suggestion that @abrown made elsewhere - since BTI and the ENDBR instruction introduced by Intel CET are quite similar, it makes sense to consider a generic setting to control their usage. Of course, the question is what would targets that don't have special hardware support for an equivalent mechanism (such as s390x, I believe) do - one option is to just return an error if the generic setting is enabled, while another is to implement a software-only protection such as the one I discuss briefly in the proposal text. Currently, I have chosen to keep the setting controlling BTI usage to be AArch64-specific.

@akirilov-arm
Copy link
Contributor Author

akirilov-arm commented Apr 6, 2022

Motion to finalize with a disposition to merge

As discussed during the last Cranelift biweekly meeting, I believe that all the comments on the text so far have been addressed, and there hasn't been any strong opposition, so I'm proposing that we merge this RFC.

Thanks everyone for the discussion and the feedback!

Stakeholders sign-off

Arm

DFINITY

Embark Studios

Fastly

Fermyon

Google / Envoy

IBM

Intel

Microsoft

Mozilla

wasmCloud

Unaffiliated

@akirilov-arm
Copy link
Contributor Author

akirilov-arm commented Apr 7, 2022

Entering Final Comment Period

Now that we have sign-offs from multiple separate groups, as per the process this RFC will move into a final comment period (FCP).

The FCP will end on Monday, April 18.

@akirilov-arm
Copy link
Contributor Author

The FCP has ended without any objections raised and without further discussion, so it is time to merge this RFC proposal. Thanks everyone for the discussion!

@akirilov-arm akirilov-arm merged commit 2fdeb0a into bytecodealliance:main Apr 20, 2022
@akirilov-arm akirilov-arm deleted the cfi_improvements branch April 20, 2022 14:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants