Skip to content

Commit

Permalink
Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/lin…
Browse files Browse the repository at this point in the history
…ux/kernel/git/tip/tip

Pull x86 shadow stack support from Dave Hansen:
 "This is the long awaited x86 shadow stack support, part of Intel's
  Control-flow Enforcement Technology (CET).

  CET consists of two related security features: shadow stacks and
  indirect branch tracking. This series implements just the shadow stack
  part of this feature, and just for userspace.

  The main use case for shadow stack is providing protection against
  return oriented programming attacks. It works by maintaining a
  secondary (shadow) stack using a special memory type that has
  protections against modification. When executing a CALL instruction,
  the processor pushes the return address to both the normal stack and
  to the special permission shadow stack. Upon RET, the processor pops
  the shadow stack copy and compares it to the normal stack copy.

  For more information, refer to the links below for the earlier
  versions of this patch set"

Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/
Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/

* tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
  x86/shstk: Change order of __user in type
  x86/ibt: Convert IBT selftest to asm
  x86/shstk: Don't retry vm_munmap() on -EINTR
  x86/kbuild: Fix Documentation/ reference
  x86/shstk: Move arch detail comment out of core mm
  x86/shstk: Add ARCH_SHSTK_STATUS
  x86/shstk: Add ARCH_SHSTK_UNLOCK
  x86: Add PTRACE interface for shadow stack
  selftests/x86: Add shadow stack test
  x86/cpufeatures: Enable CET CR4 bit for shadow stack
  x86/shstk: Wire in shadow stack interface
  x86: Expose thread features in /proc/$PID/status
  x86/shstk: Support WRSS for userspace
  x86/shstk: Introduce map_shadow_stack syscall
  x86/shstk: Check that signal frame is shadow stack mem
  x86/shstk: Check that SSP is aligned on sigreturn
  x86/shstk: Handle signals for shadow stack
  x86/shstk: Introduce routines modifying shstk
  x86/shstk: Handle thread shadow stack
  x86/shstk: Add user-mode shadow stack support
  ...
  • Loading branch information
torvalds committed Aug 31, 2023
2 parents b97d64c + 1fe428d commit df57721
Show file tree
Hide file tree
Showing 118 changed files with 2,790 additions and 308 deletions.
1 change: 1 addition & 0 deletions Documentation/arch/x86/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ x86-specific Documentation
mtrr
pat
intel-hfi
shstk
iommu
intel_txt
amd-memory-encryption
Expand Down
179 changes: 179 additions & 0 deletions Documentation/arch/x86/shstk.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,179 @@
.. SPDX-License-Identifier: GPL-2.0
======================================================
Control-flow Enforcement Technology (CET) Shadow Stack
======================================================

CET Background
==============

Control-flow Enforcement Technology (CET) covers several related x86 processor
features that provide protection against control flow hijacking attacks. CET
can protect both applications and the kernel.

CET introduces shadow stack and indirect branch tracking (IBT). A shadow stack
is a secondary stack allocated from memory which cannot be directly modified by
applications. When executing a CALL instruction, the processor pushes the
return address to both the normal stack and the shadow stack. Upon
function return, the processor pops the shadow stack copy and compares it
to the normal stack copy. If the two differ, the processor raises a
control-protection fault. IBT verifies indirect CALL/JMP targets are intended
as marked by the compiler with 'ENDBR' opcodes. Not all CPU's have both Shadow
Stack and Indirect Branch Tracking. Today in the 64-bit kernel, only userspace
shadow stack and kernel IBT are supported.

Requirements to use Shadow Stack
================================

To use userspace shadow stack you need HW that supports it, a kernel
configured with it and userspace libraries compiled with it.

The kernel Kconfig option is X86_USER_SHADOW_STACK. When compiled in, shadow
stacks can be disabled at runtime with the kernel parameter: nousershstk.

To build a user shadow stack enabled kernel, Binutils v2.29 or LLVM v6 or later
are required.

At run time, /proc/cpuinfo shows CET features if the processor supports
CET. "user_shstk" means that userspace shadow stack is supported on the current
kernel and HW.

Application Enabling
====================

An application's CET capability is marked in its ELF note and can be verified
from readelf/llvm-readelf output::

readelf -n <application> | grep -a SHSTK
properties: x86 feature: SHSTK

The kernel does not process these applications markers directly. Applications
or loaders must enable CET features using the interface described in section 4.
Typically this would be done in dynamic loader or static runtime objects, as is
the case in GLIBC.

Enabling arch_prctl()'s
=======================

Elf features should be enabled by the loader using the below arch_prctl's. They
are only supported in 64 bit user applications. These operate on the features
on a per-thread basis. The enablement status is inherited on clone, so if the
feature is enabled on the first thread, it will propagate to all the thread's
in an app.

arch_prctl(ARCH_SHSTK_ENABLE, unsigned long feature)
Enable a single feature specified in 'feature'. Can only operate on
one feature at a time.

arch_prctl(ARCH_SHSTK_DISABLE, unsigned long feature)
Disable a single feature specified in 'feature'. Can only operate on
one feature at a time.

arch_prctl(ARCH_SHSTK_LOCK, unsigned long features)
Lock in features at their current enabled or disabled status. 'features'
is a mask of all features to lock. All bits set are processed, unset bits
are ignored. The mask is ORed with the existing value. So any feature bits
set here cannot be enabled or disabled afterwards.

arch_prctl(ARCH_SHSTK_UNLOCK, unsigned long features)
Unlock features. 'features' is a mask of all features to unlock. All
bits set are processed, unset bits are ignored. Only works via ptrace.

arch_prctl(ARCH_SHSTK_STATUS, unsigned long addr)
Copy the currently enabled features to the address passed in addr. The
features are described using the bits passed into the others in
'features'.

The return values are as follows. On success, return 0. On error, errno can
be::

-EPERM if any of the passed feature are locked.
-ENOTSUPP if the feature is not supported by the hardware or
kernel.
-EINVAL arguments (non existing feature, etc)
-EFAULT if could not copy information back to userspace

The feature's bits supported are::

ARCH_SHSTK_SHSTK - Shadow stack
ARCH_SHSTK_WRSS - WRSS

Currently shadow stack and WRSS are supported via this interface. WRSS
can only be enabled with shadow stack, and is automatically disabled
if shadow stack is disabled.

Proc Status
===========
To check if an application is actually running with shadow stack, the
user can read the /proc/$PID/status. It will report "wrss" or "shstk"
depending on what is enabled. The lines look like this::

x86_Thread_features: shstk wrss
x86_Thread_features_locked: shstk wrss

Implementation of the Shadow Stack
==================================

Shadow Stack Size
-----------------

A task's shadow stack is allocated from memory to a fixed size of
MIN(RLIMIT_STACK, 4 GB). In other words, the shadow stack is allocated to
the maximum size of the normal stack, but capped to 4 GB. In the case
of the clone3 syscall, there is a stack size passed in and shadow stack
uses this instead of the rlimit.

Signal
------

The main program and its signal handlers use the same shadow stack. Because
the shadow stack stores only return addresses, a large shadow stack covers
the condition that both the program stack and the signal alternate stack run
out.

When a signal happens, the old pre-signal state is pushed on the stack. When
shadow stack is enabled, the shadow stack specific state is pushed onto the
shadow stack. Today this is only the old SSP (shadow stack pointer), pushed
in a special format with bit 63 set. On sigreturn this old SSP token is
verified and restored by the kernel. The kernel will also push the normal
restorer address to the shadow stack to help userspace avoid a shadow stack
violation on the sigreturn path that goes through the restorer.

So the shadow stack signal frame format is as follows::

|1...old SSP| - Pointer to old pre-signal ssp in sigframe token format
(bit 63 set to 1)
| ...| - Other state may be added in the future


32 bit ABI signals are not supported in shadow stack processes. Linux prevents
32 bit execution while shadow stack is enabled by the allocating shadow stacks
outside of the 32 bit address space. When execution enters 32 bit mode, either
via far call or returning to userspace, a #GP is generated by the hardware
which, will be delivered to the process as a segfault. When transitioning to
userspace the register's state will be as if the userspace ip being returned to
caused the segfault.

Fork
----

The shadow stack's vma has VM_SHADOW_STACK flag set; its PTEs are required
to be read-only and dirty. When a shadow stack PTE is not RO and dirty, a
shadow access triggers a page fault with the shadow stack access bit set
in the page fault error code.

When a task forks a child, its shadow stack PTEs are copied and both the
parent's and the child's shadow stack PTEs are cleared of the dirty bit.
Upon the next shadow stack access, the resulting shadow stack page fault
is handled by page copy/re-use.

When a pthread child is created, the kernel allocates a new shadow stack
for the new thread. New shadow stack creation behaves like mmap() with respect
to ASLR behavior. Similarly, on thread exit the thread's shadow stack is
disabled.

Exec
----

On exec, shadow stack features are disabled by the kernel. At which point,
userspace can choose to re-enable, or lock them.
1 change: 1 addition & 0 deletions Documentation/filesystems/proc.rst
Original file line number Diff line number Diff line change
Expand Up @@ -566,6 +566,7 @@ encoded manner. The codes are the following:
mt arm64 MTE allocation tags are enabled
um userfaultfd missing tracking
uw userfaultfd wr-protect tracking
ss shadow stack page
== =======================================

Note that there is no guarantee that every flag and associated mnemonic will
Expand Down
12 changes: 10 additions & 2 deletions Documentation/mm/arch_pgtable_helpers.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,11 @@ PTE Page Table Helpers
+---------------------------+--------------------------------------------------+
| pte_mkclean | Creates a clean PTE |
+---------------------------+--------------------------------------------------+
| pte_mkwrite | Creates a writable PTE |
| pte_mkwrite | Creates a writable PTE of the type specified by |
| | the VMA. |
+---------------------------+--------------------------------------------------+
| pte_mkwrite_novma | Creates a writable PTE, of the conventional type |
| | of writable. |
+---------------------------+--------------------------------------------------+
| pte_wrprotect | Creates a write protected PTE |
+---------------------------+--------------------------------------------------+
Expand Down Expand Up @@ -118,7 +122,11 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_mkclean | Creates a clean PMD |
+---------------------------+--------------------------------------------------+
| pmd_mkwrite | Creates a writable PMD |
| pmd_mkwrite | Creates a writable PMD of the type specified by |
| | the VMA. |
+---------------------------+--------------------------------------------------+
| pmd_mkwrite_novma | Creates a writable PMD, of the conventional type |
| | of writable. |
+---------------------------+--------------------------------------------------+
| pmd_wrprotect | Creates a write protected PMD |
+---------------------------+--------------------------------------------------+
Expand Down
8 changes: 8 additions & 0 deletions arch/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -931,6 +931,14 @@ config HAVE_ARCH_HUGE_VMALLOC
config ARCH_WANT_HUGE_PMD_SHARE
bool

# Archs that want to use pmd_mkwrite on kernel memory need it defined even
# if there are no userspace memory management features that use it
config ARCH_WANT_KERNEL_PMD_MKWRITE
bool

config ARCH_WANT_PMD_MKWRITE
def_bool TRANSPARENT_HUGEPAGE || ARCH_WANT_KERNEL_PMD_MKWRITE

config HAVE_ARCH_SOFT_DIRTY
bool

Expand Down
2 changes: 1 addition & 1 deletion arch/alpha/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -256,7 +256,7 @@ extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;
extern inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_FOW; return pte; }
extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; }
extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); return pte; }
extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; }
extern inline pte_t pte_mkwrite_novma(pte_t pte){ pte_val(pte) &= ~_PAGE_FOW; return pte; }
extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; }
extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; }

Expand Down
2 changes: 1 addition & 1 deletion arch/arc/include/asm/hugepage.h
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ static inline pmd_t pte_pmd(pte_t pte)
}

#define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
Expand Down
2 changes: 1 addition & 1 deletion arch/arc/include/asm/pgtable-bits-arcv2.h
Original file line number Diff line number Diff line change
Expand Up @@ -87,7 +87,7 @@

PTE_BIT_FUNC(mknotpresent, &= ~(_PAGE_PRESENT));
PTE_BIT_FUNC(wrprotect, &= ~(_PAGE_WRITE));
PTE_BIT_FUNC(mkwrite, |= (_PAGE_WRITE));
PTE_BIT_FUNC(mkwrite_novma, |= (_PAGE_WRITE));
PTE_BIT_FUNC(mkclean, &= ~(_PAGE_DIRTY));
PTE_BIT_FUNC(mkdirty, |= (_PAGE_DIRTY));
PTE_BIT_FUNC(mkold, &= ~(_PAGE_ACCESSED));
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/include/asm/pgtable-3level.h
Original file line number Diff line number Diff line change
Expand Up @@ -202,7 +202,7 @@ static inline pmd_t pmd_##fn(pmd_t pmd) { pmd_val(pmd) op; return pmd; }

PMD_BIT_FUNC(wrprotect, |= L_PMD_SECT_RDONLY);
PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
PMD_BIT_FUNC(mkwrite, &= ~L_PMD_SECT_RDONLY);
PMD_BIT_FUNC(mkwrite_novma, &= ~L_PMD_SECT_RDONLY);
PMD_BIT_FUNC(mkdirty, |= L_PMD_SECT_DIRTY);
PMD_BIT_FUNC(mkclean, &= ~L_PMD_SECT_DIRTY);
PMD_BIT_FUNC(mkyoung, |= PMD_SECT_AF);
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -228,7 +228,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
return set_pte_bit(pte, __pgprot(L_PTE_RDONLY));
}

static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
return clear_pte_bit(pte, __pgprot(L_PTE_RDONLY));
}
Expand Down
2 changes: 1 addition & 1 deletion arch/arm/kernel/signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -682,7 +682,7 @@ asmlinkage void do_rseq_syscall(struct pt_regs *regs)
*/
static_assert(NSIGILL == 11);
static_assert(NSIGFPE == 15);
static_assert(NSIGSEGV == 9);
static_assert(NSIGSEGV == 10);
static_assert(NSIGBUS == 5);
static_assert(NSIGTRAP == 6);
static_assert(NSIGCHLD == 6);
Expand Down
4 changes: 2 additions & 2 deletions arch/arm64/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -181,7 +181,7 @@ static inline pmd_t set_pmd_bit(pmd_t pmd, pgprot_t prot)
return pmd;
}

static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
pte = set_pte_bit(pte, __pgprot(PTE_WRITE));
pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY));
Expand Down Expand Up @@ -487,7 +487,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
#define pmd_cont(pmd) pte_cont(pmd_pte(pmd))
#define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
#define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
#define pmd_mkwrite(pmd) pte_pmd(pte_mkwrite(pmd_pte(pmd)))
#define pmd_mkwrite_novma(pmd) pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)))
#define pmd_mkclean(pmd) pte_pmd(pte_mkclean(pmd_pte(pmd)))
#define pmd_mkdirty(pmd) pte_pmd(pte_mkdirty(pmd_pte(pmd)))
#define pmd_mkyoung(pmd) pte_pmd(pte_mkyoung(pmd_pte(pmd)))
Expand Down
2 changes: 1 addition & 1 deletion arch/arm64/kernel/signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -1344,7 +1344,7 @@ void __init minsigstksz_setup(void)
*/
static_assert(NSIGILL == 11);
static_assert(NSIGFPE == 15);
static_assert(NSIGSEGV == 9);
static_assert(NSIGSEGV == 10);
static_assert(NSIGBUS == 5);
static_assert(NSIGTRAP == 6);
static_assert(NSIGCHLD == 6);
Expand Down
2 changes: 1 addition & 1 deletion arch/arm64/kernel/signal32.c
Original file line number Diff line number Diff line change
Expand Up @@ -460,7 +460,7 @@ void compat_setup_restart_syscall(struct pt_regs *regs)
*/
static_assert(NSIGILL == 11);
static_assert(NSIGFPE == 15);
static_assert(NSIGSEGV == 9);
static_assert(NSIGSEGV == 10);
static_assert(NSIGBUS == 5);
static_assert(NSIGTRAP == 6);
static_assert(NSIGCHLD == 6);
Expand Down
4 changes: 2 additions & 2 deletions arch/arm64/mm/trans_pgd.c
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
* read only (code, rodata). Clear the RDONLY bit from
* the temporary mappings we use during restore.
*/
set_pte(dst_ptep, pte_mkwrite(pte));
set_pte(dst_ptep, pte_mkwrite_novma(pte));
} else if ((debug_pagealloc_enabled() ||
is_kfence_address((void *)addr)) && !pte_none(pte)) {
/*
Expand All @@ -55,7 +55,7 @@ static void _copy_pte(pte_t *dst_ptep, pte_t *src_ptep, unsigned long addr)
*/
BUG_ON(!pfn_valid(pte_pfn(pte)));

set_pte(dst_ptep, pte_mkpresent(pte_mkwrite(pte)));
set_pte(dst_ptep, pte_mkpresent(pte_mkwrite_novma(pte)));
}
}

Expand Down
2 changes: 1 addition & 1 deletion arch/csky/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ static inline pte_t pte_mkold(pte_t pte)
return pte;
}

static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
pte_val(pte) |= _PAGE_WRITE;
if (pte_val(pte) & _PAGE_MODIFIED)
Expand Down
2 changes: 1 addition & 1 deletion arch/hexagon/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -300,7 +300,7 @@ static inline pte_t pte_wrprotect(pte_t pte)
}

/* pte_mkwrite - mark page as writable */
static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
pte_val(pte) |= _PAGE_WRITE;
return pte;
Expand Down
2 changes: 1 addition & 1 deletion arch/ia64/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -269,7 +269,7 @@ ia64_phys_addr_valid (unsigned long addr)
* access rights:
*/
#define pte_wrprotect(pte) (__pte(pte_val(pte) & ~_PAGE_AR_RW))
#define pte_mkwrite(pte) (__pte(pte_val(pte) | _PAGE_AR_RW))
#define pte_mkwrite_novma(pte) (__pte(pte_val(pte) | _PAGE_AR_RW))
#define pte_mkold(pte) (__pte(pte_val(pte) & ~_PAGE_A))
#define pte_mkyoung(pte) (__pte(pte_val(pte) | _PAGE_A))
#define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D))
Expand Down
4 changes: 2 additions & 2 deletions arch/loongarch/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -384,7 +384,7 @@ static inline pte_t pte_mkdirty(pte_t pte)
return pte;
}

static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
pte_val(pte) |= _PAGE_WRITE;
if (pte_val(pte) & _PAGE_MODIFIED)
Expand Down Expand Up @@ -493,7 +493,7 @@ static inline int pmd_write(pmd_t pmd)
return !!(pmd_val(pmd) & _PAGE_WRITE);
}

static inline pmd_t pmd_mkwrite(pmd_t pmd)
static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
{
pmd_val(pmd) |= _PAGE_WRITE;
if (pmd_val(pmd) & _PAGE_MODIFIED)
Expand Down
2 changes: 1 addition & 1 deletion arch/m68k/include/asm/mcf_pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -210,7 +210,7 @@ static inline pte_t pte_mkold(pte_t pte)
return pte;
}

static inline pte_t pte_mkwrite(pte_t pte)
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
pte_val(pte) |= CF_PAGE_WRITABLE;
return pte;
Expand Down
Loading

0 comments on commit df57721

Please sign in to comment.