Clear AVX512 feature bits when AVX512 not actually supported
According to Intel's documentation, if not all the AVX512 bits in XCR0
are set (meaning that the operating system doesn't fully support
AVX512), then no AVX512 feature can be used, even on xmm and ymm
registers.  Make OPENSSL_cpuid_setup() correctly handle this case by
clearing all the AVX512 feature bits when this situation is detected.

Change-Id: I2774dbc28bfbac1196e405c0920ba2909e7f0eb3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/68907
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Bob Beck <bbe@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Auto-Submit: Eric Biggers <ebiggers@google.com>
ebiggers authored and Boringssl LUCI CQ committed Aug 14, 2024
1 parent 08a232f commit c98b28b
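
The rule described in the commit message can be sketched in isolation. This is a minimal illustration, not the BoringSSL implementation: the macro names and the helpers `os_supports_avx512` and `mask_avx512_features` are hypothetical, and `ext` stands in for the two CPUID leaf 7 words (EBX, then ECX) that the patch calls `extended_features`. The bit positions are copied from the diff below.

```c
#include <stdbool.h>
#include <stdint.h>

// XCR0 state-component bits that AVX512 depends on.
#define XCR0_SSE       (UINT64_C(1) << 1)  // XMM state
#define XCR0_AVX       (UINT64_C(1) << 2)  // upper halves of YMM0-15
#define XCR0_OPMASK    (UINT64_C(1) << 5)  // mask registers k0-k7
#define XCR0_ZMM_HI256 (UINT64_C(1) << 6)  // upper halves of ZMM0-15
#define XCR0_HI16_ZMM  (UINT64_C(1) << 7)  // ZMM16-31

// All five bits together form the 0xe6 constant used in the patch.
#define XCR0_AVX512_ALL \
  (XCR0_SSE | XCR0_AVX | XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

// CPUID.(EAX=7,ECX=0):EBX bits cleared by the patch:
// AVX512F, DQ, IFMA, PF, ER, CD, BW, VL.
#define AVX512_EBX_MASK                                          \
  ((1u << 16) | (1u << 17) | (1u << 21) | (1u << 26) |           \
   (1u << 27) | (1u << 28) | (1u << 30) | (1u << 31))

// CPUID.(EAX=7,ECX=0):ECX bits cleared by the patch:
// AVX512VBMI, VBMI2, VNNI, BITALG, VPOPCNTDQ.
#define AVX512_ECX_MASK \
  ((1u << 1) | (1u << 6) | (1u << 11) | (1u << 12) | (1u << 14))

// Hypothetical helper: true only if the OS enabled every AVX512-related
// state component. A partial set is treated the same as none.
static bool os_supports_avx512(uint64_t xcr0) {
  return (xcr0 & XCR0_AVX512_ALL) == XCR0_AVX512_ALL;
}

// Hypothetical helper: drop every AVX512 feature bit when the OS support is
// incomplete. ext[0] is EBX and ext[1] is ECX of CPUID leaf 7, subleaf 0.
static void mask_avx512_features(uint64_t xcr0, uint32_t ext[2]) {
  if (!os_supports_avx512(xcr0)) {
    ext[0] &= ~AVX512_EBX_MASK;
    ext[1] &= ~AVX512_ECX_MASK;
  }
}
```

For example, an XCR0 value of 0xa7 has every required bit except ZMM_Hi256, so `mask_avx512_features` clears all thirteen AVX512 bits rather than only AVX512F, which is the behavior change this commit makes.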
Showing 2 changed files with 36 additions and 16 deletions.
42 changes: 31 additions & 11 deletions crypto/cpu_intel.c
@@ -227,19 +227,39 @@ void OPENSSL_cpuid_setup(void) {
 ecx &= ~(1u << 28); // AVX
 ecx &= ~(1u << 12); // FMA
 ecx &= ~(1u << 11); // AMD XOP
-// Clear AVX2 and AVX512* bits.
-//
-// TODO(davidben): Should bits 17 and 26-28 also be cleared? Upstream
-// doesn't clear those. See the comments in
-// |CRYPTO_hardware_supports_XSAVE|.
-extended_features[0] &=
-~((1u << 5) | (1u << 16) | (1u << 21) | (1u << 30) | (1u << 31));
+extended_features[0] &= ~(1u << 5); // AVX2
 }
-// See Intel manual, volume 1, section 15.2.
+// See Intel manual, volume 1, sections 15.2 ("Detection of AVX-512 Foundation
+// Instructions") through 15.4 ("Detection of Intel AVX-512 Instruction Groups
+// Operating at 256 and 128-bit Vector Lengths").
 if ((xcr0 & 0xe6) != 0xe6) {
-// Clear AVX512F. Note we don't touch other AVX512 extensions because they
-// can be used with YMM.
-extended_features[0] &= ~(1u << 16);
+// Without XCR0.111xx11x, no AVX512 feature can be used. This includes ZMM
+// registers, masking, SIMD registers 16-31 (even if accessed as YMM or
+// XMM), and EVEX-coded instructions (even on YMM or XMM). Even if only
+// XCR0.ZMM_Hi256 is missing, it isn't valid to use AVX512 features on
+// shorter vectors, since AVX512 ties everything to the availability of
+// 512-bit vectors. See the above-mentioned sections of the Intel manual,
+// which say that *all* these XCR0 bits must be checked even when just using
+// 128-bit or 256-bit vectors, and also volume 2a section 2.7.11 ("#UD
+// Equations for EVEX") which says that all EVEX-coded instructions raise an
+// undefined-instruction exception if any of these XCR0 bits is zero.
+//
+// AVX10 fixes this by reorganizing the features that used to be part of
+// "AVX512" and allowing them to be used independently of 512-bit support.
+// TODO: add AVX10 detection.
+extended_features[0] &= ~(1u << 16); // AVX512F
+extended_features[0] &= ~(1u << 17); // AVX512DQ
+extended_features[0] &= ~(1u << 21); // AVX512IFMA
+extended_features[0] &= ~(1u << 26); // AVX512PF
+extended_features[0] &= ~(1u << 27); // AVX512ER
+extended_features[0] &= ~(1u << 28); // AVX512CD
+extended_features[0] &= ~(1u << 30); // AVX512BW
+extended_features[0] &= ~(1u << 31); // AVX512VL
+extended_features[1] &= ~(1u << 1); // AVX512VBMI
+extended_features[1] &= ~(1u << 6); // AVX512VBMI2
+extended_features[1] &= ~(1u << 11); // AVX512VNNI
+extended_features[1] &= ~(1u << 12); // AVX512BITALG
+extended_features[1] &= ~(1u << 14); // AVX512VPOPCNTDQ
 }

 OPENSSL_ia32cap_P[0] = edx;
10 changes: 5 additions & 5 deletions crypto/internal.h
@@ -1390,13 +1390,13 @@ OPENSSL_INLINE int boringssl_fips_break_test(const char *test) {
 // ECX for CPUID where EAX = 1
 // Bit 11 is used to indicate AMD XOP support, not SDBG
 // Index 2:
-// EBX for CPUID where EAX = 7
+// EBX for CPUID where EAX = 7, ECX = 0
 // Index 3:
-// ECX for CPUID where EAX = 7
+// ECX for CPUID where EAX = 7, ECX = 0
 //
-// Note: the CPUID bits are pre-adjusted for the OSXSAVE bit and the YMM and XMM
-// bits in XCR0, so it is not necessary to check those. (WARNING: See caveats
-// in cpu_intel.c.)
+// Note: the CPUID bits are pre-adjusted for the OSXSAVE bit and the XMM, YMM,
+// and AVX512 bits in XCR0, so it is not necessary to check those. (WARNING: See
+// caveats in cpu_intel.c.)
 //
 // From C, this symbol should only be accessed with |OPENSSL_get_ia32cap|.
 extern uint32_t OPENSSL_ia32cap_P[4];
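
Because the capability words are pre-adjusted for XCR0, a consumer can test a feature bit directly without repeating the XCR0 check. The sketch below assumes a caller-supplied four-word array with the layout documented in the comment above (index 2 holds leaf 7 EBX, index 3 holds leaf 7 ECX). The helper names are hypothetical and are not BoringSSL functions. Real code should read the capability words through `OPENSSL_get_ia32cap`, as that comment requires.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper: test one bit of a 4-word capability array laid out
// like OPENSSL_ia32cap_P.
static bool cap_bit(const uint32_t cap[4], int index, int bit) {
  return ((cap[index] >> bit) & 1u) != 0;
}

// Index 2 is EBX for CPUID where EAX = 7, ECX = 0.
static bool cap_has_avx512f(const uint32_t cap[4]) {
  return cap_bit(cap, 2, 16);
}

static bool cap_has_avx512vl(const uint32_t cap[4]) {
  return cap_bit(cap, 2, 31);
}

// Index 3 is ECX for CPUID where EAX = 7, ECX = 0.
static bool cap_has_avx512vbmi2(const uint32_t cap[4]) {
  return cap_bit(cap, 3, 6);
}
```

With this commit applied, a set AVX512VL bit implies that the operating system enabled the full AVX512 state, so code that uses EVEX-coded instructions on YMM registers only needs checks of this kind.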