Commit

Some minor cleanup post the addition of TYP_SIMD64 and ZMM support - P1 (dotnet#83044)

* Ensure EA_16BYTE is FEATURE_SIMD only and EA_32/64BYTE are TARGET_XARCH only

* Remove getSIMDSupportLevel as it's now unnecessary

* Ensure canUseVexEncoding and canUseEvexEncoding are xarch only

* Don't make EA_16BYTE+ require FEATURE_SIMD

* Resolving formatting and build failures

* Adding back a check that shouldn't have been removed
tannergooding committed Mar 7, 2023
1 parent 6512040 commit 4699358
Showing 10 changed files with 72 additions and 130 deletions.
11 changes: 5 additions & 6 deletions docs/design/coreclr/botr/vectors-and-intrinsics.md
@@ -15,15 +15,15 @@ Most hardware intrinsics support is tied to the use of various Vector apis. Ther

- The fixed-length float vectors. `Vector2`, `Vector3`, and `Vector4`. These vector types represent a struct of floats of various lengths. For type layout, ABI, and interop purposes they are represented in exactly the same way as a structure with an appropriate number of floats in it. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations.
- The variable length `Vector<T>`. This represents vector data of runtime-determined length. In any given process the length of a `Vector<T>` is the same in all methods, but this length may differ between various machines or environment variable settings read at startup of the process. The `T` type variable may be the following types (`System.Byte`, `System.SByte`, `System.Int16`, `System.UInt16`, `System.Int32`, `System.UInt32`, `System.Int64`, `System.UInt64`, `System.Single`, and `System.Double`), and allows use of integer or double data within a vector. The length and alignment of `Vector<T>` is unknown to the developer at compile time (although discoverable at runtime by using the `Vector<T>.Count` api), and `Vector<T>` may not exist in any interop signature. Operations on these vector types are supported on all architectures and platforms, although some architectures may optimize various operations if the `Vector<T>.IsHardwareAccelerated` api returns true.
- `Vector64<T>`, `Vector128<T>`, and `Vector256<T>` represent fixed-sized vectors that closely resemble the fixed-sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported directly on these types other than creation. They are used primarily in the processor specific hardware intrinsics apis.
- `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, and `Vector512<T>` represent fixed-sized vectors that closely resemble the fixed-sized vectors available in C++. These structures can be used in any code that runs, but very few features are supported directly on these types other than creation. They are used primarily in the processor specific hardware intrinsics apis.
- Processor specific hardware intrinsics apis such as `System.Runtime.Intrinsics.X86.Ssse3`. These apis map directly to individual instructions or short instruction sequences that are specific to a particular hardware instruction. These apis are only usable on hardware that supports the particular instruction. See https://github.com/dotnet/designs/blob/master/accepted/2018/platform-intrinsics.md for the design of these.

# How to use intrinsics apis

There are 3 models for use of intrinsics apis.

1. Usage of `Vector2`, `Vector3`, `Vector4`, and `Vector<T>`. For these, it's always safe to just use the types. The jit will generate code that is as optimal as it can for the logic, and will do so unconditionally.
2. Usage of `Vector64<T>`, `Vector128<T>`, and `Vector256<T>`. These types may be used unconditionally, but are only truly useful when also using the platform specific hardware intrinsics apis.
2. Usage of `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, and `Vector512<T>`. These types may be used unconditionally, but are only truly useful when also using the platform specific hardware intrinsics apis.
3. Usage of platform intrinsics apis. All usage of these apis should be wrapped in an `IsSupported` check of the appropriate kind. Then, within the `IsSupported` check the platform specific api may be used. If multiple instruction sets are used, then the application developer must have checks for the instruction sets as used on each one of them.

# Effect of usage of hardware intrinsics on how code is generated
@@ -142,7 +142,7 @@ public class BitOperations
#### Crossgen implementation rules
- Any code which uses an intrinsic from the `System.Runtime.Intrinsics.Arm` or `System.Runtime.Intrinsics.X86` namespace will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
- Any code which uses `Vector<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_SIMD_NGEN_DISALLOWED`)
- Any code which uses `Vector64<T>`, `Vector128<T>` or `Vector256<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
- Any code which uses `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, or `Vector512<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
- Non-platform intrinsics which require more hardware support than the minimum supported hardware capability will not take advantage of that capability. In particular the code generated for Vector2/3/4 is sub-optimal. MethodImplOptions.AggressiveOptimization may be used to disable compilation of this sub-par code.

#### Characteristics which result from rules
@@ -160,10 +160,10 @@ There are 2 sets of instruction sets known to the compiler.
- The baseline instruction set which defaults to (Sse, Sse2), but may be adjusted via compiler option.
- The optimistic instruction set which defaults to (Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, and Lzcnt).

Code will be compiled using the optimistic instruction set to drive compilation, but any use of an instruction set beyond the baseline instruction set will be recorded, as will any attempt to use an instruction set beyond the optimistic set if that attempted use has a semantic effect. If the baseline instruction set includes `Avx2` then the size and characteristics of `Vector<T>` are known. Any other decisions about ABI may also be encoded. For instance, it is likely that the ABI of `Vector256<T>` will vary based on the presence/absence of `Avx` support.
Code will be compiled using the optimistic instruction set to drive compilation, but any use of an instruction set beyond the baseline instruction set will be recorded, as will any attempt to use an instruction set beyond the optimistic set if that attempted use has a semantic effect. If the baseline instruction set includes `Avx2` then the size and characteristics of `Vector<T>` are known. Any other decisions about ABI may also be encoded. For instance, it is likely that the ABI of `Vector256<T>` and `Vector512<T>` will vary based on the presence/absence of `Avx` support.

- Any code which uses `Vector<T>` will not be compiled AOT unless the size of `Vector<T>` is known.
- Any code which passes a `Vector256<T>` as a parameter on a Linux or Mac machine will not be compiled AOT unless the support for the `Avx` instruction set is known.
- Any code which passes a `Vector256<T>` or `Vector512<T>` as a parameter on a Linux or Mac machine will not be compiled AOT unless the support for the `Avx` instruction set is known.
- Non-platform intrinsics which require more hardware support than the optimistic supported hardware capability will not take advantage of that capability. MethodImplOptions.AggressiveOptimization may be used to disable compilation of this sub-par code.
- Code which takes advantage of instructions sets in the optimistic set will not be used on a machine which only supports the baseline instruction set.
- Code which attempts to use instruction sets outside of the optimistic set will generate code that will not be used on machines with support for the instruction set.
@@ -194,7 +194,6 @@ While the above api exists, it is not expected that general purpose code within
|`compExactlyDependsOn(isa)`| Use when making a decision to use or not use an instruction set when the decision will affect the semantics of the generated code. Should never be used in an assert. | Return whether or not an instruction set is supported. Calls notifyInstructionSetUsage with the result of that computation.
|`compOpportunisticallyDependsOn(isa)`| Use when making an opportunistic decision to use or not use an instruction set. Use when the instruction set usage is a "nice to have optimization opportunity", but do not use when a false result may change the semantics of the program. Should never be used in an assert. | Return whether or not an instruction set is supported. Calls notifyInstructionSetUsage if the instruction set is supported.
|`compIsaSupportedDebugOnly(isa)` | Use to assert whether or not an instruction set is supported | Return whether or not an instruction set is supported. Does not report anything. Only available in debug builds.
|`getSIMDSupportLevel()`| Use when determining what codegen to generate for code that operates on `Vector<T>`, `Vector2`, `Vector3` or `Vector4`.| Queries the instruction sets supported using `compOpportunisticallyDependsOn`, and finds a set of instructions available to use for working with the platform agnostic vector types.
|`getSIMDVectorType()`| Use to get the TYP of the `Vector<T>` type. | Determine the TYP of the `Vector<T>` type. If on the architecture the TYP may vary depending on whatever rules, this function will make sufficient use of the `notifyInstructionSetUsage` api to ensure that the TYP is consistent between compile time and runtime.
|`getSIMDVectorRegisterByteLength()` | Use to get the size of a `Vector<T>` value. | Determine the size of the `Vector<T>` type. If on the architecture the size may vary depending on whatever rules, this function will make sufficient use of the `notifyInstructionSetUsage` api to ensure that the size is consistent between compile time and runtime.
|`maxSIMDStructBytes()`| Get the maximum number of bytes that might be used in a SIMD type during this compilation. | Query the set of instruction sets supported, and determine the largest simd type supported. Use `compOpportunisticallyDependsOn` to perform the queries so that the maximum size needed is the only one recorded.
20 changes: 15 additions & 5 deletions src/coreclr/jit/codegencommon.cpp
@@ -1731,6 +1731,7 @@ void CodeGen::genGenerateMachineCode()

printf(" for ");

#if defined(TARGET_X86)
if (compiler->info.genCPU == CPU_X86)
{
printf("generic X86 CPU");
@@ -1739,9 +1740,14 @@
{
printf("Pentium 4");
}
else if (compiler->info.genCPU == CPU_X64)
#elif defined(TARGET_AMD64)
if (compiler->info.genCPU == CPU_X64)
{
if (compiler->canUseVexEncoding())
if (compiler->canUseEvexEncoding())
{
printf("X64 CPU with AVX512");
}
else if (compiler->canUseVexEncoding())
{
printf("X64 CPU with AVX");
}
@@ -1750,18 +1756,22 @@
printf("X64 CPU with SSE2");
}
}
else if (compiler->info.genCPU == CPU_ARM)
#elif defined(TARGET_ARM)
if (compiler->info.genCPU == CPU_ARM)
{
printf("generic ARM CPU");
}
else if (compiler->info.genCPU == CPU_ARM64)
#elif defined(TARGET_ARM64)
if (compiler->info.genCPU == CPU_ARM64)
{
printf("generic ARM64 CPU");
}
else if (compiler->info.genCPU == CPU_LOONGARCH64)
#elif defined(TARGET_LOONGARCH64)
if (compiler->info.genCPU == CPU_LOONGARCH64)
{
printf("generic LOONGARCH64 CPU");
}
#endif
else
{
printf("unknown architecture");
1 change: 0 additions & 1 deletion src/coreclr/jit/codegenxarch.cpp
@@ -10729,7 +10729,6 @@ void CodeGen::genZeroInitFrameUsingBlockInit(int untrLclHi, int untrLclLo, regNu
assert(compiler->compGeneratingProlog);
assert(genUseBlockInit);
assert(untrLclHi > untrLclLo);
assert(compiler->getSIMDSupportLevel() >= SIMD_SSE2_Supported);

emitter* emit = GetEmitter();
regNumber frameReg = genFramePointerReg();
5 changes: 1 addition & 4 deletions src/coreclr/jit/compiler.cpp
@@ -2238,11 +2238,8 @@ void Compiler::compSetProcessor()
info.genCPU = CPU_X86_PENTIUM_4;
else
info.genCPU = CPU_X86;

#elif defined(TARGET_LOONGARCH64)

info.genCPU = CPU_LOONGARCH64;

info.genCPU = CPU_LOONGARCH64;
#endif

//
79 changes: 24 additions & 55 deletions src/coreclr/jit/compiler.h
@@ -8373,34 +8373,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
}
#endif // DEBUG

// Get highest available level for SIMD codegen
SIMDLevel getSIMDSupportLevel()
{
#if defined(TARGET_XARCH)
if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
{
if (compOpportunisticallyDependsOn(InstructionSet_Vector512))
{
return SIMD_Vector512_Supported;
}

return SIMD_AVX2_Supported;
}

if (compOpportunisticallyDependsOn(InstructionSet_SSE42))
{
return SIMD_SSE4_Supported;
}

// min bar is SSE2
return SIMD_SSE2_Supported;
#else
assert(!"Available instruction set(s) for SIMD codegen is not defined for target arch");
unreached();
return SIMD_Not_Supported;
#endif
}

bool isIntrinsicType(CORINFO_CLASS_HANDLE clsHnd)
{
return info.compCompHnd->isIntrinsicType(clsHnd);
@@ -8831,16 +8803,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
var_types getSIMDVectorType()
{
#if defined(TARGET_XARCH)
// TODO-XArch-AVX512 : Return TYP_SIMD64 once Vector<T> supports AVX512.
if (getSIMDSupportLevel() >= SIMD_AVX2_Supported)
if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
{
// TODO-XArch-AVX512 : Return TYP_SIMD64 once Vector<T> supports AVX512.
return TYP_SIMD32;
}
else
{
// Verify and record that AVX2 isn't supported
compVerifyInstructionSetUnusable(InstructionSet_AVX2);
assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
return TYP_SIMD16;
}
#elif defined(TARGET_ARM64)
@@ -8873,16 +8843,14 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
unsigned getSIMDVectorRegisterByteLength()
{
#if defined(TARGET_XARCH)
// TODO-XArch-AVX512 : Return ZMM_REGSIZE_BYTES once Vector<T> supports AVX512.
if (getSIMDSupportLevel() >= SIMD_AVX2_Supported)
if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
{
// TODO-XArch-AVX512 : Return ZMM_REGSIZE_BYTES once Vector<T> supports AVX512.
return YMM_REGSIZE_BYTES;
}
else
{
// Verify and record that AVX2 isn't supported
compVerifyInstructionSetUnusable(InstructionSet_AVX2);
assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
return XMM_REGSIZE_BYTES;
}
#elif defined(TARGET_ARM64)
@@ -8897,9 +8865,11 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

// maxSIMDStructBytes
// The maximum SIMD size supported by System.Numerics.Vectors or System.Runtime.Intrinsics
// SSE: 16-byte Vector<T> and Vector128<T>
// AVX: 32-byte Vector256<T> (Vector<T> is 16-byte)
// AVX2: 32-byte Vector<T> and Vector256<T>
// Arm.AdvSimd: 16-byte Vector<T> and Vector128<T>
// X86.SSE: 16-byte Vector<T> and Vector128<T>
// X86.AVX: 16-byte Vector<T> and Vector256<T>
// X86.AVX2: 32-byte Vector<T> and Vector256<T>
// X86.AVX512F: 32-byte Vector<T> and Vector512<T>
unsigned int maxSIMDStructBytes()
{
#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_XARCH)
@@ -8909,17 +8879,22 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
{
return ZMM_REGSIZE_BYTES;
}
return YMM_REGSIZE_BYTES;
else
{
compVerifyInstructionSetUnusable(InstructionSet_AVX512F);
return YMM_REGSIZE_BYTES;
}
}
else
{
// Verify and record that AVX2 isn't supported
compVerifyInstructionSetUnusable(InstructionSet_AVX2);
assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
compVerifyInstructionSetUnusable(InstructionSet_AVX);
return XMM_REGSIZE_BYTES;
}
#elif defined(TARGET_ARM64)
return FP_REGSIZE_BYTES;
#else
return getSIMDVectorRegisterByteLength();
assert(!"maxSIMDStructBytes() unimplemented on target arch");
unreached();
#endif
}

@@ -9184,13 +9159,10 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#endif
}

#ifdef TARGET_XARCH
bool canUseVexEncoding() const
{
#ifdef TARGET_XARCH
return compOpportunisticallyDependsOn(InstructionSet_AVX);
#else
return false;
#endif
}

//------------------------------------------------------------------------
Expand All @@ -9201,8 +9173,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
//
bool canUseEvexEncoding() const
{
#ifdef TARGET_XARCH

#ifdef DEBUG
if (JitConfig.JitForceEVEXEncoding())
{
@@ -9211,9 +9181,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#endif // DEBUG

return compOpportunisticallyDependsOn(InstructionSet_AVX512F);
#else
return false;
#endif
}

//------------------------------------------------------------------------
@@ -9224,7 +9191,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
//
bool DoJitStressEvexEncoding() const
{
#if defined(TARGET_XARCH) && defined(DEBUG)
#ifdef DEBUG
// Using JitStressEVEXEncoding flag will force instructions which would
// otherwise use VEX encoding but can be EVEX encoded to use EVEX encoding
// This requires AVX512VL support. JitForceEVEXEncoding forces this encoding, thus
@@ -9234,14 +9201,16 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
{
return true;
}

if (JitConfig.JitStressEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_AVX512F_VL))
{
return true;
}
#endif // TARGET_XARCH && DEBUG
#endif // DEBUG

return false;
}
#endif // TARGET_XARCH

/*
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
13 changes: 6 additions & 7 deletions src/coreclr/jit/emit.cpp
@@ -2609,17 +2609,16 @@ void emitter::emitSetFrameRangeArgs(int offsLo, int offsHi)

/*****************************************************************************
*
* A conversion table used to map an operand size value (in bytes) into its
* small encoding (0 through 3), and vice versa.
* A conversion table used to map an operand size value (in bytes) into its emitAttr
*/

const emitter::opSize emitter::emitSizeEncode[] = {
emitter::OPSZ1, emitter::OPSZ2, emitter::OPSZ4, emitter::OPSZ8, emitter::OPSZ16, emitter::OPSZ32, emitter::OPSZ64,
const emitAttr emitter::emitSizeDecode[emitter::OPSZ_COUNT] = {
EA_1BYTE, EA_2BYTE, EA_4BYTE, EA_8BYTE, EA_16BYTE,
#if defined(TARGET_XARCH)
EA_32BYTE, EA_64BYTE,
#endif // TARGET_XARCH
};

const emitAttr emitter::emitSizeDecode[emitter::OPSZ_COUNT] = {EA_1BYTE, EA_2BYTE, EA_4BYTE, EA_8BYTE,
EA_16BYTE, EA_32BYTE, EA_64BYTE};

/*****************************************************************************
*
* Allocate an instruction descriptor for an instruction that uses both