Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][FMV] Incorrect SVE detection on SVE2 cores with FMV #93651

Closed
kinoshita-fj opened this issue May 29, 2024 · 7 comments
Closed

[AArch64][FMV] Incorrect SVE detection on SVE2 cores with FMV #93651

kinoshita-fj opened this issue May 29, 2024 · 7 comments
Assignees
Labels

Comments

@kinoshita-fj
Copy link

Problem

Compiling the following program using FMV with -rtlib=compiler-rt and running it on cores supporting SVE2 results in the execution of the default function.
This indicates that the target_version("SVE") attribute is not being correctly detected on SVE2 cores, because SVE2 contains SVE. However, the detection appears to be accurate on SVE-only cores.

#include <stdio.h>
#include <stdlib.h>


__attribute__ ((target_version("sve")))
void func1(void){
        printf("SVE\n");
}

__attribute__ ((target_version("default")))
void func1(void){
        printf("not SVE\n");
}

int main(int argc, char **argv){
        func1();
        return 0;
}

Expected behavior

On SVE2 cores and SVE cores:

The func1 with target_version("sve") should be executed. And it should print SVE.

Actual behavior

On SVE2 cores:

The func1 with target_version("default") is executed instead. And it printed not SVE.

On SVE cores:

The func1 with target_version("sve") is executed correctly. And it printed SVE.

Environment

At least Clang 18.1.2 and the latest main branch (commit b80d982).

Cause

The most likely cause is that the conditions to determine SVE support are incorrect.

They are used at __init_cpu_features_constructor function which is generated when FMV features.

llvm-project/compiler-rt/lib/builtins/cpu_model/aarch64/fmv/mrs.inc

    // ID_AA64PFR0_EL1.SVE != 0b0000
    if (extractBits(ftr, 32, 4) != 0x0) {
      // get ID_AA64ZFR0_EL1, that name supported
      // if sve enabled only
      getCPUFeature(S3_0_C0_C4_4, ftr);
      // ID_AA64ZFR0_EL1.SVEver == 0b0000
      if (extractBits(ftr, 0, 4) == 0x0)
        setCPUFeature(FEAT_SVE);
      // ID_AA64ZFR0_EL1.SVEver == 0b0001
      if (extractBits(ftr, 0, 4) == 0x1)
        setCPUFeature(FEAT_SVE2);
      // ID_AA64ZFR0_EL1.BF16 != 0b0000
      if (extractBits(ftr, 20, 4) != 0x0)
        setCPUFeature(FEAT_SVE_BF16);
    }

Possible Fix

Cores that support SVE2 also support SVE. The code should be modified as follows.

    // ID_AA64PFR0_EL1.SVE != 0b0000
    if (extractBits(ftr, 32, 4) != 0x0) {
      // get ID_AA64ZFR0_EL1, that name supported
      // if sve enabled only
      getCPUFeature(S3_0_C0_C4_4, ftr);
-     // ID_AA64ZFR0_EL1.SVEver == 0b0000
-     if (extractBits(ftr, 0, 4) == 0x0)
        setCPUFeature(FEAT_SVE);
      // ID_AA64ZFR0_EL1.SVEver == 0b0001
      if (extractBits(ftr, 0, 4) == 0x1)
        setCPUFeature(FEAT_SVE2);
      // ID_AA64ZFR0_EL1.BF16 != 0b0000
      if (extractBits(ftr, 20, 4) != 0x0)
        setCPUFeature(FEAT_SVE_BF16);
    }

A similar issue is posted in the ACLE, but it is not yet fixed.

@kawashima-fj kawashima-fj added clang Clang issues not falling into any other category compiler-rt:builtins SVE ARM Scalable Vector Extensions and removed new issue labels May 29, 2024
@EugeneZelenko EugeneZelenko added backend:AArch64 and removed clang Clang issues not falling into any other category labels May 29, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented May 29, 2024

@llvm/issue-subscribers-backend-aarch64

Author: Kinoshita Kotaro (kinoshita-fj)

# Problem

Compiling the following program using FMV with -rtlib=compiler-rt and running it on cores supporting SVE2 results in the execution of the default function.
This indicates that the target_version("SVE") attribute is not being correctly detected on SVE2 cores, because SVE2 contains SVE. However, the detection appears to be accurate on SVE-only cores.

#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;


__attribute__ ((target_version("sve")))
void func1(void){
        printf("SVE\n");
}

__attribute__ ((target_version("default")))
void func1(void){
        printf("not SVE\n");
}

int main(int argc, char **argv){
        func1();
        return 0;
}

Expected behavior

On SVE2 cores and SVE cores:

The func1 with target_version("sve") should be executed. And it should print SVE.

Actual behavior

On SVE2 cores:

The func1 with target_version("default") is executed instead. And it printed not SVE.

On SVE cores:

The func1 with target_version("sve") is executed correctly. And it printed SVE.

Environment

At least Clang 18.1.2 and the latest main branch (commit b80d982).

Cause

The most likely cause is that the conditions to determine SVE support are incorrect.

They are used at __init_cpu_features_constructor function which is generated when FMV features.

llvm-project/compiler-rt/lib/builtins/cpu_model/aarch64/fmv/mrs.inc

    // ID_AA64PFR0_EL1.SVE != 0b0000
    if (extractBits(ftr, 32, 4) != 0x0) {
      // get ID_AA64ZFR0_EL1, that name supported
      // if sve enabled only
      getCPUFeature(S3_0_C0_C4_4, ftr);
      // ID_AA64ZFR0_EL1.SVEver == 0b0000
      if (extractBits(ftr, 0, 4) == 0x0)
        setCPUFeature(FEAT_SVE);
      // ID_AA64ZFR0_EL1.SVEver == 0b0001
      if (extractBits(ftr, 0, 4) == 0x1)
        setCPUFeature(FEAT_SVE2);
      // ID_AA64ZFR0_EL1.BF16 != 0b0000
      if (extractBits(ftr, 20, 4) != 0x0)
        setCPUFeature(FEAT_SVE_BF16);
    }

Possible Fix

Cores that support SVE2 also support SVE. The code should be modified as follows.

    // ID_AA64PFR0_EL1.SVE != 0b0000
    if (extractBits(ftr, 32, 4) != 0x0) {
      // get ID_AA64ZFR0_EL1, that name supported
      // if sve enabled only
      getCPUFeature(S3_0_C0_C4_4, ftr);
-     // ID_AA64ZFR0_EL1.SVEver == 0b0000
-     if (extractBits(ftr, 0, 4) == 0x0)
        setCPUFeature(FEAT_SVE);
      // ID_AA64ZFR0_EL1.SVEver == 0b0001
      if (extractBits(ftr, 0, 4) == 0x1)
        setCPUFeature(FEAT_SVE2);
      // ID_AA64ZFR0_EL1.BF16 != 0b0000
      if (extractBits(ftr, 20, 4) != 0x0)
        setCPUFeature(FEAT_SVE_BF16);
    }

A similar issue is posted in the ACLE, but it is not yet fixed.

@Wilco1
Copy link

Wilco1 commented May 31, 2024

Agreed - one must never use equality when comparing CPUID registers since newer versions imply older versions, so the correct sequence would be like:

      if (extractBits(ftr, 0, 4) >= 0x0)
        setCPUFeature(FEAT_SVE);
      // ID_AA64ZFR0_EL1.SVEver == 0b0001
      if (extractBits(ftr, 0, 4) >= 0x1)
        setCPUFeature(FEAT_SVE2);

However this code should simply use the HWCAPS rather than trying to interpret CPUID registers incorrectly.

@ilinpv
Copy link
Contributor

ilinpv commented May 31, 2024

Thank you for report, HWCAP detection is used in __init_cpu_features_constructor when available and corresponds to FMV feature. Seems we need to amend ACLE FMV specification first and then fix detection in runtime libraries.

@kinoshita-fj
Copy link
Author

@ilinpv Thank you for your comment. I will wait for amending ACLE specification and runtime libraries.

@ilinpv
Copy link
Contributor

ilinpv commented Jun 5, 2024

ACLE specification amendment on review ARM-software/acle#322

@labrinea labrinea self-assigned this Jun 5, 2024
@Wilco1
Copy link

Wilco1 commented Jun 5, 2024

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115342 - this has now been fixed in GCC for all currently specified features.

Additionally the initialization must be done atomically to avoid race conditions.

labrinea added a commit to labrinea/llvm-project that referenced this issue Jun 11, 2024
To detect features we either use HWCAPs or directly extract system register
bitfields and compare with a value. In many cases equality comparisons give
wrong results for example FEAT_SVE is not set if SVE2 is available (see the
issue llvm#93651). I am also making the access to __aarch64_cpu_features atomic.

The corresponding PR for the ACLE specification is
ARM-software/acle#322.
labrinea added a commit that referenced this issue Jun 13, 2024
To detect features we either use HWCAPs or directly extract system
register bitfields and compare with a value. In many cases equality
comparisons give wrong results for example FEAT_SVE is not set if SVE2
is available (see the issue #93651). I am also making the access to
__aarch64_cpu_features atomic.

The corresponding PR for the ACLE specification is
ARM-software/acle#322.
@kinoshita-fj
Copy link
Author

@labrinea I confirmed that SVE detection was fixed. Thank you.

EthanLuisMcDonough pushed a commit to EthanLuisMcDonough/llvm-project that referenced this issue Aug 13, 2024
To detect features we either use HWCAPs or directly extract system
register bitfields and compare with a value. In many cases equality
comparisons give wrong results for example FEAT_SVE is not set if SVE2
is available (see the issue llvm#93651). I am also making the access to
__aarch64_cpu_features atomic.

The corresponding PR for the ACLE specification is
ARM-software/acle#322.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants