BUG: CPU Feature Checks Fail on OpenBSD 7.5 #26785

Closed
AngeloD2022 opened this issue Jun 22, 2024 · 6 comments · Fixed by #26797
Labels: 00 - Bug; 32 - Installation (Problems installing or compiling NumPy); component: SIMD (Issues in SIMD (fast instruction sets) code or machinery)


AngeloD2022 commented Jun 22, 2024

Describe the issue:

My machine runs on an Intel Celeron 3855U, which, according to Intel's website, does not support Advanced Vector Extensions (AVX).

When I run pip install numpy, it needs to build from source. I assume that this is because of my relatively uncommon configuration. Nevertheless, numpy's build script appears to misrecognize my CPU's capabilities and incorrectly enables the AVX targets. Thinking I could resolve this by building numpy from source, I cloned the repository and ran the suggested command from numpy's official documentation, to manually disable the AVX features. The build appears to fail for the same reason (the related log is attached).

install_log.txt

Reproduce the code example:

CPU: Intel Celeron 3855U
OS: OpenBSD 7.5 (amd64)
Python: 3.11.9
NumPy: 2.0.0

python3.11 -m venv venv
source venv/bin/activate
pip install numpy

Error message:

Log is too big to display here, see install_log.txt

Python and NumPy Versions:

Python: 3.11.9
NumPy: 2.0.0

Runtime Environment:

N/A

Context for the issue:

This prevents usage of numpy, as well as any dependent package on machines that:

  • Run OpenBSD 7.5 (but potentially other versions as well)
  • Use an x86 CPU that does not support advanced vector extensions.

rgommers (Member) commented

Thanks for the report @AngeloD2022. It seems like something is going wrong with the feature check for AVX. The build log you provided doesn't contain enough detail to see exactly what is going wrong, but the checks do return true for some reason:

  Test features "SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42 AVX XOP FMA4 F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL AVX512_SPR" : Parial support, missing(AVX XOP FMA4 F16C FMA3 AVX2 AVX512F AVX512CD AVX512_KNL AVX512_KNM AVX512_SKX AVX512_CLX AVX512_CNL AVX512_ICL AVX512_SPR)
  Test features "SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42" : Supported
  Test features "AVX" : Supported
  Test features "F16C" : Supported
  Test features "FMA3" : Supported
  Test features "AVX2" : Supported
  Test features "AVX512F" : Supported
  Test features "AVX512CD" : Supported
  Test features "AVX512_KNL" : Supported
  Test features "AVX512_KNM" : Unsupported due to Arguments "-msse, -msse2, -msse3, -mssse3, -msse4.1, -mpopcnt, -msse4.2, -mavx, -mf16c, -mfma, -mavx2, -mno-mmx, -mavx512f, -mavx512cd, -mavx512er, -mavx512pf, -mavx5124fmaps, -mavx5124vnniw, -mavx512vpopcntdq" are not supported
  Test features "AVX512_SKX" : Supported
  Test features "AVX512_CLX" : Supported
  Test features "AVX512_CNL" : Supported
  Test features "AVX512_ICL" : Supported
  Test features "AVX512_SPR" : Unsupported due to Compiler fails against the test code of "AVX512_SPR"
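
When a build log is long, it can help to reduce the "Test features" lines to a small table before reading further. The following is a minimal sketch, not part of NumPy or Meson: the function name `summarize_feature_checks` and the regular expression are assumptions made for illustration, and the sample lines are copied from the output above (with the AVX512_KNM argument list shortened). Judging from the rest of this thread, a "Supported" result here does not by itself show that the CPU can execute the instructions.

```python
import re

# Sample lines copied from the build output quoted in this thread.
# The AVX512_KNM argument list is shortened for readability.
SAMPLE_LOG = '''\
Test features "SSE SSE2 SSE3 SSSE3 SSE41 POPCNT SSE42" : Supported
Test features "AVX" : Supported
Test features "AVX512_KNM" : Unsupported due to Arguments "-msse, -mavx512vpopcntdq" are not supported
Test features "AVX512_SPR" : Unsupported due to Compiler fails against the test code of "AVX512_SPR"
'''

# Hypothetical pattern: feature names in quotes, then the first status word.
LINE_RE = re.compile(r'Test features "(?P<names>[^"]+)" : (?P<status>\w+)')


def summarize_feature_checks(log_text):
    """Map each tested feature group to the first word of its status."""
    results = {}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            results[match.group("names")] = match.group("status")
    return results


if __name__ == "__main__":
    for names, status in summarize_feature_checks(SAMPLE_LOG).items():
        print(f"{status:12} {names}")
```
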

What we need is the more detailed build log in the build directory. If you add -Cbuild-dir=build to the pip install command, then you'll find it at build/meson-logs/meson-log.txt. It will contain logging like this for each AVX flavor:

Command line: `arm64-apple-darwin20.0.0-clang /Users/rgommers/code/numpy/numpy/distutils/checks/cpu_asimd.c -o /var/folders/2m/dk_4hyc90xd2drkbqsbsjfqw0000gn/T/tmp3sauggrh/output.exe` -> 0
Test features "^[[1mNEON NEON_FP16 NEON_VFPV4 ASIMD^[[0m" : Supported
Using cached compile:
Cached command line:  arm64-apple-darwin20.0.0-clang /Users/rgommers/code/numpy/numpy/distutils/checks/cpu_asimd.c -o /var/folders/2m/dk_4hyc90xd2drkbqsbsjfqw0000gn/T/tmp3sauggrh/output.exe

Code:
 /Users/rgommers/code/numpy/numpy/distutils/checks/cpu_asimd.c
Cached compiler stdout:

Cached compiler stderr:

Using cached compile:
Cached command line:  arm64-apple-darwin20.0.0-clang /Users/rgommers/code/numpy/build/meson-private/tmpjma_l37t/testfile.c -o /Users/rgommers/code/numpy/build/meson-private/tmpjma_l37t/output.obj -c -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/rgommers/mambaforge/envs/numpy-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/rgommers/mambaforge/envs/numpy-dev/include -O0 -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -Werror=ignored-optimization-argument -march=armv8.2-a+fp16

Code:
 extern int i;
int i;

Cached compiler stdout:

Cached compiler stderr:

Running compile:
Working directory:  /var/folders/2m/dk_4hyc90xd2drkbqsbsjfqw0000gn/T/tmpk3fu2k0j
Source file: /Users/rgommers/code/numpy/numpy/distutils/checks/cpu_asimdhp.c

The code that is compiled to test support is under numpy/distutils/checks, e.g.: https://github.com/numpy/numpy/blob/768724556fd9a60556fea5203b1489bb51c507a5/numpy/distutils/checks/cpu_avx.c
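
To find the relevant entries in meson-log.txt, one option is to pull out each "Command line: ... -> N" line together with the check source file it compiled. The sketch below assumes that the number after "->" is the compiler's exit status and that check sources are named cpu_*.c, as in the excerpt above; the function name `parse_compile_results` is hypothetical and the sample line is copied from that excerpt.

```python
import posixpath
import re

# One "Command line" entry copied from the meson-log.txt excerpt above.
SAMPLE_LOG = (
    "Command line: `arm64-apple-darwin20.0.0-clang "
    "/Users/rgommers/code/numpy/numpy/distutils/checks/cpu_asimd.c "
    "-o /var/folders/2m/dk_4hyc90xd2drkbqsbsjfqw0000gn/T/tmp3sauggrh/output.exe` -> 0\n"
)

# Assumption: the trailing integer is the compiler's return code.
COMMAND_RE = re.compile(r"^Command line: `(?P<cmd>.*)` -> (?P<rc>-?\d+)\s*$")


def parse_compile_results(log_text):
    """Return (check_source_name, return_code) pairs for cpu_*.c compiles."""
    results = []
    for line in log_text.splitlines():
        match = COMMAND_RE.match(line.strip())
        if not match:
            continue  # skips "Cached command line:" and all other lines
        for token in match.group("cmd").split():
            name = posixpath.basename(token)
            if name.startswith("cpu_") and name.endswith(".c"):
                results.append((name, int(match.group("rc"))))
                break
    return results


if __name__ == "__main__":
    for name, rc in parse_compile_results(SAMPLE_LOG):
        print(f"{name}: exit status {rc}")
```
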

rgommers (Member) commented

> ran the suggested command from numpy's official documentation, to manually disable the AVX features. The build appears to fail for the same reason (the related log is attached).

That just disabled AVX512, not all AVX features. If you want to bypass all SIMD checks and usage, try:

pip install . -Cbuild-dir=build -Csetup-args=-Ddisable-optimization=true
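
For scripted builds, the same command can be assembled programmatically so the SIMD switch is easy to toggle. This is only a sketch: the helper name `numpy_source_install_command` is made up for illustration, and the two -C options are exactly the ones given in this thread. It is meant to be run from the root of a NumPy source checkout.

```python
import sys


def numpy_source_install_command(build_dir="build", disable_simd=True):
    """Build the pip argument list for installing NumPy from a source checkout."""
    cmd = [
        sys.executable, "-m", "pip", "install", ".",
        f"-Cbuild-dir={build_dir}",  # keeps meson-logs/meson-log.txt around
    ]
    if disable_simd:
        # Option quoted in this thread to bypass all SIMD checks and usage.
        cmd.append("-Csetup-args=-Ddisable-optimization=true")
    return cmd


if __name__ == "__main__":
    # Only prints the command; nothing is installed by this sketch.
    print(" ".join(numpy_source_install_command()))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the actual build.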

rgommers added the "32 - Installation" and "component: SIMD" labels on Jun 23, 2024

emestee commented Jun 25, 2024

Hello,

This also happens on OpenBSD 7.5 with a CPU that supports AVX2, so it may not be an issue with the detection of CPU features but rather something about the OpenBSD environment itself. With -Ddisable-optimization=true the problem goes away. Meson log: meson-log.txt

bash-5.2# sysctl hw|grep -i cpu
hw.model=Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz

      Test features "SSE SSE2 SSE3" : Supported
      Test features "SSSE3" : Supported
      Test features "SSE41" : Supported
      Test features "POPCNT" : Supported
      Test features "SSE42" : Supported
      Test features "AVX" : Supported
      Test features "F16C" : Supported
      Test features "FMA3" : Supported
      Test features "AVX2" : Supported
      Test features "AVX512F" : Supported
      Test features "AVX512CD" : Supported
      Test features "AVX512_KNL" : Supported
      Test features "AVX512_KNM" : Unsupported due to Arguments "-msse, -msse2, -msse3, -mssse3, -msse4.1, -mpopcnt, -msse4.2, -mavx, -mf16c, -mfma, -mavx2, -mno-mmx, -mavx512f, -mavx512cd, -mavx512er, -mavx512pf, -mavx5124fmaps, -mavx5124vnniw, -mavx512vpopcntdq" are not supported
      Test features "AVX512_SKX" : Supported
      Test features "AVX512_CLX" : Supported
      Test features "AVX512_CNL" : Supported
      Test features "AVX512_ICL" : Supported
      Test features "AVX512_SPR" : Unsupported due to Compiler fails against the test code of "AVX512_SPR"

...

      FAILED: numpy/_core/libx86_simd_argsort.dispatch.h_AVX512_SKX.a.p/src_npysort_x86_simd_argsort.dispatch.cpp.o
      c++ -Inumpy/_core/libx86_simd_argsort.dispatch.h_AVX512_SKX.a.p -Inumpy/_core -I../numpy/_core -Inumpy/_core/include -I../numpy/_core/include -I../numpy/_core/src/common -I../numpy/_core/src/multiarray -I../numpy/_core/src/npymath -I../numpy/_core/src/umath -I../numpy/_core/src/highway -I/usr/local/include/python3.10 -I/home/emestee/dev/edgar/numpy/build/meson_cpu -fcolor-diagnostics -DNDEBUG -Wall -Winvalid-pch -std=c++17 -O3 -ftrapping-math -DNPY_HAVE_CLANG_FPSTRICT -msse -msse2 -msse3 -fPIC -DNPY_INTERNAL_BUILD -DHAVE_NPY_CONFIG_H -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti -O3 -DNPY_HAVE_SSE2 -DNPY_HAVE_SSE -DNPY_HAVE_SSE3 -DNPY_HAVE_SSSE3 -DNPY_HAVE_SSE41 -DNPY_HAVE_POPCNT -DNPY_HAVE_SSE42 -DNPY_HAVE_AVX -DNPY_HAVE_F16C -DNPY_HAVE_FMA3 -DNPY_HAVE_AVX2 -DNPY_HAVE_AVX512F -DNPY_HAVE_AVX512F_REDUCE -DNPY_HAVE_AVX512CD -DNPY_HAVE_AVX512_SKX -DNPY_HAVE_AVX512VL -DNPY_HAVE_AVX512BW -DNPY_HAVE_AVX512DQ -DNPY_HAVE_AVX512BW_MASK -DNPY_HAVE_AVX512DQ_MASK -msse -msse2 -msse3 -mssse3 -msse4.1 -mpopcnt -msse4.2 -mavx -mf16c -mfma -mavx2 -mno-mmx -mavx512f -mavx512cd -mavx512vl -mavx512bw -mavx512dq -DNPY_MTARGETS_CURRENT=AVX512_SKX -MD -MQ numpy/_core/libx86_simd_argsort.dispatch.h_AVX512_SKX.a.p/src_npysort_x86_simd_argsort.dispatch.cpp.o -MF numpy/_core/libx86_simd_argsort.dispatch.h_AVX512_SKX.a.p/src_npysort_x86_simd_argsort.dispatch.cpp.o.d -o numpy/_core/libx86_simd_argsort.dispatch.h_AVX512_SKX.a.p/src_npysort_x86_simd_argsort.dispatch.cpp.o -c ../numpy/_core/src/npysort/x86_simd_argsort.dispatch.cpp
      In file included from ../numpy/_core/src/npysort/x86_simd_argsort.dispatch.cpp:5:
      In file included from ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-argsort.hpp:11:
      In file included from ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:11:
      ../numpy/_core/src/npysort/x86-simd-sort/src/xss-network-keyvaluesort.hpp:593:40: error: implicit instantiation of undefined template 'zmm_vector<unsigned long>'
          static_assert(keyType::numlanes == indexType::numlanes,
                                             ^
      ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:525:9: note: in instantiation of function template specialization 'argsort_n<ymm_vector<int32_t>, zmm_vector<unsigned long>, 256>' requested here
              argsort_n<vtype, argtype, 256>(
              ^
      ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:642:9: note: in instantiation of function template specialization 'argselect_64bit_<ymm_vector<int32_t>, zmm_vector<unsigned long>, int>' requested here
              argselect_64bit_<vectype, argtype>(
              ^
      ../numpy/_core/src/npysort/x86_simd_argsort.dispatch.cpp:28:5: note: in instantiation of function template specialization 'avx512_argselect<int>' requested here
          avx512_argselect(arr, arg, kth, num, true);
          ^
      ../numpy/_core/src/npysort/x86_simd_argsort.dispatch.cpp:39:5: note: in instantiation of function template specialization '(anonymous namespace)::x86_argselect<int>' requested here
          x86_argselect(arr, reinterpret_cast<size_t*>(arg), kth, num);
          ^
      ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-includes.h:96:8: note: template is declared here
      struct zmm_vector;
             ^
      In file included from ../numpy/_core/src/npysort/x86_simd_argsort.dispatch.cpp:5:
      In file included from ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-argsort.hpp:11:
      ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:324:31: error: implicit instantiation of undefined template 'zmm_vector<unsigned long>'
          using argreg_t = typename argtype::reg_t;
                                    ^

and so on.
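
Template-heavy C++ failures tend to repeat one root error many times, so counting distinct "error:" messages is a quick way to see how many separate problems there are. The sketch below is an illustration only: the function name `count_errors` and the regular expression are assumptions, and the sample lines are copied from the compiler output above. On this excerpt, both errors reduce to the same undefined `zmm_vector<unsigned long>` template.

```python
import re
from collections import Counter

# Diagnostic lines copied from the compiler output in this comment.
SAMPLE_OUTPUT = """\
../numpy/_core/src/npysort/x86-simd-sort/src/xss-network-keyvaluesort.hpp:593:40: error: implicit instantiation of undefined template 'zmm_vector<unsigned long>'
../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:525:9: note: in instantiation of function template specialization 'argsort_n<ymm_vector<int32_t>, zmm_vector<unsigned long>, 256>' requested here
../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-argsort.h:324:31: error: implicit instantiation of undefined template 'zmm_vector<unsigned long>'
"""

# Assumed clang-style diagnostic shape: file:line:col: error: message
ERROR_RE = re.compile(r"^\s*\S+:\d+:\d+: error: (?P<msg>.*)$")


def count_errors(compiler_output):
    """Count how often each distinct error message occurs, ignoring notes."""
    counts = Counter()
    for line in compiler_output.splitlines():
        match = ERROR_RE.match(line)
        if match:
            counts[match.group("msg")] += 1
    return counts


if __name__ == "__main__":
    for message, count in count_errors(SAMPLE_OUTPUT).most_common():
        print(f"{count}x {message}")
```
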

r-devulap (Member) commented

@emestee Your build failure is in the simd sort module and was recently reported by someone else too. See intel/x86-simd-sort#157. #26797 should fix it.

r-devulap (Member) commented

Actually, looks like @AngeloD2022's bug report is the same error too. #26797 should close this issue.

r-devulap self-assigned this on Jun 25, 2024
r-devulap linked a pull request on Jun 25, 2024 that will close this issue

AngeloD2022 (Author) commented

Oof. Sorry I missed this. Glad to see there's a PR that might fix this!
