[python-package] [gpu] Unable to Install LightGBM GPU Python Package on Windows #6325

Closed
NisuSan opened this issue Feb 16, 2024 · 12 comments

@NisuSan

NisuSan commented Feb 16, 2024

Issue Description:
I encountered difficulties while attempting to install the LightGBM GPU (master branch) Python package on Windows. Despite successfully compiling the GPU version and obtaining the necessary .dll and .exe files in the Release folder, I faced several obstacles during the installation process using the command pip install ./python-package.

Steps to Reproduce:

Compile the LightGBM GPU (master branch) version on Windows.
Check that the Release folder contains the .dll and .exe files.
Execute the command pip install ./python-package from the root folder (LightGBM).

Expected Behavior:
The Python package installation process should proceed smoothly without any errors.

Actual Behavior:
Encountered errors during the installation process:

Initially failed to locate the LICENSE file within the 'python-package' folder. Manually creating the LICENSE file resolved this issue.
Subsequently, encountered an error indicating the absence of the 'CMakeLists.txt' file.

Additional Information:

OS: Windows
Compiler: cmake
Python version: 3.11.5
LightGBM version: folder cloned from master branch using git clone --recursive https://github.com/microsoft/LightGBM

Proposed Solution:
Investigate and resolve the issues preventing successful installation of the LightGBM GPU Python package on Windows. This may involve ensuring all required files are present and addressing any potential compatibility issues.

Thank you for your attention to this matter. If further information or logs are required, please let me know.

@jameslamb jameslamb changed the title Unable to Install LightGBM GPU Python Package on Windows [python-package] [gpu] Unable to Install LightGBM GPU Python Package on Windows Feb 16, 2024
@jameslamb
Collaborator

Thanks for using LightGBM and for the thorough write-up!

As explained in the documentation, you cannot simply pip install ./python-package in this repo. Building the Python package from source is driven by a shell script.

To build the GPU Python package from GitHub sources, do the following:

git clone --recursive  https://github.com/microsoft/LightGBM
cd ./LightGBM
sh build-python.sh install --gpu

But only do that if you need an unreleased version of lightgbm. If you are ok using a released version, install from PyPI... the Windows wheels we distribute have the OpenCL-based GPU support (not CUDA) already compiled in.

pip install lightgbm

For more details, see https://stackoverflow.com/a/77078844/3986677

@NisuSan
Author

NisuSan commented Feb 17, 2024

I tried running the following command, according to https://stackoverflow.com/a/77078844/3986677:

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm

and got the error "ERROR: Failed building wheel for lightgbm. ERROR: Could not build wheels for lightgbm, which is required to install pyproject.toml-based projects".

After that, I tried a simplified version of the command, pip install lightgbm --config-settings=cmake.define.USE_CUDA=ON, and the package installed fine. But when I set { 'device_type': 'cuda' } in my script, I got this error: "Trial 0 failed with parameters: {'feature_fraction': 0.6} because of the following error: LightGBMError('CUDA Tree Learner was not enabled in this build.\nPlease recompile with CMake option -DUSE_CUDA=1')"

UPD
I tried installing the package from the local repo using sh build-python.sh install --gpu and it works, but only with { 'device_type': 'gpu' }, not "cuda". What exactly is the difference between these two options?

UPD 2
I also tried sh build-python.sh install --cuda, and it failed with "CMake build failed ERROR Backend subprocess exited when trying to invoke build_wheel".

@jameslamb
Collaborator

got the error "ERROR: Failed building wheel for lightgbm. ERROR: Could not build wheels for lightgbm, which is required to install pyproject.toml-based projects".

That error has many possible causes. I strongly suspect that there were more logs than just that printed, which might help us to help you identify the root cause.

Can you please run this again:

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm

And share the full output that's printed?
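One way to capture the complete output for sharing is to run the command through a small wrapper that writes everything to a file. The following is a minimal sketch; the helper name run_and_log and the log file name are assumptions made for illustration, and only the pip arguments come from the command above.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(cmd, log_path):
    """Run cmd, write its combined stdout/stderr to log_path, return the exit code."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so errors stay next to their context
        text=True,
        errors="replace",          # do not fail on bytes the console encoding cannot decode
    )
    Path(log_path).write_text(result.stdout, encoding="utf-8")
    return result.returncode


# Same arguments as the pip command above, invoked through the current interpreter.
PIP_CUDA_SOURCE_INSTALL = [
    sys.executable, "-m", "pip", "install",
    "--force-reinstall",
    "--no-binary", "lightgbm",
    "--config-settings=cmake.define.USE_CUDA=ON",
    "lightgbm",
]

# Example (not executed here, since it would start a real build):
#     run_and_log(PIP_CUDA_SOURCE_INSTALL, "lightgbm.log")
```

The resulting file can then be attached to the issue as-is, which keeps the compiler output intact instead of only the final pip error lines.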


What exactly is the difference between these two options?

  • GPU (-DUSE_GPU=ON or --gpu) = OpenCL-based GPU-accelerated version of LightGBM. Use this for non-NVIDIA GPUs.
  • CUDA (-DUSE_CUDA=ON or --cuda) = CUDA-based GPU-accelerated version of LightGBM. Use this for NVIDIA GPUs.

See https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version for more information.
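The relationship between the device_type training parameter and the build options listed above can be written down as a small lookup. This is only an illustrative sketch: BUILD_OPTIONS and required_build_option are hypothetical names, not part of LightGBM, and the table simply restates the two bullets above plus the assumption that "cpu" needs no extra option.

```python
# Build options as stated in the reply above; "cpu" is assumed to need none.
BUILD_OPTIONS = {
    "cpu": {"cmake": None, "script": None},
    "gpu": {"cmake": "-DUSE_GPU=ON", "script": "--gpu"},     # OpenCL-based
    "cuda": {"cmake": "-DUSE_CUDA=ON", "script": "--cuda"},  # CUDA-based
}


def required_build_option(device_type, via="cmake"):
    """Return the build option a given device_type depends on, or None for 'cpu'.

    via="cmake" gives the CMake definition, via="script" the build-python.sh flag.
    """
    try:
        return BUILD_OPTIONS[device_type][via]
    except KeyError:
        raise ValueError(
            f"unknown device_type or build route: {device_type!r}, {via!r}"
        ) from None
```

For example, required_build_option("cuda") returns "-DUSE_CUDA=ON", which is the option the error message in this thread asks for.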


In case you're new to GitHub... please see https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax for some tips on how to format text here in a way that makes the difference between code, output from code, and your own words clearer.

@jameslamb
Collaborator

More information: #6281 (comment)

@NisuSan
Author

NisuSan commented Feb 18, 2024

And share the full output that's printed?

Sure,
lightgbm.log

@NisuSan
Author

NisuSan commented Feb 18, 2024

@jameslamb Ok, now it's really interesting, because I tried to use the docker image from here and got the error too! I created the preproduction repo and described the steps I took. I hope this helps you understand why the problem appears.

@jameslamb
Collaborator

lightgbm.log

Thank you.

I see compilation errors like this:

"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin\nvcc.exe"  --use-local-env -ccbin "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64" -x cu -rdc=true  -I"C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\external_libs\eigen" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" -I"C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\include" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include"     --keep-dir lightgbm_objs\x64\Release  -maxrregcount=0   --machine 64 --compile -cudart static -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_89,code=sm_89 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_90,code=compute_90 -O3 -lineinfo -Xcompiler="/EHsc -openmp -fPIC -Ob2"   -D_WINDOWS -DNDEBUG -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE -DUSE_SOCKET -DUSE_CUDA -DWIN_HAS_INET_PTON -D"CMAKE_INTDIR=\"Release\"" -D_MBCS -DWIN32 -D_WINDOWS -DNDEBUG -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE -DUSE_SOCKET -DUSE_CUDA -DWIN_HAS_INET_PTON -D"CMAKE_INTDIR=\"Release\"" -Xcompiler "/EHsc /Wall /nologo /O2 /FS   /MD /GR" -Xcompiler "/Fdlightgbm_objs.dir\Release\lightgbm_objs.pdb" -o lightgbm_objs.dir\Release\/src/treelearner/cuda/cuda_best_split_finder.cu.obj "C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu"

cl : Command line warning D9002: ignoring unknown option '-fPIC' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include\crt/host_config.h(104): warning C4668: '__NV_NO_HOST_COMPILER_CHECK' is not defined as a preprocessor macro, replacing with '0' for '#if/#elif' 

C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/include\cuda.h(3180): warning C4668: '__STDC_VERSION__' is not defined as a preprocessor macro, replacing with '0' for '#if/#elif' 

C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/include\LightGBM/utils/common.h(33): warning C4464: relative include path contains '..' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]
C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/include\LightGBM/utils/common.h(34): warning C4464: relative include path contains '..' [C:\Users\Antony\AppData\Local\Temp\tmpnc1sf10k\build\lightgbm_objs.vcxproj]

cl : Command line warning D9002: ignoring unknown option '-fPIC' 

...

C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu(1937): error : identifier "LightGBM::kMinScore" is undefined in device code 

C:\Users\Antony\AppData\Local\Temp\pip-install-1rpnm3ee\lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2\src\treelearner\cuda\cuda_best_split_finder.cu(1966): error : identifier "LightGBM::kMinScore" is undefined in device code 

... dozens more like that ...

Error limit reached.
100 errors detected in the compilation of "C:/Users/Antony/AppData/Local/Temp/pip-install-1rpnm3ee/lightgbm_1ddc53f59bc64e7d810b3ed1e35f19a2/src/treelearner/cuda/cuda_best_split_finder.cu".
Compilation terminated.

So just to confirm... that output came from running precisely this command, with no other customizations?

pip install \
    --force-reinstall \
    --no-binary lightgbm \
    --config-settings=cmake.define.USE_CUDA=ON \
    lightgbm
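In a log of this size, the actual errors are easy to lose among the warnings. The sketch below filters a build log down to lines shaped like the nvcc errors in the excerpt above ("path(line): error : message"). The function names and the regular expression are assumptions chosen to fit that excerpt; other compilers format errors differently (for example with an error code before the colon) and would need a different pattern.

```python
import re
from collections import Counter

# Matches lines such as:  C:\...\file.cu(1937): error : identifier "..." is undefined
# Warning lines ("): warning C4668: ...") do not match.
ERROR_LINE = re.compile(r"^(?P<file>.+?)\((?P<line>\d+)\): error\s*: (?P<msg>.+)$")


def extract_errors(log_text):
    """Return a list of (file, line_number, message) for each matching error line."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_LINE.match(raw.strip())
        if match:
            errors.append((match["file"], int(match["line"]), match["msg"].strip()))
    return errors


def summarize(errors):
    """Count how often each distinct error message occurs."""
    return Counter(message for _, _, message in errors)
```

Applied to the excerpt above, summarize(extract_errors(log_text)) would group the repeated "LightGBM::kMinScore" messages into a single entry with a count, which makes the root error easier to spot.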

the preproduction repo and describe the steps I did

The error message you're reporting there is this:

"[LightGBM] [Fatal] CUDA Tree Learner was not enabled in this build. Please recompile with CMake option -DUSE_CUDA=1"

And you did not compile the library with -DUSE_CUDA=1.

cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ .. 

If you want to use {"device": "cuda"}, you have to compile the library with -DUSE_CUDA=1, exactly as that message says.
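To find out which devices an installed build actually supports, one option is to try a tiny training run per device and record which ones fail. The following is a rough sketch, not an official LightGBM facility: first_working_device and lightgbm_probe are hypothetical helpers, the probe assumes numpy and lightgbm are installed, and it relies on an unsupported device raising an exception at training time, as the error reported in this thread suggests.

```python
def first_working_device(candidates, try_fit):
    """Return (device, errors): the first device for which try_fit(device)
    does not raise, plus the error text collected for each device that failed."""
    errors = {}
    for device in candidates:
        try:
            try_fit(device)
        except Exception as exc:  # broad on purpose: any failure means "not usable"
            errors[device] = str(exc)
        else:
            return device, errors
    return None, errors


def lightgbm_probe(device):
    """Train one boosting round on random data with the given device setting."""
    import numpy as np
    import lightgbm as lgb

    X = np.random.rand(200, 5)
    y = np.random.rand(200)
    params = {"objective": "regression", "device_type": device, "verbose": -1}
    lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=1)
```

Calling first_working_device(["cuda", "gpu", "cpu"], lightgbm_probe) on a build compiled with only -DUSE_GPU=1 would be expected to skip "cuda" and record its "not enabled in this build" message in the returned errors.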

@NisuSan
Author

NisuSan commented Feb 18, 2024

So just to confirm... that output came from running precisely this command, with no other customizations?

Yes, no customizations.

And you did not compile the library with -DUSE_CUDA=1

Oh, I see now.

@NisuSan
Author

NisuSan commented Feb 18, 2024

@jameslamb, I finally compiled the module for CUDA, but now I get this error:

LightGBMError: Check failed: (split_indices_block_size_data_partition) > (0) at /usr/local/src/lightgbm/LightGBM/lightgbm-python/src/treelearner/cuda/cuda_data_partition.cpp, line 280 .

I couldn't find any information about this error by searching online.

@jameslamb
Collaborator

What specific command(s) did you run or other actions did you take to fix the compilation errors?

@NisuSan
Author

NisuSan commented Feb 19, 2024

What specific command(s) did you run or other actions did you take to fix the compilation errors?

I did it for Docker, not Windows.

  1. Replace -DUSE_GPU=1 with -DUSE_CUDA=1
  2. Replace ./build-python.sh install --precompile with ./build-python.sh install --cuda

@jameslamb
Collaborator

Ok. Well it looks like you've opened another issue for the new error message you're reporting (#6329), and the documentation here does explicitly say that Windows support for the CUDA interface is not currently available:

Note: only Linux is supported, other operating systems are not supported yet.

ref: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-cuda-version
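A build script could check this constraint up front instead of failing deep inside compilation. The sketch below is a hypothetical guard: the function name is invented for illustration, and the Linux-only rule reflects the documentation note quoted above as of this thread, so it may not hold for later releases.

```python
import platform


def cuda_build_documented_as_supported(system=None):
    """Return True if the OS matches the documentation note quoted above.

    system defaults to platform.system(), which reports names such as
    'Linux', 'Windows', or 'Darwin'.
    """
    if system is None:
        system = platform.system()
    return system == "Linux"
```

On the Windows machine from the original report this check would return False, which is consistent with the compilation failure seen there.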

So as it seems you're not interested in continuing to help with identifying the root cause of these issues on Windows, we'll close this.
