[CLI, GPU, Win x64] LightGBM GPU doesn't work for 100K+ features --> Met Exceptions: Invalid Kernel Arguments #6220
Comments
Hello, here is an update. I was able to run further testing.
Working: I'm able to run on GPU with an 800GB+ bin file.
Not working: when I try a smaller bin file (~350GB) but with more features, it fails.
Would a bigger GPU help? Note: both work on CPU. Thanks for your help, Wil
It seems this person is hitting the same issue I have ("Invalid Kernel Arguments") once they reach 100K features. I'm building the LightGBM GPU version on Windows 10 Pro in "LightGBM\build" following the CMake instructions. Is there a way to open this build in VS2022 and start the debugger, so I can see what triggers this "Invalid Kernel Arguments"?
@wil70 The
Hi @shiyu1994, I'm trying to compile with CUDA (-DUSE_CUDA=1) instead of the OpenCL GPU version (-DUSE_GPU=1):
C:\lightgbm>"c:\Program Files\cmake\bin\cmake" -A x64 -DUSE_CUDA=1 -DBOOST_ROOT=c:\Home\Wilhelm\dev\GPUl\boost_1_83_0 -DBOOST_LIBRARYDIR=c:\Home\Wilhelm\dev\GPUl\boost_1_83_0\lib64-msvc-14.3 -DOpenCL_LIBRARY="c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\lib\x64\OpenCL.lib" -DOpenCL_INCLUDE_DIR="c:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\include" ..
I will try WSL2, but I'm reading there are GPU support constraints.
I, too, am running into this issue on a dataset of shape (84, 123434). I can use this data and model in CPU mode, but not GPU. Has there been any explanation of why this error is generated for datasets with many features (such as 100K+)? I tried changing the dtype to one that uses less memory, although I don't believe I'm explicitly running into memory issues here (that I'm aware of). Any context would be appreciated!
Same here. My data has 136200 columns; CPU works well, while GPU fails once the data has 27+ rows (up to 26 rows works fine with GPU). The error message is: Error { code: Some(-1), message: "Invalid Kernel Arguments" }. GPU memory is sufficient, and it doesn't even seem the data gets loaded into GPU memory when I track GPU memory usage. Since the error occurs repeatedly after the message "[LightGBM] [Info] Increasing preallocd_max_num_wg_ to 34050 for launching more workgroups", I guess some argument used for arranging workgroups has a limitation.
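The workgroup hypothesis above can be sanity-checked with a back-of-envelope calculation. This is an assumption, not a statement of LightGBM internals: the preallocated workgroup count in the log appears to be one workgroup per group of 4 dense features, since 136200 columns / 4 = 34050 matches the logged preallocd_max_num_wg_ value exactly.

```python
def workgroups_for(num_features, features_per_group=4):
    """Ceiling division: workgroups needed if each one covers a
    group of `features_per_group` dense features (an assumption
    inferred from the log line, not from LightGBM's source)."""
    return -(-num_features // features_per_group)

print(workgroups_for(136200))  # prints 34050, matching the logged value
print(workgroups_for(100000))  # prints 25000
```

If that reading is right, feature counts past ~100K push the requested workgroup count well beyond what small datasets need, which would be consistent with the failure appearing only at high column counts.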
Description
Hello,
I followed the instructions to compile LightGBM with GPU for Windows (x64), and I was able to compile and run the code. TY!
Unfortunately, I encountered the following exception: "Met Exceptions: Invalid Kernel Arguments".
I created two bin input files for LightGBM, for training and validation (file size: ~100MB each, with max_bin=15).
Command output for a small test dataset:
Command output for a smaller test dataset:
Any idea how to solve this so I can move forward with the GPU?
Thanks for your help
Wil
Reproducible example
Create a CSV file with random doubles (1K rows and 100K columns).
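This step can be sketched with a short script. It is an illustration, not the reporter's actual generator; the file name is an assumption, and the full repro size is 1000 rows by 100000 columns (a tiny file is written here so the sketch runs quickly).

```python
import csv
import random

def make_random_csv(path, rows, cols, seed=0):
    """Write a header-less CSV of uniform random doubles."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(rows):
            writer.writerow([f"{rng.random():.6f}" for _ in range(cols)])

# Full repro size: make_random_csv("train.csv", rows=1000, cols=100000)
make_random_csv("train.csv", rows=5, cols=8)
```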
Create conf file and start LightGBM that has been compiled for GPU.
c:\>lightgbm.exe config=trainGPU.conf
trainGPU.conf:
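The actual trainGPU.conf contents were not captured above. For orientation only, a hypothetical minimal GPU training conf (parameter names are from LightGBM's CLI config format; every value here is an assumption) might look like:

```
task = train
objective = regression
device_type = gpu
data = train.csv
valid = valid.csv
max_bin = 15
```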
Environment info
LightGBM version or commit hash: SHA-1: 5083df1
Command(s) you used to install LightGBM
Windows 10 Pro + VS 2022, all latest updates installed.
Additional Comments
There is no issue when compiled and run for CPU.
My compiled GPU version works on small datasets (fewer than 40 columns with a few thousand rows) but fails with bigger datasets.