
[gpu] /root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp:166: void boost::compute::wait_list::wait() const: Assertion `clWaitForEvents(size(), get_event_ptr()) == 0' failed. Aborted (core dumped) #2648

Open

pseudotensor opened this issue Dec 24, 2019 · 13 comments

@pseudotensor

Operating System: Ubuntu 16.04 LTS

CPU/GPU model: Xeon / 1080ti

C++/Python/R version: Python 3.6.6

LightGBM version or commit hash: 2.2.4

lgbm_waitforfailure.zip

2.2.4
python lgbm_waitforfailure.py: /root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp:166: void boost::compute::wait_list::wait() const: Assertion `clWaitForEvents(size(), get_event_ptr()) == 0' failed.
Aborted (core dumped)
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f70f267d801 in __GI_abort () at abort.c:79
#2  0x00007f70f266d39a in __assert_fail_base (fmt=0x7f70f27f47d8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f7079221aa0 "clWaitForEvents(size(), get_event_ptr()) == 0", file=file@entry=0x7f7079221a58 "/root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp", line=line@entry=166, function=function@entry=0x7f707923d800 <boost::compute::wait_list::wait() const::__PRETTY_FUNCTION__> "void boost::compute::wait_list::wait() const") at assert.c:92
#3  0x00007f70f266d412 in __GI___assert_fail (assertion=0x7f7079221aa0 "clWaitForEvents(size(), get_event_ptr()) == 0", file=0x7f7079221a58 "/root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp", line=166, function=0x7f707923d800 <boost::compute::wait_list::wait() const::__PRETTY_FUNCTION__> "void boost::compute::wait_list::wait() const") at assert.c:101
#4  0x00007f7078fe4446 in boost::compute::wait_list::wait() const [clone .part.134] () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#5  0x00007f70791d52ac in void LightGBM::GPUTreeLearner::WaitAndGetHistograms<LightGBM::HistogramBinEntry>(LightGBM::HistogramBinEntry*) () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#6  0x00007f70791cadd5 in LightGBM::GPUTreeLearner::ConstructHistograms(std::vector<signed char, std::allocator<signed char> > const&, bool) () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#7  0x00007f70791e50ee in LightGBM::SerialTreeLearner::FindBestSplits() () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#8  0x00007f70791e6e37 in LightGBM::SerialTreeLearner::Train(float const*, float const*, bool, json11::Json&) () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#9  0x00007f707904a7c5 in LightGBM::GBDT::TrainOneIter(float const*, float const*) () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#10 0x00007f7078fed41e in LGBM_BoosterUpdateOneIter () from /home/jon/.pyenv/versions/3.6.6/lib/python3.6/site-packages/lightgbm_gpu/lib_lightgbm.so
#11 0x00007f70c8554dae in ffi_call_unix64 () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#12 0x00007f70c855471f in ffi_call () from /usr/lib/x86_64-linux-gnu/libffi.so.6
#13 0x00007f70c8768d5d in _call_function_pointer (argcount=2, resmem=0x7fff29cd4f70, restype=<optimized out>, atypes=<optimized out>, avalues=0x7fff29cd4f50, pProc=0x7f7078fed3e0 <LGBM_BoosterUpdateOneIter>, flags=4353) at /tmp/python-build.20190530091007.28149/Python-3.6.6/Modules/_ctypes/callproc.c:809
#14 _ctypes_callproc (pProc=pProc@entry=0x7f7078fed3e0 <LGBM_BoosterUpdateOneIter>, argtuple=argtuple@entry=0x7f70796c7748, flags=4353, argtypes=argtypes@entry=0x0, restype=0x561d5fdafe98, checker=0x0) at /tmp/python-build.20190530091007.28149/Python-3.6.6/Modules/_ctypes/callproc.c:1166
#15 0x00007f70c875fae7 in PyCFuncPtr_call (self=self@entry=0x7f70796d1c00, inargs=inargs@entry=0x7f70796c7748, kwds=kwds@entry=0x0) at /tmp/python-build.20190530091007.28149/Python-3.6.6/Modules/_ctypes/_ctypes.c:3962
#16 0x00007f70f2aa3189 in _PyObject_FastCallDict (func=0x7f70796d1c00, args=<optimized out>, nargs=<optimized out>, kwargs=kwargs@entry=0x0) at Objects/abstract.c:2331
#17 0x00007f70f2aa3621 in _PyObject_FastCallKeywords (func=func@entry=0x7f70796d1c00, stack=stack@entry=0x561d62f90da0, nargs=nargs@entry=2, kwnames=kwnames@entry=0x0) at Objects/abstract.c:2496
#18 0x00007f70f2b9bb69 in call_function (pp_stack=pp_stack@entry=0x7fff29cd5298, oparg=<optimized out>, kwnames=kwnames@entry=0x0) at Python/ceval.c:4854
#19 0x00007f70f2ba0ec1 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3328
#20 0x00007f70f2b9ba0a in _PyEval_EvalCodeWithName (_co=0x7f70d447ca50, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x561d616851a8, argcount=argcount@entry=1, kwnames=0x7f70d448a4f8, kwargs=0x561d616851b0, kwcount=1, kwstep=1, defs=0x7f70d44859e0, defcount=2, kwdefs=0x0, closure=0x0, name=0x7f70f31395a8, qualname=0x7f70d447d3f0) at Python/ceval.c:4159
#21 0x00007f70f2b9bca2 in fast_function (kwnames=0x7f70d448a4e0, nargs=1, stack=0x561d616851a8, func=0x7f70796b2950) at Python/ceval.c:4971
#22 call_function (pp_stack=pp_stack@entry=0x7fff29cd5540, oparg=<optimized out>, kwnames=kwnames@entry=0x7f70d448a4e0) at Python/ceval.c:4851
#23 0x00007f70f2ba0f41 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3344
#24 0x00007f70f2b9ba0a in _PyEval_EvalCodeWithName (_co=0x7f70796a95d0, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x561d60799ba0, argcount=argcount@entry=3, kwnames=0x7f70d4483160, kwargs=0x561d60799bb8, kwcount=10, kwstep=1, defs=0x7f70796ba060, defcount=14, kwdefs=0x0, closure=0x0, name=0x7f70ae073b20, qualname=0x7f70ae073b20) at Python/ceval.c:4159
#25 0x00007f70f2b9bca2 in fast_function (kwnames=0x7f70d4483148, nargs=3, stack=0x561d60799ba0, func=0x7f70796b7158) at Python/ceval.c:4971
#26 call_function (pp_stack=pp_stack@entry=0x7fff29cd57e0, oparg=<optimized out>, kwnames=kwnames@entry=0x7f70d4483148) at Python/ceval.c:4851
#27 0x00007f70f2ba0f41 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3344
#28 0x00007f70f2b9ba0a in _PyEval_EvalCodeWithName (_co=0x7f70796ce8a0, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x561d61676388, argcount=argcount@entry=3, kwnames=0x7f7079fdc5b8, kwargs=0x561d616763a0, kwcount=13, kwstep=1, defs=0x7f70d44673a8, defcount=15, kwdefs=0x0, closure=0x0, name=0x7f70f0faeb20, qualname=0x7f70796be270) at Python/ceval.c:4159
#29 0x00007f70f2b9bca2 in fast_function (kwnames=0x7f7079fdc5a0, nargs=3, stack=0x561d61676388, func=0x7f70796b7bf8) at Python/ceval.c:4971
#30 call_function (pp_stack=pp_stack@entry=0x7fff29cd5a80, oparg=<optimized out>, kwnames=kwnames@entry=0x7f7079fdc5a0) at Python/ceval.c:4851
#31 0x00007f70f2ba0f41 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3344
#32 0x00007f70f2b9ba0a in _PyEval_EvalCodeWithName (_co=0x7f70796c3030, globals=<optimized out>, locals=locals@entry=0x0, args=args@entry=0x561d5fcb93c0, argcount=argcount@entry=3, kwnames=0x7f70f12796e0, kwargs=0x561d5fcb93d8, kwcount=10, kwstep=1, defs=0x7f7079fdc650, defcount=13, kwdefs=0x0, closure=0x7f70796bb9e8, name=0x7f70f0faeb20, qualname=0x7f70796c10c0) at Python/ceval.c:4159
#33 0x00007f70f2b9bca2 in fast_function (kwnames=0x7f70f12796c8, nargs=3, stack=0x561d5fcb93c0, func=0x7f70796c41e0) at Python/ceval.c:4971
#34 call_function (pp_stack=pp_stack@entry=0x7fff29cd5d20, oparg=<optimized out>, kwnames=kwnames@entry=0x7f70f12796c8) at Python/ceval.c:4851
#35 0x00007f70f2ba0f41 in _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3344
#36 0x00007f70f2b9ba0a in _PyEval_EvalCodeWithName (_co=_co@entry=0x7f70f302eae0, globals=globals@entry=0x7f70f30e01b0, locals=locals@entry=0x7f70f30e01b0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at Python/ceval.c:4159
#37 0x00007f70f2b9c06e in PyEval_EvalCodeEx (_co=_co@entry=0x7f70f302eae0, globals=globals@entry=0x7f70f30e01b0, locals=locals@entry=0x7f70f30e01b0, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4180
#38 0x00007f70f2b9c09b in PyEval_EvalCode (co=co@entry=0x7f70f302eae0, globals=globals@entry=0x7f70f30e01b0, locals=locals@entry=0x7f70f30e01b0) at Python/ceval.c:731
#39 0x00007f70f2bd23da in run_mod (arena=0x7f70f30fb078, flags=0x7fff29cd603c, locals=0x7f70f30e01b0, globals=0x7f70f30e01b0, filename=0x7f70f0fc8920, mod=0x561d5fd2c248) at Python/pythonrun.c:1025
#40 PyRun_FileExFlags (fp=fp@entry=0x561d5fca55a0, filename_str=filename_str@entry=0x7f70f1292470 "lgb_prefit_1a802b24-2bff-488d-8a72-142dff84ba18.py", start=start@entry=257, globals=globals@entry=0x7f70f30e01b0, locals=locals@entry=0x7f70f30e01b0, closeit=closeit@entry=1, flags=0x7fff29cd603c) at Python/pythonrun.c:978
#41 0x00007f70f2bd254d in PyRun_SimpleFileExFlags (fp=fp@entry=0x561d5fca55a0, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fff29cd603c) at Python/pythonrun.c:420
#42 0x00007f70f2bd2943 in PyRun_AnyFileExFlags (fp=fp@entry=0x561d5fca55a0, filename=<optimized out>, closeit=closeit@entry=1, flags=flags@entry=0x7fff29cd603c) at Python/pythonrun.c:81
#43 0x00007f70f2bf0b2e in run_file (p_cf=0x7fff29cd603c, filename=0x561d5fc5d6c0 L"lgb_prefit_1a802b24-2bff-488d-8a72-142dff84ba18.py", fp=0x561d5fca55a0) at Modules/main.c:340
#44 Py_Main (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:810
#45 0x0000561d5ea54b40 in main (argc=2, argv=<optimized out>) at ./Programs/python.c:69
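A backtrace like the one above can be captured by running the script under gdb, e.g. (a sketch; any equivalent invocation works — gdb stops at the SIGABRT raised by the failed assertion, then prints the stack):

gdb -ex run -ex bt --args python lgbm_waitforfailure.py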

@guolinke
Collaborator

guolinke commented Jan 2, 2020

ping @huanzhang12

@guolinke guolinke changed the title /root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp:166: void boost::compute::wait_list::wait() const: Assertion `clWaitForEvents(size(), get_event_ptr()) == 0' failed. Aborted (core dumped) [gpu] /root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp:166: void boost::compute::wait_list::wait() const: Assertion `clWaitForEvents(size(), get_event_ptr()) == 0' failed. Aborted (core dumped) Jan 2, 2020
@StrikerRUS
Collaborator

@pseudotensor I cannot reproduce the issue in the following environment. Can you please try the latest version of LightGBM?

Operating System: Windows Server 2016 Datacenter 1607
CPU/GPU model: Xeon E5-2690 v3 / Tesla K80
NVIDIA Driver version: 386.45
Python version: 3.5.6
Boost: boost_1_72_0-msvc-14.1-64
LightGBM version or commit hash: edb9149 (2.3.2)

(I increased verbosity and added a predict call at the end:)

@@ -4,6 +4,7 @@ import numpy as np
 import lightgbm as lgb
 print(lgb.__version__)
 model_orig, X, y = pickle.load(open("lgbm_waitforfailure.pkl", "rb"))
-p = {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'gain', 'learning_rate': 0.25, 'max_depth': 6, 'min_child_samples': 1, 'min_child_weight': 1, 'min_split_gain': 0.0, 'n_estimators': 1800, 'n_jobs': 4, 'num_leaves': 64, 'objective': 'multiclass', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 1.0, 'silent': True, 'subsample': 0.7, 'subsample_for_bin': 200000, 'subsample_freq': 1, 'num_class': 4, 'max_bin': 255, 'scale_pos_weight': 1, 'max_delta_step': 0, 'min_data_in_bin': 1, 'seed': 12345, 'device_type': 'gpu', 'gpu_device_id': 0, 'gpu_platform_id': 0, 'gpu_use_dp': True, 'feature_fraction_seed': 12346, 'bagging_seed': 12347, 'verbose': -1}
+p = {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 0.8, 'importance_type': 'gain', 'learning_rate': 0.25, 'max_depth': 6, 'min_child_samples': 1, 'min_child_weight': 1, 'min_split_gain': 0.0, 'n_estimators': 1800, 'n_jobs': 4, 'num_leaves': 64, 'objective': 'multiclass', 'random_state': None, 'reg_alpha': 0.0, 'reg_lambda': 1.0, 'silent': False, 'subsample': 0.7, 'subsample_for_bin': 200000, 'subsample_freq': 1, 'num_class': 4, 'max_bin': 255, 'scale_pos_weight': 1, 'max_delta_step': 0, 'min_data_in_bin': 1, 'seed': 12345, 'device_type': 'gpu', 'gpu_device_id': 0, 'gpu_platform_id': 0, 'gpu_use_dp': True, 'feature_fraction_seed': 12346, 'bagging_seed': 12347, 'verbose': 5}
 model = lgb.LGBMClassifier(**p)
 model.fit(X, y)
+print(model.predict(X))
2.3.2
[LightGBM] [Info] This is the GPU trainer!!
[LightGBM] [Info] Total Bins 5184
[LightGBM] [Info] Number of data points in the train set: 165, number of used features: 169
[LightGBM] [Info] Using requested OpenCL platform 0 device 0
[LightGBM] [Info] Using GPU Device: Tesla K80, Vendor: NVIDIA Corporation
[LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
[LightGBM] [Info] GPU programs have been built
[LightGBM] [Info] Size of histogram bin entry: 24
[LightGBM] [Info] 163 dense feature groups (0.03 MB) transferred to GPU in 0.003254 secs. 5 sparse feature groups
[LightGBM] [Debug] Use subset for bagging
[LightGBM] [Info] Start training from score -2.333357
[LightGBM] [Info] Start training from score -1.344745
[LightGBM] [Info] Start training from score -2.272732
[LightGBM] [Info] Start training from score -0.617309
[LightGBM] [Debug] Re-bagging, using 115 data to train
[LightGBM] [Info] Size of histogram bin entry: 24
[LightGBM] [Info] 163 dense feature groups (0.02 MB) transferred to GPU in 0.003187 secs. 5 sparse feature groups
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 2 and max_depth = 1
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 10 and max_depth = 5
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 7 and max_depth = 4
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 14 and max_depth = 6
[LightGBM] [Debug] Re-bagging, using 115 data to train
[LightGBM] [Info] Size of histogram bin entry: 24
[LightGBM] [Info] 163 dense feature groups (0.02 MB) transferred to GPU in 0.003342 secs. 5 sparse feature groups
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 2 and max_depth = 1
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 11 and max_depth = 5
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 7 and max_depth = 4
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 16 and max_depth = 6

...

[LightGBM] [Debug] Re-bagging, using 115 data to train
[LightGBM] [Info] Size of histogram bin entry: 24
[LightGBM] [Info] 163 dense feature groups (0.02 MB) transferred to GPU in 0.003421 secs. 5 sparse feature groups
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 1 and max_depth = 1
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 1 and max_depth = 1
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 1 and max_depth = 1
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Debug] Trained a tree with leaves = 1 and max_depth = 1
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[3 3 3 3 1 1 3 3 1 1 1 2 3 1 2 2 3 2 3 3 3 3 3 1 1 3 3 3 3 2 3 2 3 3 3 3 1
 3 1 3 3 3 1 3 1 3 2 3 3 3 1 3 3 3 3 2 1 3 3 2 3 3 3 1 3 0 3 1 3 2 3 1 1 1
 3 1 1 3 1 3 1 3 3 3 1 3 3 1 2 3 3 3 0 0 1 3 0 3 3 3 3 1 1 0 3 3 3 0 1 1 3
 0 0 1 0 1 1 1 0 3 1 3 1 3 2 1 3 3 3 1 3 3 0 0 2 1 0 1 0 3 3 0 2 1 2 1 3 3
 0 3 3 2 3 3 3 3 3 1 3 3 3 3 2 3 3]

log.txt

@sh1ng
Contributor

sh1ng commented Jan 8, 2020

@StrikerRUS It's reproducible on edb9149 with Boost updated to 1.72 on Linux.

Selected GPU version of lightgbm to import

2.3.2
python: /root/repo/LightGBM/compute/include/boost/compute/utility/wait_list.hpp:166: void boost::compute::wait_list::wait() const: Assertion `clWaitForEvents(size(), get_event_ptr()) == 0' failed.
Aborted (core dumped)

You can download the h2o4gpu version of lgbm from https://h2o-release.s3.amazonaws.com/h2o4gpu/snapshots/ai/h2o/h2o4gpu/0.3-cuda10/h2o4gpu-0.3.2%2Bpr.818.e8ed15e-cp36-cp36m-linux_x86_64.whl

use

from h2o4gpu.util.lightgbm_dynamic import got_cpu_lgb, got_gpu_lgb

to import lgbm
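i.e., roughly the following (a minimal sketch; I'm treating got_cpu_lgb and got_gpu_lgb as flags reporting which build the dynamic importer picked up):

from h2o4gpu.util.lightgbm_dynamic import got_cpu_lgb, got_gpu_lgb  # selects which lgbm to load
import lightgbm as lgb

print(got_cpu_lgb, got_gpu_lgb)  # hypothetical sanity check of which build was selected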

@StrikerRUS
Collaborator

@sh1ng Is the hardware the same as in #2648 (comment)?

@StrikerRUS
Collaborator

BTW, have you tried playing around with the gpu_platform_id and gpu_device_id parameters, along with disabling integrated graphics?
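For example, the device selection can be pinned explicitly in the p dict from the repro script (illustrative indices only; the correct values come from clinfo):

p['gpu_platform_id'] = 0  # OpenCL platform index
p['gpu_device_id'] = 0    # device index within that platform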

@sh1ng
Contributor

sh1ng commented Jan 8, 2020

Operating System: Ubuntu 18.04.3 LTS

CPU/GPU model: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz/GeForce MX150

C++/Python/R version: Python 3.6.8

The whl was built on CentOS 7.

The same happens if I install lightgbm from PyPI with OpenCL support:

pip install lightgbm --install-option=--gpu --install-option="--opencl-include-dir=/usr/local/cuda/include/" --install-option="--opencl-library=/usr/local/cuda/lib64/libOpenCL.so"

Tested on CUDA 10.0 and CUDA 10.2.
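For completeness, the equivalent source build would be something like this (a sketch, following the standard GPU build steps from the docs):

git clone --recursive https://github.com/microsoft/LightGBM
cd LightGBM && mkdir build && cd build
cmake -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/local/cuda/lib64/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/local/cuda/include/ ..
make -j4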

@sh1ng
Contributor

sh1ng commented Jan 8, 2020

If I'm not mistaken, there's only one OpenCL device:

$ clinfo
Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 10.2.95
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
  Platform Extensions function suffix             NV

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce MX150
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  440.33.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 01:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               3
  Max clock frequency                             1037MHz
  Compute Capability (NV)                         6.1
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              2099904512 (1.956GiB)
  Error Correction support                        No
  Max memory allocation                           524976128 (500.7MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        147456 (144KiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 No
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

@sh1ng
Contributor

sh1ng commented Jan 8, 2020

Removing 'gpu_use_dp': True solves the issue.
cc @pseudotensor
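That is, keep everything else from the repro script and drop that one key (a minimal sketch, reusing X and y from the pickle; gpu_use_dp defaults to false, i.e. single precision on GPU):

import lightgbm as lgb

p = {'objective': 'multiclass', 'num_class': 4, 'device_type': 'gpu',
     'gpu_platform_id': 0, 'gpu_device_id': 0}  # remaining params as in the script above,
                                                # minus 'gpu_use_dp': True
model = lgb.LGBMClassifier(**p)
model.fit(X, y)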

@StrikerRUS
Collaborator

@sh1ng Thank you very much for the clinfo output and possible workaround! I think it will help in debugging the issue.

cc @huanzhang12

@sh1ng
Contributor

sh1ng commented Jun 16, 2020

Recompiling with the SCORE_T_USE_DOUBLE macro defined solves the issue: https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/meta.h#L30.

It means lib_lightgbm.so must either contain both versions, or the gpu_use_dp=true parameter should be removed and the compilation instructions adjusted accordingly.

cc @pseudotensor @arnocandel
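One way to do that recompilation (a sketch; defining the macro via compiler flags rather than editing meta.h):

cd LightGBM/build
cmake -DUSE_GPU=1 -DCMAKE_CXX_FLAGS=-DSCORE_T_USE_DOUBLE ..
make -j4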

@ChipKerchner
Contributor

Please do NOT remove the gpu_use_dp flag. We are using it in the CUDA version.

@guolinke
Collaborator

guolinke commented Jul 7, 2020

ping @huanzhang12 for the following:

Recompiling with the SCORE_T_USE_DOUBLE macro defined solves the issue: https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/meta.h#L30.

It means lib_lightgbm.so must either contain both versions, or the gpu_use_dp=true parameter should be removed and the compilation instructions adjusted accordingly.

cc @pseudotensor @arnocandel

@skaae

skaae commented Jan 6, 2023

We got the same error using:

lightgbm: 3.3.3.99
Ubuntu: 22.04.1 LTS
Python 3.10.6 (main, Aug 10 2022, 11:40:04) [GCC 11.3.0]
NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
