gpu setup [second stage] #720

SteveBronder · 2018-01-13T02:57:33Z

Submission Checklist

Run unit tests: ./runTests.py test/unit
Run cpplint: make cpplint
Declare copyright holder and open-source license: see below

Summary:

Adds the OpenCL context to manage the GPU device, platforms, job queues, and kernels. Users can pass the device and platform ids through OPENCL_DEVICE_ID and OPENCL_PLATFORM_ID.

Users can find the appropriate device and platform id either through the clinfo program or by using the capabilities function

Sets up the Jenkins stages to run the GPU code.

The OpenCL context is built in the Meyer's singleton style with a friend class to act as the API for accessing the singleton. So users will access the global instance opencl_context which will then call the members of opencl_context_base::get_instance(). See the code here for an example. I'm not sure if there is a name for this style, but it is based on the magic card Mimic Vat so maybe that is an appropriate name.

Rok and I thought it would be better to break the GPU PR into several smaller PRs that will flow by:

OpenCL headers / makefile things
Code to set up GPU (You are here)
basic GPU matrix and operators
Multiplication and Inverse for GPU Matrix
Cholesky Decomposition

Intended Effect:

Allows for compilation of GPU kernels

How to Verify:

To test the context:

make test/unit/math/gpu/opencl_context_test STAN_OPENCL=true && ./test/unit/math/gpu/opencl_context_test

make test/unit/math/prim/arr/err/check_opencl_test STAN_OPENCL=true && ./test/unit/math/prim/arr/err/check_opencl_test

Side Effects:

None

Documentation:

https://github.com/stan-dev/math/wiki/OpenCL-GPU-Routines

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

OpenCL headers are copyrighted by The Khronos Group Inc.
Rok Češnovar and Erik Štrumbelj (Faculty of Computer and Information Science, University of Ljubljana)
Steve Bronder

SteveBronder · 2018-01-13T04:13:40Z

I'm seeing that this lead to a failure on jenkins

test/prob/chi_square/chi_square_cdf_log_00000_generated_fd_test failed

http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pipeline/job/PR-720/6/consoleText

I don't touch the chi_square_cdf_etc test tho? It also compiles and runs fine on my local

I'm also seeing

cannot find -lOpenCL

in the console output
http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pipeline/job/PR-720/6/warnings23Result/new/

Do I have to set up OpenCL on Jenkins like in Travis?

syclik · 2018-01-18T13:55:08Z

I'm not sure what error you're seeing out of Jenkins. I followed the link and clicked on logs. It looks like a linker error due to this pull request?

[Unit] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DNO_FPRINTF_OUTPUT -pipe   -shared -fPIC -o test/unit/libmultiple.so test/unit/multiple_translation_units1.o test/unit/multiple_translation_units2.o
[Unit] Undefined symbols for architecture x86_64:
[Unit]   "_clReleaseCommandQueue", referenced from:
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseContext", referenced from:
[Unit]       cl::Context::~Context() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::Context::~Context() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseDevice", referenced from:
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseKernel", referenced from:
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units1.o
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units2.o
[Unit] ld: symbol(s) not found for architecture x86_64
[Unit] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[Unit] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DNO_FPRINTF_OUTPUT -pipe   -shared -fPIC -o test/unit/libmultiple.so test/unit/multiple_translation_units1.o test/unit/multiple_translation_units2.o
[Unit] Undefined symbols for architecture x86_64:
[Unit]   "_clReleaseCommandQueue", referenced from:
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseContext", referenced from:
[Unit]       cl::Context::~Context() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::Context::~Context() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseDevice", referenced from:
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseKernel", referenced from:
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units1.o
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units2.o
[Unit] ld: symbol(s) not found for architecture x86_64
[Unit] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/nvec_ser/nvector_serial.c -o lib/cvodes_2.9.0/src/nvec_ser/nvector_serial.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/sundials/sundials_math.c -o lib/cvodes_2.9.0/src/sundials/sundials_math.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes.c -o lib/cvodes_2.9.0/src/cvodes/cvodes.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes_io.c -o lib/cvodes_2.9.0/src/cvodes/cvodes_io.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodea.c -o lib/cvodes_2.9.0/src/cvodes/cvodea.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodea_io.c -o lib/cvodes_2.9.0/src/cvodes/cvodea_io.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes_direct.c -o lib/cvodes_2.9.0/src/cvodes/cvodes_direct.o
[Headers] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DNO_FPRINTF_OUTPUT -pipe  -c -O0 -include stan/math/rev/scal/fun/inv_logit.hpp test/dummy.cpp -o /dev/null

SteveBronder · 2018-01-18T15:07:47Z

I think this is because of

  std::string description_;
  cl::Context oclContext_;
  cl::CommandQueue oclQueue_;
  cl::Platform oclPlatform_;
  cl::Device oclDevice_;

(here)

which I think I can make static and clear up those error. A similar error was causing failures in multiple_translation_units before. Ty for checking it out! I'll look at it more when I get home from work

EDIT: No actually it's that the multiple translation units test is not getting the -framework OpenCL flag stackoverflow link

syclik · 2018-01-18T15:19:29Z

Ok. That makes sense. Looks like you got it going, at least for Travis. I'm not sure what's happening with Jenkins yet.

SteveBronder · 2018-01-20T04:43:18Z

This is large because the GPU kernels are also included in this commit. What will the review process be for the kernels? Should I include them in a separate PR?

SteveBronder · 2018-01-20T05:05:54Z

make/default_compiler_options

@@ -12,8 +12,18 @@ AR = ar
 # CPPFLAGS are used for both C and C++ compilation
 CPPFLAGS = -DNO_FPRINTF_OUTPUT -pipe
 # CXXFLAGS are just used for C++
-CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized
-GTEST_CXXFLAGS = -DGTEST_USE_OWN_TR1_TUPLE
+CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -isystem $(OPENCL) -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU


@seantalts et al.

For removing sudo in the jenkins file, can I add something for CXXFLAGS like

ifeq (${TRAVIS},true) CXXFLAGS+=-L${HOME}/AMDAPPSDK/lib/x86_64/ endif

SteveBronder · 2018-01-20T18:33:11Z

make/default_compiler_options

+  ifeq ($(shell echo $(TRAVIS_OS_NAME)),linux)
+    POST_LDLIBS+=-L/home/travis/AMDAPPSDK/lib/x86_64/
+  endif
+endif


I added this which is working on travis without sudo

SteveBronder · 2018-01-20T18:34:35Z

make/default_compiler_options

+  else
+    POST_LDLIBS+=-lOpenCL
+  endif
+endif


Relevant side note: I have no idea how to compile this for windows. Does Jenkins test on Windows?

Jenkins does test on Windows.

SteveBronder · 2018-01-20T18:36:13Z

stan/math/prim/mat/fun/ocl_gpu.hpp

@@ -0,0 +1,296 @@
+#ifndef STAN_MATH_PRIM_MAT_FUN_OCL_HPP


I was not really sure where to put this file. It's purpose is to set up the gpu backend. Should this be in some more internal folder?

let me think about it. We have the autodiff stack sitting in stan/math/memory/stack_alloc.hpp.

SteveBronder · 2018-01-20T18:40:35Z

stan/math/prim/mat/fun/ocl_gpu.hpp

+      ;                                                    // NOLINT
+  kernel_strings["basic_matrix"] =
+#include <stan/math/prim/mat/kern_gpu/basic_matrix.cl>  // NOLINT
+      ;                                                 // NOLINT


These #includes are here because when I was trying to read the kernels in with ifstream I was not guaranteed that they would be loaded by the time this map was called to compile the kernel. (Some sort of call before initialization error?)

If there is a prettier / better way to do this I am for it

Can you point me to a reference to the kernels? They're just strings that are stored?

SteveBronder · 2018-01-20T18:42:27Z

stan/math/prim/mat/kern_gpu/basic_matrix.cl

+}
+
+
+)====="


If all of these kernels makes the review too bulky I can remove them or just take a subset

SteveBronder · 2018-01-20T19:24:33Z

from here

g++-4.9 -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -DNO_FPRINTF_OUTPUT -pipe  -c -O0 -include stan/math/prim/scal/prob/chi_square_log.hpp test/dummy.cpp -o /dev/null
g++-4.9: internal compiler error: Terminated (program cc1plus)
Please submit a full bug report,

Que? Is there someway to restart a part of the overall testing like the jobs in travis?

EDIT: Jenkins is not throwing this error anymore

syclik · 2018-01-22T13:54:23Z

make/default_compiler_options

@@ -12,8 +12,24 @@ AR = ar
 # CPPFLAGS are used for both C and C++ compilation
 CPPFLAGS = -DNO_FPRINTF_OUTPUT -pipe
 # CXXFLAGS are just used for C++
-CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized
+CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -isystem $(OPENCL) -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU


We'll need to change this. This forces STAN_GPU to be defined for all possible uses.

What's STAN_DEVICE_CPU? That looks new?

syclik · 2018-01-22T13:55:07Z

make/default_compiler_options

+  else
+    POST_LDLIBS+=-lOpenCL
+  endif
+endif


Jenkins does test on Windows.

syclik · 2018-01-22T13:55:22Z

make/tests

@@ -13,7 +13,7 @@ $(GTEST)/src/gtest-all.o: CXXFLAGS += $(GTEST_CXXFLAGS)

 test/%$(EXE) : CXXFLAGS += $(GTEST_CXXFLAGS)
 test/%$(EXE) : test/%.o $(GTEST_MAIN) $(GTEST)/src/gtest-all.o
-	$(LINK.cpp) -o $@ $^
+	$(LINK.cpp) -o $@ $^ $(POST_LDLIBS)


We'll fix this.

syclik · 2018-01-22T13:57:17Z

What's with the two macros now?

STAN_GPU
STAN_DEVICE_CPU

Are they doc'ed anywhere?

syclik · 2018-01-22T14:18:27Z

@SteveBronder, we're going to update the make portion of this.

So you know what's coming up, what we're going to do is define a make variable. When that's set, we will add the compiler flags to CXXFLAGS including the includes and we'll add linker flags to LDFLAGS. We're going to remove the direct addition of the compiler flags to CXXFLAGS and we're going to remove the POST_LDLIBS variable.

syclik · 2018-01-22T14:37:57Z

@SteveBronder, I just updated the PR.

@seantalts, I could use some guidance on how to do the travis thing right. Do we want to test everything with and without GPU?

SteveBronder · 2018-01-22T14:52:05Z

What's with the two macros now?

STAN_GPU

STAN_GPU tell the compiler to include the OpenCL files. Maybe this should be STAN_OPENCL

STAN_DEVICE_CPU

This tells the compiler to compile the OpenCL code for the CPU instead of the GPU so we can run on travis.

SteveBronder · 2018-01-22T14:53:23Z

So you know what's coming up, what we're going to do is define a make variable. When that's set, we will add the compiler flags to CXXFLAGS including the includes and we'll add linker flags to LDFLAGS. We're going to remove the direct addition of the compiler flags to CXXFLAGS and we're going to remove the POST_LDLIBS variable.

I'm cool with this

syclik · 2018-01-22T14:57:03Z

Sorry, can you take a step back and describe the two macros? I don't get it. If there are two defined macros, that's 4 configurations. Under what conditions are the four different conditions used?

seantalts · 2018-01-22T15:03:57Z

@syclik I think we should do GPU tests on Jenkins only. I doubt travis even supports GPUs, but we wouldn't want to rely on it there anyway.

syclik · 2018-01-22T15:05:54Z

@seantalts, that sounds good. We'll test both CPU and GPU on Jenkins.

…stable/2017-11-14)

SteveBronder · 2018-04-20T13:30:29Z

stan/math/gpu/opencl_context.hpp

+              device.getInfo<CL_DEVICE_AVAILABLE>() << "\n";
+        }
+      } catch (const cl::Error& e) {
+        msg << "\tno GPU devices in the specified platform" << "\n";


Nice catch! Instead of having a message here could we make a call to check_opencl_error()? I think it would throw a -1001: "CL_PLATFORM_NOT_FOUND_KHR or if not we can figure out what error code it returns and setup an error code for it.

Should we throw at all since having additional platforms without GPUs is not a problem and could happen?

We are just outputing the information here.

This is only a problem if the user selecta the platform that has no GPU devices. But that is handled elsewhere and does throw an exception, as it should :)

I think if we can throw a more informative error message then "C++ exception with description "clGetPlatformIDs" than it's a good idea to catch and handle it

…stable/2017-11-14)

rok-cesnovar · 2018-04-21T13:11:27Z

stan/math/gpu/opencl_context.hpp

+      if (OPENCL_PLATFORM_ID >= platforms_.size()) {
+        system_error("OpenCL Initialization", "[Platform]", -1,
+                       "CL_INVALID_PLATFORM");
+      }


If a user specified an ID that did not exist, the test produced a seg fault. We now output the following:

[----------] Global test environment set-up. [----------] 3 tests from MathGpu [ RUN ] MathGpu.getInfo unknown file: Failure C++ exception with description "OpenCL Initialization: [Platform] CL_INVALID_PLATFORM: Unknown error -1" thrown in the test body. [ FAILED ] MathGpu.getInfo (29 ms)

rok-cesnovar · 2018-04-21T13:12:10Z

stan/math/gpu/opencl_context.hpp

+      if (OPENCL_DEVICE_ID >= devices_.size()) {
+        system_error("OpenCL Initialization", "[Device]", -1,
+                       "CL_INVALID_DEVICE");
+      }


If a user specified an ID that did not exist, the test produced a seg fault. We now output the following:

[ RUN ] MathGpu.getInfo
unknown file: Failure
C++ exception with description "opencl_context: clGetDeviceIDs CL_DEVICE_NOT_FOUND: Unknown error -1" thrown in the test body.
[ FAILED ] MathGpu.getInfo (31 ms)

rok-cesnovar · 2018-04-21T13:13:53Z

stan/math/gpu/opencl_context.hpp

-        msg << "\tno GPU devices in the specified platform"
-            << "\n";
+        msg << "\tno GPU devices in the platform with ID "
+              << platform_id << "\n";


This is an example output listing all the platforms on a system. The first 2 have no GPU devices and the last one has 1 GPU device.

Number of Platforms: 3 Platform ID: 0 Platform Name: Experimental OpenCL 2.1 CPU Only Platform Platform Vendor: Intel(R) Corporation no GPU devices in the platform with ID 1 Platform ID: 1 Platform Name: Intel(R) OpenCL Platform Vendor: Intel(R) Corporation no GPU devices in the platform with ID 2 Platform ID: 2 Platform Name: NVIDIA CUDA Platform Vendor: NVIDIA Corporation Device 0: Device Name: Graphics Device Device Type: 4 Device Vendor: NVIDIA Corporation Device Max Compute Units: 30 Device Global Memory: 12780699648 Device Max Clock Frequency: 1582 Device Max Allocateable Memory: 3195174912 Device Local Memory: 49152 Device Available: 1

rok-cesnovar · 2018-04-21T13:35:20Z

I made a few changes on how we handle if a user specifies a platform ID that does not exist. I made review comments for the changes.

I also tested what happens if we try this on a system that has no OpenCL platforms but has the OpenCL library. It produces the following output

Running main() from gtest_main.cc
[==========] Running 6 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from MathGpu
[ RUN      ] MathGpu.getInfo
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] MathGpu.getInfo (0 ms)
[ RUN      ] MathGpu.context_construction
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] MathGpu.context_construction (0 ms)
[ RUN      ] MathGpu.kernel_construction
test/unit/math/gpu/opencl_context_test.cpp:22: Failure
Expected: stan::math::opencl_context.get_kernel("dummy") doesn't throw an exception.
  Actual: it throws.
test/unit/math/gpu/opencl_context_test.cpp:23: Failure
Expected: stan::math::opencl_context.get_kernel("dummy2") doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] MathGpu.kernel_construction (1 ms)
[----------] 3 tests from MathGpu (1 ms total)

[----------] 3 tests from opencl_context
[ RUN      ] opencl_context.platform
unknown file: Failure
C++ exception with description "clGetPlatformIDs" thrown in the test body.
[  FAILED  ] opencl_context.platform (0 ms)
[ RUN      ] opencl_context.devices
unknown file: Failure
C++ exception with description "listing_devices_test: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] opencl_context.devices (0 ms)
[ RUN      ] opencl_context.compile_kernel_rawcode
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] opencl_context.compile_kernel_rawcode (0 ms)
[----------] 3 tests from opencl_context (0 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 2 test cases ran. (1 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] MathGpu.getInfo
[  FAILED  ] MathGpu.context_construction
[  FAILED  ] MathGpu.kernel_construction
[  FAILED  ] opencl_context.platform
[  FAILED  ] opencl_context.devices
[  FAILED  ] opencl_context.compile_kernel_rawcode

 6 FAILED TESTS

rok-cesnovar · 2018-04-21T18:04:38Z

On a side note, I played around with running these tests on Windows with an NVIDIA GPU. Getting to compile with OpenCL ( I used a toolchain based on gcc and mingw-w64 via RTools) was surprisingly smooth.

There is however again the problem with the -lOpenCL flag, that I needed to move behind the -o
My quick and dirty solution was to modify the make/tests adding the LDFLAGS_OPENCL:

test/%$(EXE) : test/%.o $(GTEST_MAIN) $(GTEST)/src/gtest-all.o
	$(LINK.cpp) -o $@ $^ $(LDFLAGS_OPENCL)

Is there any less dirty way of doing this?

SteveBronder · 2018-04-24T18:05:03Z

@seantalts can you help Ben setup the GPU stuff on the linux box? I can also stop by Friday afternoon to help get things set up if you are available Ben

bgoodri · 2018-04-25T03:53:45Z

Tests passing on my Debian Stable laptop if I use g++ (not sure what combination of things actually were necessary to fix it), but it still crashes if I use clang++. And the server still crashes if I use clang++ but has a linker error if I use g++.

SteveBronder · 2018-04-25T05:07:09Z

For the clang++ error did you try removing the gtests.so file? I have to do that when I switch between compilers. I assumed that was a Stan thing but if it's a GPU thing then I can make a PR `make clean-all` clean the gtest.so out as well. Is the linker error the same one you received before?

…

On Tue, Apr 24, 2018, 11:53 PM bgoodri ***@***.***> wrote: Tests passing on my Debian Stable laptop if I use g++ (not sure what combination of things actually were necessary to fix it), but it still crashes if I use clang++. And the server still crashes if I use clang++ but has a linker error if I use g++. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#720 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AFlfz2APlaQK5EW8hwLz5vJ5C3d5GrCKks5tr_NKgaJpZM4RdE70> .

…xp1~20180406212027.91 (branches/release_50)

rok-cesnovar · 2018-04-25T11:37:26Z

stan/math/gpu/opencl_context.hpp

@@ -194,7 +194,7 @@ class opencl_context {
      // and mark them as compiled.
      for (auto kern : kernel_info()) {
        if (strcmp(kern.second.group, kernel_group) == 0) {
-          kernels()[(kern.first)] = cl::Kernel(program_, kern.first, &err);
+          opencl_context_base::getInstance().kernels[(kern.first)] = cl::Kernel(program_, kern.first, &err);


I had to change this, as I was getting the "CL_INVALID_KERNEL" when using the kernel (retrieved with get_kernel()). Retrieving the name returned an empty string. I added a test to check this.

this is necessary to get g++ to work

SteveBronder · 2018-04-30T14:28:28Z

With Ben's PR above I think this is all good to go?

…pu_setup

syclik · 2018-05-01T12:23:27Z

Yeah, looks great! Thanks @SteveBronder, @rok-cesnovar, and @bgoodri for working out the practical issues of GPU!

SteveBronder changed the title ~~Gpu setup~~ gpu setup Jan 13, 2018

SteveBronder force-pushed the gpu_setup branch from 26b7cbb to 8df83d0 Compare January 18, 2018 03:27

SteveBronder force-pushed the gpu_setup branch from a4f32f6 to 65d755f Compare January 20, 2018 04:30

SteveBronder changed the title ~~gpu setup~~ gpu setup [second stage] Jan 20, 2018

SteveBronder requested a review from syclik January 20, 2018 04:42

SteveBronder force-pushed the gpu_setup branch from 3cfb5d6 to b24d239 Compare January 20, 2018 04:58

SteveBronder commented Jan 20, 2018

View reviewed changes

SteveBronder force-pushed the gpu_setup branch 3 times, most recently from 983887e to 8e3a36b Compare January 21, 2018 00:17

syclik reviewed Jan 22, 2018

View reviewed changes

yashikno and others added 2 commits April 20, 2018 05:38

Merge commit 'e6e2adb13b167838ae14e64e60dc5ebdfeed0eaf' into HEAD

1ad9676

[Jenkins] auto-formatting by clang-format version 6.0.0 (tags/google/…

b51161a

…stable/2017-11-14)

SteveBronder commented Apr 20, 2018

View reviewed changes

rok-cesnovar and others added 3 commits April 21, 2018 15:02

handling cases of wrong platform/device ID

697070e

Merge commit 'f33c02e7c22c0125c0be510db9d8e681abb02f65' into HEAD

b0b8f97

[Jenkins] auto-formatting by clang-format version 6.0.0 (tags/google/…

bb1aad0

…stable/2017-11-14)

rok-cesnovar reviewed Apr 21, 2018

View reviewed changes

rok-cesnovar added 2 commits April 21, 2018 15:42

handling other errors in capabilities()

4323300

cpplint cleanup

c5e365e

rok-cesnovar and others added 3 commits April 25, 2018 13:29

storing kernels on compile to .kernels instead of kernels(), added tests

a25d9c6

Merge commit '0665998b10607511712267bcb90c89fdd8cd7087' into HEAD

312a84a

[Jenkins] auto-formatting by clang-format version 5.0.2-svn328729-1~e…

672fffa

…xp1~20180406212027.91 (branches/release_50)

rok-cesnovar reviewed Apr 25, 2018

View reviewed changes

Ben Goodrich and others added 2 commits April 29, 2018 14:51

put OPENCL linker flags last

6c506e7

this is necessary to get g++ to work

remove binary object from lib/gtest_*/src

77774a0

bgoodri mentioned this pull request Apr 29, 2018

Rok cesnovar gpu setup #849

Merged

3 tasks

Merge remote-tracking branch 'upstream/rok-cesnovar-gpu_setup' into g…

3b00f74

…pu_setup

syclik approved these changes May 1, 2018

View reviewed changes

syclik merged commit b1e394a into stan-dev:develop May 1, 2018

rok-cesnovar deleted the gpu_setup branch May 2, 2018 07:26

rok-cesnovar restored the gpu_setup branch May 2, 2018 07:27

		}


		)====="

gpu setup [second stage] #720

gpu setup [second stage] #720

Conversation

SteveBronder commented Jan 13, 2018 • edited by seantalts Loading

Submission Checklist

Summary:

Intended Effect:

How to Verify:

Side Effects:

Documentation:

Copyright and Licensing

SteveBronder commented Jan 13, 2018 • edited Loading

syclik commented Jan 18, 2018

SteveBronder commented Jan 18, 2018 • edited Loading

syclik commented Jan 18, 2018

SteveBronder commented Jan 20, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SteveBronder Jan 20, 2018 • edited Loading

Choose a reason for hiding this comment

SteveBronder commented Jan 20, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

syclik commented Jan 22, 2018

syclik commented Jan 22, 2018

syclik commented Jan 22, 2018

SteveBronder commented Jan 22, 2018

SteveBronder commented Jan 22, 2018

syclik commented Jan 22, 2018 • edited Loading

seantalts commented Jan 22, 2018

syclik commented Jan 22, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rok-cesnovar Apr 21, 2018 • edited Loading

Choose a reason for hiding this comment

rok-cesnovar commented Apr 21, 2018 • edited Loading

rok-cesnovar commented Apr 21, 2018

SteveBronder commented Apr 24, 2018

bgoodri commented Apr 25, 2018

SteveBronder commented Apr 25, 2018 via email

Choose a reason for hiding this comment

SteveBronder commented Apr 30, 2018

syclik commented May 1, 2018

SteveBronder commented Jan 13, 2018 •

edited by seantalts

Loading

SteveBronder commented Jan 13, 2018 •

edited

Loading

SteveBronder commented Jan 18, 2018 •

edited

Loading

SteveBronder Jan 20, 2018 •

edited

Loading

SteveBronder commented Jan 20, 2018 •

edited

Loading

syclik commented Jan 22, 2018 •

edited

Loading

rok-cesnovar Apr 21, 2018 •

edited

Loading

rok-cesnovar commented Apr 21, 2018 •

edited

Loading