Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gpu setup [second stage] #720

Merged
merged 169 commits into from
May 1, 2018
Merged

gpu setup [second stage] #720

merged 169 commits into from
May 1, 2018

Conversation

SteveBronder
Copy link
Collaborator

@SteveBronder SteveBronder commented Jan 13, 2018

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

  1. Adds the OpenCL context to manage the GPU device, platforms, job queues, and kernels. Users can pass the device and platform ids through OPENCL_DEVICE_ID and OPENCL_PLATFORM_ID.
  • Users can find the appropriate device and platform id either through the clinfo program or by using the capabilities function
  1. Sets up the Jenkins stages to run the GPU code.

The OpenCL context is built in the Meyer's singleton style with a friend class to act as the API for accessing the singleton. So users will access the global instance opencl_context which will then call the members of opencl_context_base::get_instance(). See the code here for an example. I'm not sure if there is a name for this style, but it is based on the magic card Mimic Vat so maybe that is an appropriate name.


Rok and I thought it would be better to break the GPU PR into several smaller PRs that will flow by:

  1. OpenCL headers / makefile things
  2. Code to set up GPU (You are here)
  3. basic GPU matrix and operators
  4. Multiplication and Inverse for GPU Matrix
  5. Cholesky Decomposition

Intended Effect:

Allows for compilation of GPU kernels

How to Verify:

To test the context:

make test/unit/math/gpu/opencl_context_test STAN_OPENCL=true && ./test/unit/math/gpu/opencl_context_test

make test/unit/math/prim/arr/err/check_opencl_test STAN_OPENCL=true && ./test/unit/math/prim/arr/err/check_opencl_test

Side Effects:

None

Documentation:

https://github.com/stan-dev/math/wiki/OpenCL-GPU-Routines

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

OpenCL headers are copyrighted by The Khronos Group Inc.
Rok Češnovar and Erik Štrumbelj (Faculty of Computer and Information Science, University of Ljubljana)
Steve Bronder

@SteveBronder SteveBronder changed the title Gpu setup gpu setup Jan 13, 2018
@SteveBronder
Copy link
Collaborator Author

SteveBronder commented Jan 13, 2018

I'm seeing that this lead to a failure on jenkins

test/prob/chi_square/chi_square_cdf_log_00000_generated_fd_test failed

http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pipeline/job/PR-720/6/consoleText

I don't touch the chi_square_cdf_etc test tho? It also compiles and runs fine on my local

I'm also seeing

cannot find -lOpenCL

in the console output
http://d1m1s1b1.stat.columbia.edu:8080/job/Math%20Pipeline/job/PR-720/6/warnings23Result/new/

Do I have to set up OpenCL on Jenkins like in Travis?

@syclik
Copy link
Member

syclik commented Jan 18, 2018

I'm not sure what error you're seeing out of Jenkins. I followed the link and clicked on logs. It looks like a linker error due to this pull request?

[Unit] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DNO_FPRINTF_OUTPUT -pipe   -shared -fPIC -o test/unit/libmultiple.so test/unit/multiple_translation_units1.o test/unit/multiple_translation_units2.o
[Unit] Undefined symbols for architecture x86_64:
[Unit]   "_clReleaseCommandQueue", referenced from:
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseContext", referenced from:
[Unit]       cl::Context::~Context() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::Context::~Context() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseDevice", referenced from:
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseKernel", referenced from:
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units1.o
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units2.o
[Unit] ld: symbol(s) not found for architecture x86_64
[Unit] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[Unit] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DGTEST_USE_OWN_TR1_TUPLE  -isystem lib/gtest_1.7.0/include -isystem lib/gtest_1.7.0 -O3 -DNO_FPRINTF_OUTPUT -pipe   -shared -fPIC -o test/unit/libmultiple.so test/unit/multiple_translation_units1.o test/unit/multiple_translation_units2.o
[Unit] Undefined symbols for architecture x86_64:
[Unit]   "_clReleaseCommandQueue", referenced from:
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::CommandQueue::~CommandQueue() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseContext", referenced from:
[Unit]       cl::Context::~Context() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       cl::Context::~Context() in multiple_translation_units2.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseDevice", referenced from:
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units1.o
[Unit]       stan::math::ocl::~ocl() in multiple_translation_units2.o
[Unit]   "_clReleaseKernel", referenced from:
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units1.o
[Unit]       std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel> > >::destroy(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, cl::Kernel>, void*>*) in multiple_translation_units2.o
[Unit] ld: symbol(s) not found for architecture x86_64
[Unit] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/nvec_ser/nvector_serial.c -o lib/cvodes_2.9.0/src/nvec_ser/nvector_serial.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/sundials/sundials_math.c -o lib/cvodes_2.9.0/src/sundials/sundials_math.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes.c -o lib/cvodes_2.9.0/src/cvodes/cvodes.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes_io.c -o lib/cvodes_2.9.0/src/cvodes/cvodes_io.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodea.c -o lib/cvodes_2.9.0/src/cvodes/cvodea.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodea_io.c -o lib/cvodes_2.9.0/src/cvodes/cvodea_io.o
[Unit] clang++ -Werror  -DNO_FPRINTF_OUTPUT -pipe  -c -x c -O3 -isystem lib/cvodes_2.9.0/include lib/cvodes_2.9.0/src/cvodes/cvodes_direct.c -o lib/cvodes_2.9.0/src/cvodes/cvodes_direct.o
[Headers] clang++ -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -stdlib=libc++ -Wno-unknown-warning-option -Wno-tautological-compare -Wsign-compare -DNO_FPRINTF_OUTPUT -pipe  -c -O0 -include stan/math/rev/scal/fun/inv_logit.hpp test/dummy.cpp -o /dev/null

@SteveBronder
Copy link
Collaborator Author

SteveBronder commented Jan 18, 2018

I think this is because of

  std::string description_;
  cl::Context oclContext_;
  cl::CommandQueue oclQueue_;
  cl::Platform oclPlatform_;
  cl::Device oclDevice_;

(here)

which I think I can make static and clear up those error. A similar error was causing failures in multiple_translation_units before. Ty for checking it out! I'll look at it more when I get home from work

EDIT: No actually it's that the multiple translation units test is not getting the -framework OpenCL flag stackoverflow link

@syclik
Copy link
Member

syclik commented Jan 18, 2018

Ok. That makes sense. Looks like you got it going, at least for Travis. I'm not sure what's happening with Jenkins yet.

@SteveBronder SteveBronder changed the title gpu setup gpu setup [second stage] Jan 20, 2018
@SteveBronder
Copy link
Collaborator Author

This is large because the GPU kernels are also included in this commit. What will the review process be for the kernels? Should I include them in a separate PR?

@@ -12,8 +12,18 @@ AR = ar
# CPPFLAGS are used for both C and C++ compilation
CPPFLAGS = -DNO_FPRINTF_OUTPUT -pipe
# CXXFLAGS are just used for C++
CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized
GTEST_CXXFLAGS = -DGTEST_USE_OWN_TR1_TUPLE
CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -isystem $(OPENCL) -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seantalts et al.

For removing sudo in the jenkins file, can I add something for CXXFLAGS like

ifeq (${TRAVIS},true)
  CXXFLAGS+=-L${HOME}/AMDAPPSDK/lib/x86_64/
endif

ifeq ($(shell echo $(TRAVIS_OS_NAME)),linux)
POST_LDLIBS+=-L/home/travis/AMDAPPSDK/lib/x86_64/
endif
endif
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this which is working on travis without sudo

else
POST_LDLIBS+=-lOpenCL
endif
endif
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relevant side note: I have no idea how to compile this for windows. Does Jenkins test on Windows?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jenkins does test on Windows.

@@ -0,0 +1,296 @@
#ifndef STAN_MATH_PRIM_MAT_FUN_OCL_HPP
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not really sure where to put this file. It's purpose is to set up the gpu backend. Should this be in some more internal folder?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me think about it. We have the autodiff stack sitting in stan/math/memory/stack_alloc.hpp.

; // NOLINT
kernel_strings["basic_matrix"] =
#include <stan/math/prim/mat/kern_gpu/basic_matrix.cl> // NOLINT
; // NOLINT
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These #includes are here because when I was trying to read the kernels in with ifstream I was not guaranteed that they would be loaded by the time this map was called to compile the kernel. (Some sort of call before initialization error?)

If there is a prettier / better way to do this I am for it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to a reference to the kernels? They're just strings that are stored?

}


)====="
Copy link
Collaborator Author

@SteveBronder SteveBronder Jan 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all of these kernels makes the review too bulky I can remove them or just take a subset

@SteveBronder
Copy link
Collaborator Author

SteveBronder commented Jan 20, 2018

from here

g++-4.9 -Werror -Wall -I . -isystem lib/eigen_3.3.3 -isystem lib/boost_1.65.1 -isystem lib/cvodes_2.9.0/include -isystem lib/opencl_1.2.8	 -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU -DNO_FPRINTF_OUTPUT -pipe  -c -O0 -include stan/math/prim/scal/prob/chi_square_log.hpp test/dummy.cpp -o /dev/null
g++-4.9: internal compiler error: Terminated (program cc1plus)
Please submit a full bug report,

Que? Is there someway to restart a part of the overall testing like the jobs in travis?

EDIT: Jenkins is not throwing this error anymore

@SteveBronder SteveBronder force-pushed the gpu_setup branch 3 times, most recently from 983887e to 8e3a36b Compare January 21, 2018 00:17
@@ -12,8 +12,24 @@ AR = ar
# CPPFLAGS are used for both C and C++ compilation
CPPFLAGS = -DNO_FPRINTF_OUTPUT -pipe
# CXXFLAGS are just used for C++
CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized
CXXFLAGS = -Wall -I . -isystem $(EIGEN) -isystem $(BOOST) -isystem $(CVODES)/include -isystem $(OPENCL) -std=c++1y -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION -Wno-unused-function -Wno-uninitialized -DSTAN_GPU -DSTAN_DEVICE_CPU
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to change this. This forces STAN_GPU to be defined for all possible uses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's STAN_DEVICE_CPU? That looks new?

else
POST_LDLIBS+=-lOpenCL
endif
endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jenkins does test on Windows.

make/tests Outdated
@@ -13,7 +13,7 @@ $(GTEST)/src/gtest-all.o: CXXFLAGS += $(GTEST_CXXFLAGS)

test/%$(EXE) : CXXFLAGS += $(GTEST_CXXFLAGS)
test/%$(EXE) : test/%.o $(GTEST_MAIN) $(GTEST)/src/gtest-all.o
$(LINK.cpp) -o $@ $^
$(LINK.cpp) -o $@ $^ $(POST_LDLIBS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll fix this.

@syclik
Copy link
Member

syclik commented Jan 22, 2018

What's with the two macros now?

  • STAN_GPU
  • STAN_DEVICE_CPU

Are they doc'ed anywhere?

@syclik
Copy link
Member

syclik commented Jan 22, 2018

@SteveBronder, we're going to update the make portion of this.

So you know what's coming up, what we're going to do is define a make variable. When that's set, we will add the compiler flags to CXXFLAGS including the includes and we'll add linker flags to LDFLAGS. We're going to remove the direct addition of the compiler flags to CXXFLAGS and we're going to remove the POST_LDLIBS variable.

@syclik
Copy link
Member

syclik commented Jan 22, 2018

@SteveBronder, I just updated the PR.

@seantalts, I could use some guidance on how to do the travis thing right. Do we want to test everything with and without GPU?

@SteveBronder
Copy link
Collaborator Author

What's with the two macros now?

STAN_GPU

STAN_GPU tell the compiler to include the OpenCL files. Maybe this should be STAN_OPENCL

STAN_DEVICE_CPU

This tells the compiler to compile the OpenCL code for the CPU instead of the GPU so we can run on travis.

@SteveBronder
Copy link
Collaborator Author

So you know what's coming up, what we're going to do is define a make variable. When that's set, we will add the compiler flags to CXXFLAGS including the includes and we'll add linker flags to LDFLAGS. We're going to remove the direct addition of the compiler flags to CXXFLAGS and we're going to remove the POST_LDLIBS variable.

I'm cool with this

@syclik
Copy link
Member

syclik commented Jan 22, 2018

Sorry, can you take a step back and describe the two macros? I don't get it. If there are two defined macros, that's 4 configurations. Under what conditions are the four different conditions used?

@seantalts
Copy link
Member

@syclik I think we should do GPU tests on Jenkins only. I doubt travis even supports GPUs, but we wouldn't want to rely on it there anyway.

@syclik
Copy link
Member

syclik commented Jan 22, 2018

@seantalts, that sounds good. We'll test both CPU and GPU on Jenkins.

device.getInfo<CL_DEVICE_AVAILABLE>() << "\n";
}
} catch (const cl::Error& e) {
msg << "\tno GPU devices in the specified platform" << "\n";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Instead of having a message here could we make a call to check_opencl_error()? I think it would throw a -1001: "CL_PLATFORM_NOT_FOUND_KHR or if not we can figure out what error code it returns and setup an error code for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we throw at all since having additional platforms without GPUs is not a problem and could happen?

We are just outputing the information here.

This is only a problem if the user selecta the platform that has no GPU devices. But that is handled elsewhere and does throw an exception, as it should :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we can throw a more informative error message then "C++ exception with description "clGetPlatformIDs" than it's a good idea to catch and handle it

if (OPENCL_PLATFORM_ID >= platforms_.size()) {
system_error("OpenCL Initialization", "[Platform]", -1,
"CL_INVALID_PLATFORM");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user specified an ID that did not exist, the test produced a seg fault. We now output the following:

[----------] Global test environment set-up.
[----------] 3 tests from MathGpu
[ RUN      ] MathGpu.getInfo
unknown file: Failure
C++ exception with description "OpenCL Initialization: [Platform] CL_INVALID_PLATFORM: Unknown error -1" thrown in the test body.
[  FAILED  ] MathGpu.getInfo (29 ms)

if (OPENCL_DEVICE_ID >= devices_.size()) {
system_error("OpenCL Initialization", "[Device]", -1,
"CL_INVALID_DEVICE");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user specified an ID that did not exist, the test produced a seg fault. We now output the following:

[ RUN ] MathGpu.getInfo
unknown file: Failure
C++ exception with description "opencl_context: clGetDeviceIDs CL_DEVICE_NOT_FOUND: Unknown error -1" thrown in the test body.
[ FAILED ] MathGpu.getInfo (31 ms)

msg << "\tno GPU devices in the specified platform"
<< "\n";
msg << "\tno GPU devices in the platform with ID "
<< platform_id << "\n";
Copy link
Member

@rok-cesnovar rok-cesnovar Apr 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example output listing all the platforms on a system. The first 2 have no GPU devices and the last one has 1 GPU device.


Number of Platforms: 3
Platform ID: 0
Platform Name: Experimental OpenCL 2.1 CPU Only Platform
Platform Vendor: Intel(R) Corporation
  no GPU devices in the platform with ID 1
Platform ID: 1
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
  no GPU devices in the platform with ID 2
Platform ID: 2
Platform Name: NVIDIA CUDA
Platform Vendor: NVIDIA Corporation
  Device 0: 
    Device Name: Graphics Device
    Device Type: 4
    Device Vendor: NVIDIA Corporation
    Device Max Compute Units: 30
    Device Global Memory: 12780699648
    Device Max Clock Frequency: 1582
    Device Max Allocateable Memory: 3195174912
    Device Local Memory: 49152
    Device Available: 1

@rok-cesnovar
Copy link
Member

rok-cesnovar commented Apr 21, 2018

I made a few changes on how we handle if a user specifies a platform ID that does not exist. I made review comments for the changes.

I also tested what happens if we try this on a system that has no OpenCL platforms but has the OpenCL library. It produces the following output

Running main() from gtest_main.cc
[==========] Running 6 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from MathGpu
[ RUN      ] MathGpu.getInfo
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] MathGpu.getInfo (0 ms)
[ RUN      ] MathGpu.context_construction
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] MathGpu.context_construction (0 ms)
[ RUN      ] MathGpu.kernel_construction
test/unit/math/gpu/opencl_context_test.cpp:22: Failure
Expected: stan::math::opencl_context.get_kernel("dummy") doesn't throw an exception.
  Actual: it throws.
test/unit/math/gpu/opencl_context_test.cpp:23: Failure
Expected: stan::math::opencl_context.get_kernel("dummy2") doesn't throw an exception.
  Actual: it throws.
[  FAILED  ] MathGpu.kernel_construction (1 ms)
[----------] 3 tests from MathGpu (1 ms total)

[----------] 3 tests from opencl_context
[ RUN      ] opencl_context.platform
unknown file: Failure
C++ exception with description "clGetPlatformIDs" thrown in the test body.
[  FAILED  ] opencl_context.platform (0 ms)
[ RUN      ] opencl_context.devices
unknown file: Failure
C++ exception with description "listing_devices_test: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] opencl_context.devices (0 ms)
[ RUN      ] opencl_context.compile_kernel_rawcode
unknown file: Failure
C++ exception with description "opencl_context: clGetPlatformIDs CL_PLATFORM_NOT_FOUND_KHR: Unknown error -1001" thrown in the test body.
[  FAILED  ] opencl_context.compile_kernel_rawcode (0 ms)
[----------] 3 tests from opencl_context (0 ms total)

[----------] Global test environment tear-down
[==========] 6 tests from 2 test cases ran. (1 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] MathGpu.getInfo
[  FAILED  ] MathGpu.context_construction
[  FAILED  ] MathGpu.kernel_construction
[  FAILED  ] opencl_context.platform
[  FAILED  ] opencl_context.devices
[  FAILED  ] opencl_context.compile_kernel_rawcode

 6 FAILED TESTS

@rok-cesnovar
Copy link
Member

On a side note, I played around with running these tests on Windows with an NVIDIA GPU. Getting to compile with OpenCL ( I used a toolchain based on gcc and mingw-w64 via RTools) was surprisingly smooth.

There is however again the problem with the -lOpenCL flag, that I needed to move behind the -o
My quick and dirty solution was to modify the make/tests adding the LDFLAGS_OPENCL:

test/%$(EXE) : test/%.o $(GTEST_MAIN) $(GTEST)/src/gtest-all.o
	$(LINK.cpp) -o $@ $^ $(LDFLAGS_OPENCL)

Is there any less dirty way of doing this?

@SteveBronder
Copy link
Collaborator Author

@seantalts can you help Ben setup the GPU stuff on the linux box? I can also stop by Friday afternoon to help get things set up if you are available Ben

@bgoodri
Copy link
Contributor

bgoodri commented Apr 25, 2018

Tests passing on my Debian Stable laptop if I use g++ (not sure what combination of things actually were necessary to fix it), but it still crashes if I use clang++. And the server still crashes if I use clang++ but has a linker error if I use g++.

@SteveBronder
Copy link
Collaborator Author

SteveBronder commented Apr 25, 2018 via email

@@ -194,7 +194,7 @@ class opencl_context {
// and mark them as compiled.
for (auto kern : kernel_info()) {
if (strcmp(kern.second.group, kernel_group) == 0) {
kernels()[(kern.first)] = cl::Kernel(program_, kern.first, &err);
opencl_context_base::getInstance().kernels[(kern.first)] = cl::Kernel(program_, kern.first, &err);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to change this, as I was getting the "CL_INVALID_KERNEL" when using the kernel (retrieved with get_kernel()). Retrieving the name returned an empty string. I added a test to check this.

Ben Goodrich and others added 2 commits April 29, 2018 14:51
@bgoodri bgoodri mentioned this pull request Apr 29, 2018
3 tasks
@SteveBronder
Copy link
Collaborator Author

With Ben's PR above I think this is all good to go?

@syclik
Copy link
Member

syclik commented May 1, 2018

Yeah, looks great! Thanks @SteveBronder, @rok-cesnovar, and @bgoodri for working out the practical issues of GPU!

@syclik syclik merged commit b1e394a into stan-dev:develop May 1, 2018
@rok-cesnovar rok-cesnovar deleted the gpu_setup branch May 2, 2018 07:26
@rok-cesnovar rok-cesnovar restored the gpu_setup branch May 2, 2018 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants