Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change how users access OpenCL kernels #966

Merged
merged 47 commits into from
Aug 23, 2018

Conversation

SteveBronder
Copy link
Collaborator

@SteveBronder SteveBronder commented Aug 4, 2018

Checklist

Rok Češnovar and Erik Štrumbelj (Faculty of Computer and Information Science, University of Ljubljana)
Steve Bronder

  • the code base is stable

    • all unit tests pass
    • continuous integration passes
  • the changes are maintainable

    • the code design is idiomatic C++ or there's a good reason it's not
    • please include appropriate documentation
  • the changes are tested

  • the changes adhere to the Math library's C++ standards

Summary

Easier Access To Kernels

Creates a kernel_cl_base singleton to manage the compiled kernels and a kernel_cl friend adapter class to give users access to kernels along with helper functions. Currently, kernels are compiled from within the opencl_context_base class. The GPU team is concerned that this class is becoming too bulky and so we created a new kernel_cl class that users can utilize to access kernels. Currently users run something like the following to compile and send arguments over to an OpenCL kernel.

cl::Kernel kernel = opencl_context.get_kernel("add");
opencl_context.set_kernel_args(kernel, C.buffer(), A.buffer(), B.buffer(),
                                   A.rows(), A.cols());

Users can now call the following

auto add_kernel = kernel_cl.add(C.buffer(), A.buffer(), B.buffer(), A.rows(), A.cols());

Remove Kernel Groups

This PR also removes the kernel groups such that each kernel is now compiled into it's own program. Originally the OpenCL context would compile kernels which were grouped together (IE 'cholesky', 'checks' ,etc.) into a single program with the idea being that if multiple kernels are used together it's smarter to compile all of them at once. While compiling multiple kernels into a single program is faster, as we add more kernels how we choose to label and group them would end up being difficult and too heuristic. When running some tests it did not seem like compiling the kernels separately produced a noticeable amount of overhead wrt the life of the executable.

Tests

The kernel_cl test checks whether the transpose kernel can be compiled. Code to run the test below

echo STAN_OPENCL=true>> make/local
echo OPENCL_PLATFORM_ID=0>> make/local
echo OPENCL_DEVICE_ID=0>> make/local
make test/unit/math/gpu/kernel_cl_test && ./test/unit/math/gpu/kernel_cl_test

Side Effects

None

Additional Notes

Leave additional notes for the code reviewer here.

@SteveBronder SteveBronder self-assigned this Aug 5, 2018
@SteveBronder SteveBronder changed the title [WIP] Change how users access OpenCL kernels Change how users access OpenCL kernels Aug 5, 2018
@syclik
Copy link
Member

syclik commented Aug 7, 2018

Please create a new issue. It helps us manage this stuff; if there's a single issue per pull request, we can use "fixes #___" to automatically link and close issues.

@syclik
Copy link
Member

syclik commented Aug 7, 2018

whoa. I just read the description and that's great. what's with the naming? why kernel_cl_base?

If I'm understanding correctly, the constructor to kernel_cl looks up the compiled kernel by string?

@rok-cesnovar
Copy link
Member

We are following the same naming convention we setup in the opencl_context (opencl_context_base is the singleton, opencl_context is the friend class).

kernel_cl checks if a kernel with the provided name has been compiled. If yes, it sets the member that holds the compiled kernel. If it wasnt compiled yet, it compiles it first.

One side effect of this PR is also that it organizes how we treat the kernel parameters (like LOWER, UPPER,...) a lot better.

@SteveBronder
Copy link
Collaborator Author

Along with what Rok mentioned above, I want to make a PR soon that changes all of the names from *_gpu_* to *_cl_*. I think it looks nicer and makes more sense. I'll file an issue soon describing the naming change in more detail

@seantalts
Copy link
Member

Can we also make it so that the cl files are not string literals but would actually be readable by an IDE? :D

Also I'm happy to review GPU stuff - going through a workshop with Rok and Erik now on it and it's really cool. Let me know if I can take this off your plate, @syclik

@syclik
Copy link
Member

syclik commented Aug 9, 2018

@seantalts, please and thank you! I've been buried trying to get the make stuff to work consistently. (It's working through Math and Stan now! Now onto CmdStan.)

@syclik syclik requested review from seantalts and removed request for syclik August 9, 2018 13:12
@SteveBronder
Copy link
Collaborator Author

Can we also make it so that the cl files are not string literals but would actually be readable by an IDE?

@seantalts I would absolutely love this but am having a hard time figuring out how to do it. When we tried reading them in with something like an ifstream we would get the instantiation fiasco error. Also wasn't sure how to specify the file locations since we are a header library.

The only other way I've seen to do it is via stringify like this stackoverflow Q. Though when I tried this it also threw up a bunch of errors, though maybe this is worth a second look

If you know a better way to do it then I'm 100% about it!

Also I'm happy to review GPU stuff

!! V appreciated !!

@seantalts
Copy link
Member

Also just a note for those following this thread: some discussion has continued on the proposal above here: bstatcomp#7

@syclik
Copy link
Member

syclik commented Aug 20, 2018 via email

seantalts
seantalts previously approved these changes Aug 23, 2018
seantalts
seantalts previously approved these changes Aug 23, 2018
@seantalts
Copy link
Member

Just waiting for tests to pass now.

@seantalts
Copy link
Member

Okay think I fixed it that one by adding the GPU #ifdefs

@rok-cesnovar
Copy link
Member

Thanks @seantalts for all the help and effort in helping us improve this! Really appreciated.

@rok-cesnovar rok-cesnovar merged commit 0cdab8e into stan-dev:develop Aug 23, 2018
@seantalts
Copy link
Member

seantalts commented Aug 24, 2018

The GPU tests no longer run on a PR and they broke on develop I think because we all forgot about a GPU test that isn't in the GPU folder:
test/unit/math/prim/mat/fun/opencl_copy_test.cpp
Should we move that into the gpu test folder? or
The test compiles again when #include <stan/math/gpu/copy.hpp> is added.

@rok-cesnovar
Copy link
Member

I would move it. Can we do it in #1001 ?

@SteveBronder SteveBronder deleted the kernel_cl branch May 22, 2019 03:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants