Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rev for Cholesky on GPU #1118

Merged
merged 63 commits into from
Mar 2, 2019
Merged

Conversation

SteveBronder
Copy link
Collaborator

@SteveBronder SteveBronder commented Feb 12, 2019

Summary

Last PR!

When users define STAN_OPENCL during compilation the reverse mode cholesky will take place on the GPU.

Kinda nice, using all the stuff we built before the code here is pretty similar to the code for the Eigen version of the derivative in cholesky_block

Tests

If you have clinfo installed, you can use clinfo -l to see your available platforms and device index values. With only one platform and device the below should work, otherwise place the appropriate index values for your device and platform below.

echo STAN_OPENCL=true>> make/local
echo OPENCL_PLATFORM_ID=0>> make/local
echo OPENCL_DEVICE_ID=0>> make/local
./runTests.py ./runTests.py test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp 

Side Effects

Moves Cholesky derivative from CPU to GPU

Checklist

  • Rok Češnovar and Erik Štrumbelj (Faculty of Computer and Information Science, University of Ljubljana)
  • Steve Bronder
  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@rok-cesnovar
Copy link
Member

@seantalts I believe I addressed all your comments.

Regarding precision: my plan was to compare fits with STAN_OPENCL and without for either this example (example.txt) from the Stan manual or the model we used for StanCon ?
Any thoughts on this? I could run this with different values for the primitive cholesky_decompose parameters.

@seantalts
Copy link
Member

Re: Precision. Hmm. I talked with Bob a little about this, and I don't think there's a clear way to measure precision as it kind of depends on the matrix sizes and contents thereof. There are many possible paths, and I think we should just pick one or two and be clear about the assumptions and what we tested (and code is usually pretty clear). I'll propose doing two things:

  1. SBC on the model from the manual to see if we can spot any issues with fitting. Ben is adding SBC support to RStan and it's almost in, see Feature/sbc rstan#611. Let's run something like 100 replications of 10 posterior draws each and see what we can see at first.
  2. @bgoodri's idea for a Cholesky decomposition unit test:

https://en.wikipedia.org/wiki/Cholesky_decomposition#Updating_the_decomposition
where we start with a Cholesky factor of a big covariance matrix and then imagine rank-1 or rank-2 updates to the covariance matrix. That can be done more-or-less exactly by direct manipulation of the original Cholesky factor that we can compare to the naive way of multiplying the Cholesky factor by its own transpose, adding the updates, and taking the Cholesky factor of the result.

We can add a test for the relative error for this procedure on GPU and our CPU versions to the source code (see this for measuring relative error). Neither of these have to be done before this PR goes in, but we should do them before we release GPUs to the world, and then we can add a little section describing GPU precision to the manual section on GPU operations.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Mar 1, 2019 via email

@seantalts
Copy link
Member

My point was that whatever the precision, it probably shouldn't block adding GPUs.

But did I interpret you correctly as saying it would be good to somehow characterize the precision before adding them? Or can we wait on that as well?

Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to submit. Just one code comment and then I'm wondering if there's anything quick&easy we could do in the test to ensure the GPU version is being called... any ideas?

stan/math/rev/mat/fun/cholesky_decompose.hpp Outdated Show resolved Hide resolved
@seantalts
Copy link
Member

@bgoodri do you know how to do what Bob is suggesting (basically generating good test matrices that are tricky for decomposition)?

@SteveBronder
Copy link
Collaborator Author

I'm wondering if there's anything quick&easy we could do in the test to ensure the GPU version is being called... any ideas?

We could set opencl_context.tuning_opts().cholesky_size_worth_transfer = 0; to have that condition always go off right? Or is there somethign else you are worried about that would stop the test from happening

@seantalts
Copy link
Member

I'm thinking more to verify that the threshold is working correctly... like could we check the opencl context to see that it has had some matrices pass through it or something? any other little side effects like that?

@SteveBronder
Copy link
Collaborator Author

I'll have to check, @rok-cesnovar do you know? I think since we don't track events or do profiling
there's nothing specific in the queue to tell us that info.

In the OpenCL Spec there's a lot of getInfo() functions so we may be able to get some stuff from there

I'm not totally understanding the reason to test that tho'? Like isn't that checking if the if statement goes off?

@rok-cesnovar
Copy link
Member

I cant think of a simple way of doing that for the primitve cholesky without using events that we are currently not using. For the rev cholesky we could possibly check the allocated memory in the arena, idk?

@seantalts
Copy link
Member

Yeah, basically checking that these tests are actually running on the GPU code we think it's running on. You could imagine someone coming along in the future and changing the templating, overloads, if statement, or tuning params slightly and losing that accidentally while the test will still tell you the cholesky is good to go... Hmm, there might just not be a good way to do it right now, so never mind about that.

Also @rok-cesnovar I missed that that version was in-place, woops. Makes sense.

Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good! I think before releasing to the world (i.e. telling people to turn on STAN_OPENCL) we just need to somehow be able to comment on its precision...

@SteveBronder
Copy link
Collaborator Author

Ah, yeah Sean I agree that's a very good point! I'm going to iterate on the Async stuff this weekend and with that we will be able to use the events and profiling to check out what is happening and what went off.

@SteveBronder SteveBronder merged commit 0c89499 into stan-dev:develop Mar 2, 2019
@SteveBronder
Copy link
Collaborator Author

Also pretty darn cool! I'm going to make an issue tonight with a checklist of things that we want to happen before we make a full announcement about this

@SteveBronder
Copy link
Collaborator Author

Also gosh, huge thanks to @rok-cesnovar and @seantalts, pretty cool this is finally kicking!

@seantalts
Copy link
Member

Thank you for driving this forward relentlessly :) super excited at the progress and looking forward to the announcement :)

@rok-cesnovar
Copy link
Member

@SteveBronder @seantalts I also want to thank you both for the hard work. Excited :)

@SteveBronder SteveBronder deleted the gpu-cholesky-rev branch May 22, 2019 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants