Rev for Cholesky on GPU #1118
Conversation
@seantalts I believe I addressed all your comments. Regarding precision: my plan was to compare fits with STAN_OPENCL and without, for either this example (example.txt) from the Stan manual or the model we used for StanCon?
Re: Precision. Hmm. I talked with Bob a little about this, and I don't think there's a clear way to measure precision, as it kind of depends on the matrix sizes and their contents. There are many possible paths, and I think we should just pick one or two and be clear about the assumptions and what we tested (and code is usually pretty clear). I'll propose doing two things:

1. Add a test for the relative error of this procedure on GPU versus our CPU version to the source code (see this for measuring relative error).
2. Add a little section describing GPU precision to the manual section on GPU operations.

Neither of these has to be done before this PR goes in, but we should do them before we release GPUs to the world.
My point was that whatever the precision, it probably shouldn't block adding GPUs.
I think testing with a known Cholesky factor is the way to go. I'm not sure what you mean by rank 1 or rank 2 updates. Is there a motivation for manipulating the Cholesky factor rather than just generating new ones? The tricky numerical situations often arise with bad conditioning (a high ratio of largest to smallest eigenvalue), but I'm not sure how to generate a test matrix with a given condition number. We might be able to do it approximately in Stan with a prior on the eigenvalues. Or is that what the low-rank updates provide?
But did I interpret you correctly as saying it would be good to somehow characterize the precision before adding them? Or can we wait on that as well?
Oops, forgot to submit. Just one code comment, and then I'm wondering if there's anything quick and easy we could do in the test to ensure the GPU version is being called... any ideas?
@bgoodri do you know how to do what Bob is suggesting (basically generating good test matrices that are tricky for decomposition)?
We could set
I'm thinking more to verify that the threshold is working correctly... like, could we check the OpenCL context to see that it has had some matrices pass through it, or something? Any other little side effects like that?
I'll have to check, @rok-cesnovar do you know? I think not, since we don't track events or do profiling; in the OpenCL spec there's a lot on that. I'm not totally understanding the reason to test that, though. Isn't that checking whether the if statement goes off?
I can't think of a simple way of doing that for the primitive Cholesky without using events, which we are currently not using. For the rev Cholesky we could possibly check the allocated memory in the arena, idk?
Yeah, basically checking that these tests are actually running on the GPU code we think they're running on. You could imagine someone coming along in the future and changing the templating, overloads, if statement, or tuning params slightly and losing that accidentally, while the test will still tell you the Cholesky is good to go... Hmm, there might just not be a good way to do it right now, so never mind about that. Also @rok-cesnovar I missed that that version was in-place, whoops. Makes sense.
I think this looks good! I think before releasing to the world (i.e. telling people to turn on STAN_OPENCL) we just need to somehow be able to comment on its precision...
Ah, yeah Sean, I agree that's a very good point! I'm going to iterate on the async stuff this weekend, and with that we will be able to use the events and profiling to check out what is happening and what went off.
Also pretty darn cool! I'm going to make an issue tonight with a checklist of things that we want to happen before we make a full announcement about this.
Also gosh, huge thanks to @rok-cesnovar and @seantalts, pretty cool this is finally kicking!
Thank you for driving this forward relentlessly :) super excited at the progress and looking forward to the announcement :)
@SteveBronder @seantalts I also want to thank you both for the hard work. Excited :)
Summary
Last PR!

When users define STAN_OPENCL during compilation, the reverse mode Cholesky will take place on the GPU. Kinda nice: using all the stuff we built before, the code here is pretty similar to the code for the Eigen version of the derivative in cholesky_block.
Tests
If you have clinfo installed, you can use clinfo -l to see your available platforms and device index values. With only one platform and device the below should work; otherwise place the appropriate index values for your device and platform below.

Side Effects
Moves Cholesky derivative from CPU to GPU
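As a concrete example of the device selection described under Tests, the OpenCL build settings go in make/local. The variable names below match the Stan Math OpenCL setup as I understand it, and the 0/0 index values are placeholders to replace with whatever clinfo -l reports on your machine:

```makefile
# make/local (placeholder index values; check `clinfo -l` for yours)
STAN_OPENCL=true
OPENCL_PLATFORM_ID=0
OPENCL_DEVICE_ID=0
```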
Checklist
- Math issue: Rev for Cholesky on GPU #1117
- Copyright holder: (fill in copyright holder information)
- the basic tests are passing
  - ./runTests.py test/unit
  - make test-headers
  - make doxygen
  - make cpplint
- the code is written in idiomatic C++ and changes are documented in the doxygen
- the new changes are tested