Rev for Cholesky on GPU #1118

SteveBronder · 2019-02-12T03:44:31Z

Summary

Last PR!

When users define STAN_OPENCL during compilation, the reverse-mode Cholesky decomposition will run on the GPU.

Kinda nice: since this uses all the stuff we built before, the code here is pretty similar to the code for the Eigen version of the derivative in cholesky_block.

Tests

If you have clinfo installed, you can use clinfo -l to see your available platforms and device index values. With only one platform and device, the commands below should work; otherwise, substitute the appropriate index values for your platform and device.

echo STAN_OPENCL=true>> make/local
echo OPENCL_PLATFORM_ID=0>> make/local
echo OPENCL_DEVICE_ID=0>> make/local
./runTests.py test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp

Side Effects

Moves Cholesky derivative from CPU to GPU

Checklist

Math issue Rev for Cholesky on GPU #1117
Copyright holder: (fill in copyright holder information)

Rok Češnovar and Erik Štrumbelj (Faculty of Computer and Information Science, University of Ljubljana)
Steve Bronder

the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

rok-cesnovar · 2019-02-27T18:02:37Z

@seantalts I believe I addressed all your comments.

Regarding precision: my plan was to compare fits with and without STAN_OPENCL, using either this example (example.txt) from the Stan manual or the model we used for StanCon.
Any thoughts on this? I could run this with different values for the primitive cholesky_decompose parameters.

seantalts · 2019-02-28T17:24:37Z

Re: Precision. Hmm. I talked with Bob a little about this, and I don't think there's a clear way to measure precision as it kind of depends on the matrix sizes and contents thereof. There are many possible paths, and I think we should just pick one or two and be clear about the assumptions and what we tested (and code is usually pretty clear). I'll propose doing two things:

- SBC on the model from the manual to see if we can spot any issues with fitting. Ben is adding SBC support to RStan and it's almost in, see Feature/sbc rstan#611. Let's run something like 100 replications of 10 posterior draws each and see what we can see at first.
- @bgoodri's idea for a Cholesky decomposition unit test (https://en.wikipedia.org/wiki/Cholesky_decomposition#Updating_the_decomposition), where we start with a Cholesky factor of a big covariance matrix and then imagine rank-1 or rank-2 updates to the covariance matrix. That can be done more-or-less exactly by direct manipulation of the original Cholesky factor that we can compare to the naive way of multiplying the Cholesky factor by its own transpose, adding the updates, and taking the Cholesky factor of the result.

We can add a test for the relative error of this procedure on the GPU and on our CPU versions to the source code (see this for measuring relative error). Neither of these has to be done before this PR goes in, but we should do them before we release GPUs to the world, and then we can add a little section describing GPU precision to the manual section on GPU operations.
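The rank-1 comparison described above can be sketched in plain C++. This is a minimal CPU-only illustration, not code from this PR: the helper names (`cholesky_lower`, `rank1_update`, `naive_update`, `relative_error`) are hypothetical, the update loop follows the algorithm on the linked Wikipedia page, and the Frobenius-norm relative error is an assumption, since the thread does not pin down a metric.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Unblocked lower Cholesky factor of a symmetric positive definite matrix.
Matrix cholesky_lower(const Matrix& a) {
  const std::size_t n = a.size();
  Matrix l(n, std::vector<double>(n, 0.0));
  for (std::size_t j = 0; j < n; ++j) {
    double diag = a[j][j];
    for (std::size_t k = 0; k < j; ++k) diag -= l[j][k] * l[j][k];
    l[j][j] = std::sqrt(diag);
    for (std::size_t i = j + 1; i < n; ++i) {
      double off = a[i][j];
      for (std::size_t k = 0; k < j; ++k) off -= l[i][k] * l[j][k];
      l[i][j] = off / l[j][j];
    }
  }
  return l;
}

// Direct route: factor of L L^T + x x^T obtained by modifying L in place.
// Both arguments are taken by value because the loop overwrites them.
Matrix rank1_update(Matrix l, std::vector<double> x) {
  const std::size_t n = l.size();
  for (std::size_t k = 0; k < n; ++k) {
    const double r = std::hypot(l[k][k], x[k]);
    const double c = r / l[k][k];
    const double s = x[k] / l[k][k];
    l[k][k] = r;
    for (std::size_t i = k + 1; i < n; ++i) {
      l[i][k] = (l[i][k] + s * x[i]) / c;
      x[i] = c * x[i] - s * l[i][k];
    }
  }
  return l;
}

// Naive route: rebuild A = L L^T + x x^T and factor it from scratch.
Matrix naive_update(const Matrix& l, const std::vector<double>& x) {
  const std::size_t n = l.size();
  Matrix a(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      a[i][j] = x[i] * x[j];
      for (std::size_t k = 0; k < n; ++k) a[i][j] += l[i][k] * l[j][k];
    }
  }
  return cholesky_lower(a);
}

// Frobenius-norm relative error ||approx - ref|| / ||ref||.
// Assumes ref is not the zero matrix.
double relative_error(const Matrix& approx, const Matrix& ref) {
  assert(approx.size() == ref.size());
  double diff = 0.0;
  double norm = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    assert(approx[i].size() == ref[i].size());
    for (std::size_t j = 0; j < ref[i].size(); ++j) {
      const double d = approx[i][j] - ref[i][j];
      diff += d * d;
      norm += ref[i][j] * ref[i][j];
    }
  }
  return std::sqrt(diff / norm);
}
```

In an actual test, the factorization inside `naive_update` would presumably be swapped for the CPU and GPU `cholesky_decompose` paths in turn, with each result compared against `rank1_update` under a chosen tolerance.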

bob-carpenter · 2019-03-01T16:09:39Z

My point was that whatever the precision, it probably shouldn't block adding GPUs. I think testing with a known Cholesky factor is the way to go. I'm not sure what you mean by rank 1 or rank 2 updates. Is there a motivation for manipulating the Cholesky factor rather than just generating new ones? The tricky numerical situations often arise with bad conditioning (high ratio of largest to smallest eigenvalue). But I'm not sure how to generate a test matrix with a given condition. We might be able to do it approximately in Stan with a prior on the eigenvalues. Or is that what the low rank updates provide?
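On generating a test matrix with a given condition number: one standard construction (not something proposed in this thread) is to pick the eigenvalues directly and conjugate the resulting diagonal matrix with an orthogonal matrix. The sketch below assumes a Householder reflection for the orthogonal factor and geometrically spaced eigenvalues from 1 down to 1/kappa; the function name `spd_with_condition` and both of those choices are illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Builds A = H D H^T, where H = I - 2 v v^T / (v^T v) is a Householder
// reflection (orthogonal and symmetric) and D holds the eigenvalues
// kappa^(-k / (n - 1)) for k = 0, ..., n - 1.
// The largest eigenvalue is 1 and the smallest is 1 / kappa, so the
// 2-norm condition number of A is kappa up to rounding.
Matrix spd_with_condition(const std::vector<double>& v, double kappa) {
  const std::size_t n = v.size();
  assert(n >= 2);
  assert(kappa >= 1.0);

  double vtv = 0.0;
  for (double vi : v) vtv += vi * vi;
  assert(vtv > 0.0);

  Matrix h(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      h[i][j] = (i == j ? 1.0 : 0.0) - 2.0 * v[i] * v[j] / vtv;
    }
  }

  // Accumulate the sum over k of d_k * h_k h_k^T, with h_k the k-th column.
  Matrix a(n, std::vector<double>(n, 0.0));
  for (std::size_t k = 0; k < n; ++k) {
    const double d = std::pow(
        kappa, -static_cast<double>(k) / static_cast<double>(n - 1));
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        a[i][j] += h[i][k] * d * h[j][k];
      }
    }
  }
  return a;
}
```

Sweeping kappa upward (for example 1e2, 1e6, 1e10) would then give a family of increasingly ill-conditioned inputs on which the CPU and GPU decompositions can be compared.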

seantalts · 2019-03-01T17:15:38Z

> My point was that whatever the precision, it probably shouldn't block adding GPUs.

But did I interpret you correctly as saying it would be good to somehow characterize the precision before adding them? Or can we wait on that as well?

seantalts

Oops, forgot to submit. Just one code comment and then I'm wondering if there's anything quick&easy we could do in the test to ensure the GPU version is being called... any ideas?

stan/math/rev/mat/fun/cholesky_decompose.hpp

seantalts · 2019-03-01T17:22:41Z

@bgoodri do you know how to do what Bob is suggesting (basically generating good test matrices that are tricky for decomposition)?

SteveBronder · 2019-03-01T17:26:15Z

> I'm wondering if there's anything quick&easy we could do in the test to ensure the GPU version is being called... any ideas?

We could set opencl_context.tuning_opts().cholesky_size_worth_transfer = 0; to have that condition always go off, right? Or is there something else you are worried about that would stop the test from happening?

seantalts · 2019-03-01T17:37:10Z

I'm thinking more to verify that the threshold is working correctly... like could we check the opencl context to see that it has had some matrices pass through it or something? any other little side effects like that?
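As a rough illustration of the kind of side effect being asked about, the sketch below counts which path a size-threshold dispatch takes so that a test can assert on it. Everything here is hypothetical: `tuning_options`, `dispatch_stats`, and `choose_backend` are made-up stand-ins, the default threshold is an arbitrary placeholder, the `>=` comparison is an assumption, and none of it is the Stan Math or OpenCL API.

```cpp
#include <cstddef>

// Hypothetical stand-in for the tuning options mentioned in the thread.
struct tuning_options {
  std::size_t cholesky_size_worth_transfer = 100;  // placeholder value
};

// Hypothetical counters recording which path each call took.
struct dispatch_stats {
  std::size_t gpu_calls = 0;
  std::size_t cpu_calls = 0;
};

enum class backend { cpu, gpu };

// Picks the GPU path when the matrix has at least as many rows as the
// threshold (assumed comparison) and records the decision in stats.
backend choose_backend(std::size_t rows, const tuning_options& opts,
                       dispatch_stats& stats) {
  if (rows >= opts.cholesky_size_worth_transfer) {
    ++stats.gpu_calls;
    return backend::gpu;
  }
  ++stats.cpu_calls;
  return backend::cpu;
}
```

A test could then run one decomposition below and one above the threshold and check that each counter advanced exactly once, which would catch a later change to the overloads or the if statement that silently routes everything to the CPU.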

SteveBronder · 2019-03-01T18:32:44Z

I'll have to check, @rok-cesnovar do you know? I think since we don't track events or do profiling, there's nothing specific in the queue to tell us that info.

In the OpenCL spec there are a lot of getInfo() functions, so we may be able to get some stuff from there.

I don't totally understand the reason to test that, though. Isn't that just checking whether the if statement goes off?

rok-cesnovar · 2019-03-01T19:22:13Z

I can't think of a simple way of doing that for the primitive Cholesky without using events, which we are currently not using. For the rev Cholesky we could possibly check the allocated memory in the arena, though I'm not sure.

seantalts · 2019-03-02T00:25:10Z

Yeah, basically checking that these tests are actually running on the GPU code we think they're running on. You could imagine someone coming along in the future and changing the templating, overloads, if statement, or tuning params slightly and losing that accidentally, while the test would still tell you the Cholesky is good to go... Hmm, there might just not be a good way to do it right now, so never mind about that.

Also @rok-cesnovar I missed that that version was in-place, whoops. Makes sense.

seantalts

I think this looks good! I think before releasing to the world (i.e. telling people to turn on STAN_OPENCL) we just need to somehow be able to comment on its precision...

SteveBronder · 2019-03-02T00:49:33Z

Ah, yeah Sean I agree that's a very good point! I'm going to iterate on the Async stuff this weekend and with that we will be able to use the events and profiling to check out what is happening and what went off.

SteveBronder · 2019-03-02T00:51:11Z

Also pretty darn cool! I'm going to make an issue tonight with a checklist of things that we want to happen before we make a full announcement about this

SteveBronder · 2019-03-02T00:52:40Z

Also gosh, huge thanks to @rok-cesnovar and @seantalts, pretty cool this is finally kicking!

seantalts · 2019-03-02T09:46:31Z

Thank you for driving this forward relentlessly :) super excited at the progress and looking forward to the announcement :)

rok-cesnovar · 2019-03-02T13:04:01Z

@SteveBronder @seantalts I also want to thank you both for the hard work. Excited :)

SteveBronder and others added 22 commits December 19, 2018 22:35

Moved over main rev function to branch and did simple cleaning

15be4dd

Moves Cholesky GPU into the cholesky file

0196022

Merge branch 'gpu_cholesky_prim' into gpu-cholesky-rev

ff32352

Merge branch 'gpu-cholesky-rev' of https://github.com/bstatcomp/math …

463fb32

…into gpu-cholesky-rev

Update rev cholosky to use new gpu functions

4f156f4

Fixup rev code

37d5b18

update rev

0d88984

reverting the copy_tri kernel changes

ac03052

include header remove

a77afda

Merge branch 'gpu_cholesky_prim' into gpu-cholesky-rev

66799ba

minor comments

b87dffa

Merge branch 'gpu_cholesky_prim' into gpu-cholesky-rev

37075df

added rev opencl tests, failing

2718f05

Merge branch 'gpu_cholesky_prim' into gpu-cholesky-rev

af7f16a

# Conflicts: # stan/math/prim/mat/fun/cholesky_decompose.hpp

further work on rev, still fails

69e7d9c

Update tests, now passing

5f0cc21

missing ifdef in rev, caused fails without STAN_OPENCL

1e27c13

fixed the multiply(Nx0, 0xM) bug

6105b19

removed the size>0 check

0363c08

Merge branch 'gpu_cholesky_prim' into gpu-cholesky-rev

c56c5d7

# Conflicts: # stan/math/gpu/multiply.hpp

sets the tests to go off on the GPU for smaller cholesky sizes

cd65c9b

Merge remote-tracking branch 'origin/gpu_cholesky_prim' into gpu-chol…

eb33397

…esky-rev

SteveBronder closed this Feb 12, 2019

SteveBronder reopened this Feb 12, 2019

yashikno and others added 4 commits February 12, 2019 13:00

Merge commit 'a933f65c4f584fb2ff949b350816eae59b72ddfd' into HEAD

b95beba

[Jenkins] auto-formatting by clang-format version 5.0.0-3~16.04.1 (ta…

0bf89b4

…gs/RELEASE_500/final)

Merge remote-tracking branch 'upstream/develop' into gpu-cholesky-rev

a3c1650

Merge branch 'gpu-cholesky-rev' of https://github.com/bstatcomp/math …

a82d581

…into gpu-cholesky-rev

SteveBronder commented Feb 12, 2019

test/unit/math/rev/mat/fun/cholesky_decompose_test.cpp

Accidentally took out some tests that were supposed to be there

96b2a40

SteveBronder and others added 7 commits February 26, 2019 20:30

Merge remote-tracking branch 'upstream/develop' into gpu-cholesky-rev

1e09d95

removing camelCase and unwanted underscores

9037b8d

added 2 tests below the threshold

5c2e48d

bar goes adj

c45bac0

symbolic_rev added, better zeroing

5117236

Merge commit 'c72d56f4d29d2cc5d3535ee433d7a9695fb681fc' into HEAD

1d17106

[Jenkins] auto-formatting by clang-format version 5.0.0-3~16.04.1 (ta…

d4c8646

…gs/RELEASE_500/final)

Merge branch 'gpu-cholesky-rev' of https://github.com/bstatcomp/math …

d6d7fa0

…into gpu-cholesky-rev

Merge remote-tracking branch 'upstream/develop' into gpu-cholesky-rev

d0c124f

seantalts suggested changes Mar 1, 2019

stan/math/rev/mat/fun/cholesky_decompose.hpp

outsource to prim cholesky directly

9784cc2

[Jenkins] auto-formatting by clang-format version 6.0.0 (tags/google/…

0db6764

…stable/2017-11-14)

seantalts approved these changes Mar 2, 2019

SteveBronder merged commit 0c89499 into stan-dev:develop Mar 2, 2019

SteveBronder deleted the gpu-cholesky-rev branch May 22, 2019 03:44
