
Efficient value_of and value_of_rec for matrices #250

Merged
3 commits merged into develop from feature/issue-249-efficient-mat-value_of on Mar 28, 2016

Conversation

@randommm
Member
commented Mar 3, 2016

Summary:

Fix issue #249

Intended Effect:

Write value_of and value_of_rec for matrices in a more efficient way.

value_of and value_of_rec for matrices are currently written in an inefficient way that calls mat(i, j), mat.rows(), and mat.cols() on each iteration of the loop.

The sizes should instead be stored in a temporary int and the matrix data pointer in a temporary T*.
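For reference, a sketch of the kind of specialization this describes (the namespaces and signature are assumed from Stan Math conventions, and the scalar base case is a stub so the sketch stands alone; the loop body mirrors the diff quoted later in this thread):

    #include <Eigen/Dense>

    namespace stan {
    namespace math {

    // Stub scalar base case so the sketch compiles on its own; the library
    // itself provides overloads for var, fvar, and so on.
    inline double value_of_rec(double x) { return x; }

    template <typename T, int R, int C>
    inline Eigen::Matrix<double, R, C>
    value_of_rec(const Eigen::Matrix<T, R, C>& M) {
      Eigen::Matrix<double, R, C> result(M.rows(), M.cols());
      double* data_r = result.data();    // destination storage
      const T* data_m = M.data();        // source storage
      int S = M.size();                  // size hoisted into a temporary
      for (int i = 0; i < S; i++)
        data_r[i] = value_of_rec(data_m[i]);  // one scalar conversion per element
      return result;
    }

    }  // namespace math
    }  // namespace stan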

How to Verify:

Code review.

Side Effects:

None.

Documentation:

Not applicable.

Reviewer Suggestions:

@rtrangucci

Review comment thread on stan/math/prim/mat/fun/value_of_rec.hpp, on the new loop:

      for (int i = 0; i < S; i++)
        data_r[i] = value_of_rec(data_m[i]);
      return result;
    }
@bob-carpenter
Contributor

I understand not doing the double indexing --- that's slow. But I think the bit after the declaration/construction of result, we should just do the following unless you can demonstrate that what you did with all the temporaries is faster.

for (int i = 0; i < M.size(); ++i)
  result(i) = value_of(M(i));

We know the value can't be recursive through Eigen in any of our code, so we don't need value_of_rec.

@rtrangucci
Contributor

@bob-carpenter I'm confused about the comment above about value_of_rec.

I can think of a situation where we would have:

Eigen::Matrix<fvar<var>, R, C> x

and want Eigen::Matrix<double, R, C> y where y(i, j) = x(i, j).val_.val()
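A toy sketch of the recursion being described (stand-in types only, not Stan Math's actual var and fvar, which are far more involved):

    #include <iostream>

    struct var { double v_; double val() const { return v_; } };
    template <typename T> struct fvar { T val_; T d_; };

    inline double value_of_rec(double x) { return x; }
    inline double value_of_rec(const var& v) { return v.val(); }
    template <typename T>
    inline double value_of_rec(const fvar<T>& x) { return value_of_rec(x.val_); }

    int main() {
      fvar<var> x{var{1.5}, var{0.0}};
      std::cout << value_of_rec(x) << "\n";  // strips both levels: x.val_.val() == 1.5
    }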

@bob-carpenter
Contributor

Ah, I see --- the recursion isn't just through container structures (std::vector), but also down through mixed types. Thanks for pointing this out.

On Mar 3, 2016, at 3:43 PM, Rob Trangucci notifications@github.com wrote:

> In stan/math/prim/mat/fun/value_of_rec.hpp:
>
>       Eigen::Matrix<double, R, C> Md(M.rows(), M.cols());
>       for (int j = 0; j < M.cols(); j++)
>         for (int i = 0; i < M.rows(); i++)
>           Md(i, j) = value_of_rec(M(i, j));
>       return Md;
>
>       Eigen::Matrix<double, R, C>
>         result(M.rows(), M.cols());
>       double * data_r = result.data();
>       const T * data_m = M.data();
>       int S = M.size();
>       for (int i = 0; i < S; i++)
>         data_r[i] = value_of_rec(data_m[i]);
>       return result;
>     }
>
> @bob-carpenter I'm confused about the comment above about value_of_rec.
>
> I can think of a situation where we would have:
>
>     Eigen::Matrix<fvar<var>, R, C> x
>
> and want Eigen::Matrix<double, R, C> y where y(i, j) = x(i, j).val_.val()

@randommm
Member Author

> I understand not doing the double indexing --- that's slow. But I think the bit after the declaration/construction of result, we should just do the following unless you can demonstrate that what you did with all the temporaries is faster.
>
>     for (int i = 0; i < M.size(); ++i)
>       result(i) = value_of(M(i));

The following code https://gist.github.com/randommm/4e61fabe8c5438daeebf

runs a Kahan sum using four different methods (warning: it requires more than 3 GB of memory to run):

(1) column-major double loop:

    for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
      for (size_t j = 0; j < nRows; ++j)
        // do something with matrix(j, i)

(2) row-major double loop:

    for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
      for (size_t j = 0; j < nCols; ++j)
        // do something with matrix(i, j)

(3) linear indexing:

    for (size_t i = 0, size = xs.size(); i < size; i++)
      // do something with matrix(i)

(4) raw data pointer:

    T * matrix_data = matrix.data();
    for (size_t i = 0, size = xs.size(); i < size; i++)
      // do something with matrix_data[i]
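For reference, a self-contained sketch of that comparison (this is not the gist itself; the matrix size, the Kahan-sum details, and the omission of method (2), which only swaps the loop order of (1), are choices made here):

    #include <ctime>
    #include <iostream>
    #include <Eigen/Dense>

    // (1) Kahan-compensated sum over mat(i, j), traversed column by column.
    double sum_colmajor(const Eigen::MatrixXd& xs) {
      double sum = 0, c = 0;
      for (int j = 0; j < xs.cols(); ++j)
        for (int i = 0; i < xs.rows(); ++i) {
          double y = xs(i, j) - c, t = sum + y;
          c = (t - sum) - y;
          sum = t;
        }
      return sum;
    }

    // (3) Same sum via single-index access mat(i).
    double sum_linear(const Eigen::MatrixXd& xs) {
      double sum = 0, c = 0;
      for (long i = 0; i < xs.size(); ++i) {
        double y = xs(i) - c, t = sum + y;
        c = (t - sum) - y;
        sum = t;
      }
      return sum;
    }

    // (4) Same sum via the raw storage pointer.
    double sum_data(const Eigen::MatrixXd& xs) {
      const double* d = xs.data();
      double sum = 0, c = 0;
      for (long i = 0, S = xs.size(); i < S; ++i) {
        double y = d[i] - c, t = sum + y;
        c = (t - sum) - y;
        sum = t;
      }
      return sum;
    }

    int main() {
      Eigen::MatrixXd test = Eigen::MatrixXd::Random(20000, 10000);  // about 1.6 GB
      std::time_t now;
      now = std::time(0);
      std::cout << sum_colmajor(test) << " " << std::time(0) - now << "s\n";
      now = std::time(0);
      std::cout << sum_linear(test) << " " << std::time(0) - now << "s\n";
      now = std::time(0);
      std::cout << sum_data(test) << " " << std::time(0) - now << "s\n";
    }

Printing each sum keeps the optimizer from discarding the work, which matters once -O3 is turned on (see below).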

Here's the output of how much time each of them took (methods (1)-(4) in order, in seconds):

    $ g++ main.cpp `pkg-config eigen3 --cflags --libs`
    $ ./a.out
    start
    25
    49
    25
    7

(The first build is without -DEIGEN_NO_DEBUG, so the mat(i, j) and mat(i) accesses still carry Eigen's range-check assertions.)

    $ g++ main.cpp -DEIGEN_NO_DEBUG `pkg-config eigen3 --cflags --libs`
    $ ./a.out
    start
    13
    35
    12
    7

@randommm
Member Author

Also, with -O3, this:

    for (size_t i = 0; i < xs.rows(); ++i)
      for (size_t j = 0; j < xs.cols(); ++j)

doesn't seem to be slower than this:

    for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
      for (size_t j = 0; j < nCols; ++j)

even when xs is not const.

(Sorry if it sounds rather obvious that the compiler's able to optimize this.)

@bob-carpenter
Contributor

Well, maybe write out the time taken in more precision or run more loops in the micro-benchmark. It's hard to see what's going on when they all print out zero!

Eigen (and Stan) leans very very heavily on having a good optimizing compiler to inline function calls statically.

Contributor

This is a bit off topic for this issue, but it's never obvious what a C++ compiler's going to do!

@randommm
Member Author

On 03/03/2016 20:35, Bob Carpenter wrote:

> Well, maybe write out the time taken in more precision

Do you know how to do this with std::cout and std::time?

    cout << time(0) - now << endl;

(I don't think it'll be necessary though; see below.)
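A sketch of one way to get sub-second timings with std::chrono (not what the gist uses):

    #include <chrono>
    #include <cmath>
    #include <iostream>

    int main() {
      auto start = std::chrono::steady_clock::now();
      double s = 0;
      for (int i = 1; i <= 10000000; ++i) s += std::sqrt(i);  // stand-in workload
      auto stop = std::chrono::steady_clock::now();
      std::cout << s << " computed in "
                << std::chrono::duration<double>(stop - start).count() << " s\n";
    }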

> or run more loops in the micro-benchmark.

If I try to double the number of elements in the matrix, I get std::bad_alloc.

But anyway, the 0's were coming up because the optimizer realized that I wasn't using the output of the function at all, so it wasn't even running it!

So I changed:

 now = time(0);
 sum_kahan1t(test);
 cout << time(0) - now << endl;

to:

 now = time(0);
 cout << sum_kahan1t(test) << endl;
 cout << time(0) - now << endl;

And after running this a lot of times, I don't think there's any
difference between them, and even if there is, it's probably negligible
(iff -DEIGEN_NO_DEBUG and -O3 are used together).

$ g++ main.cpp -DEIGEN_NO_DEBUG `pkg-config eigen3 --cflags --libs` -O3
$ ./a.out
start
7697.42
3
7697.42
9
7697.42
4
7697.42
3

$ ./a.out
start
7697.42
3
7697.42
9
7697.42
3
7697.42
4


Each method run 10 times and summed up:

 double te = 0;

 now = time(0);
 for (int i = 0; i < 10; i++)
   te += sum_kahan1t(test);
 cout << time(0) - now << endl;

 now = time(0);
 for (int i = 0; i < 10; i++)
   te += sum_kahan2t(test);
 cout << time(0) - now << endl;

 now = time(0);
 for (int i = 0; i < 10; i++)
   te += sum_kahan3t(test);
 cout << time(0) - now << endl;

 now = time(0);
 for (int i = 0; i < 10; i++)
   te += sum_kahan4t(test);
 cout << time(0) - now << endl;

 cout << te << endl;

$ g++ main.cpp -DEIGEN_NO_DEBUG `pkg-config eigen3 --cflags --libs` -O3
$ ./a.out
start
33
86
32
33
307897

Contributor

Thanks. That's what we wanted to see. The only thing I wouldn't have expected was the double-indexing to be just as fast as single indexing, but that must just be getting parallelized or at least not taking up measurable amounts of time compared to the floating point operations.

@rtrangucci
Contributor

@randommm based on all the results above, shall we close this PR?

@bob-carpenter
Contributor

We still want the direct return on the double instantiations.
Don't know why that wasn't there to begin with.

-- Bob
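The "direct return" presumably amounts to a specialization along these lines (a sketch of the assumed form, not necessarily the committed code):

    #include <Eigen/Dense>

    // When the scalar type is already double, hand the matrix straight back
    // instead of copying element by element. Returning a const reference
    // avoids the copy, but the caller must keep the argument alive.
    template <int R, int C>
    inline const Eigen::Matrix<double, R, C>&
    value_of_rec(const Eigen::Matrix<double, R, C>& M) {
      return M;
    }

    int main() {
      Eigen::MatrixXd m = Eigen::MatrixXd::Random(2, 2);
      const Eigen::MatrixXd& v = value_of_rec(m);  // no per-element loop
      (void) v;
    }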


@bob-carpenter
Contributor

This is all ready to go. @syclik, is this OK to merge? (I'm a bit shy to merge things given all the ramifications upstream on Stan.)

@syclik merged commit 2967b52 into develop on Mar 28, 2016
@syclik deleted the feature/issue-249-efficient-mat-value_of branch on Mar 28, 2016 at 15:57
@syclik modified the milestone: v2.11.0 on Jul 27, 2016