-
-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient value_of and value_of_rec for matrices #250
Conversation
for (int i=0; i < S; i++) | ||
data_r[i] = value_of_rec(data_m[i]); | ||
return result; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand not doing the double indexing --- that's slow. But I think the bit after the declaration/construction of result, we should just do the following unless you can demonstrate that what you did with all the temporaries is faster.
for (int i = 0; i < M.size(); ++i)
result(i) = value_of(M(i));
We know the value can't be recursive through Eigen in any of our code, so we don't need value_of_rec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bob-carpenter I'm confused about the comment above about value_of_rec.
I can think of a situation where we would have:
Eigen::Matrix<fvar<var>, R, C> x
and want Eigen::Matrix<double, R, C> y
where y(i, j) = x(i, j).val_.val()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use it like that here:
https://github.com/stan-dev/math/blob/develop/stan/math/prim/mat/err/check_pos_semidefinite.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see --- the recursion isn't through container structures
(just std::vector), but also down through mixed types. Thanks
for pointing this out.
On Mar 3, 2016, at 3:43 PM, Rob Trangucci notifications@github.com wrote:
In stan/math/prim/mat/fun/value_of_rec.hpp:
Eigen::Matrix<double, R, C> Md(M.rows(), M.cols());
for (int j = 0; j < M.cols(); j++)
for (int i = 0; i < M.rows(); i++)
Md(i, j) = value_of_rec(M(i, j));
return Md;
Eigen::Matrix<double, R, C>
result(M.rows(), M.cols());
double \* data_r = result.data();
const T \* data_m = M.data();
int S = M.size();
for (int i=0; i < S; i++)
data_r[i] = value_of_rec(data_m[i]);
return result;
- }
@bob-carpenter I'm confused about the comment above about value_of_rec.
I can think of a situation where we would have:
Eigen::Matrix<fvar, R, C> x
and want y where y(i, j) = x(i, j).val_.val()
—
Reply to this email directly or view it on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand not doing the double indexing --- that's slow. But I
think the bit after the declaration/construction of result, we should
just do the following unless you can demonstrate that what you did
with all the temporaries is faster.|for (int i = 0; i < M.size(); ++i) result(i) = value_of(M(i)); |
The following code https://gist.github.com/randommm/4e61fabe8c5438daeebf
runs a kahan sum usingfour different methods:
(warning: requires more than 3GB to run)
(1)
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nCols; ++i)
for (size_t j = 0; j < nRows; ++j)
//do something with matrix(i, j)
(2)
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i < nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
//do something with matrix(i, j)
(3)
for (size_t i = 0, size = xs.size(); i < size; i++)
//do something with matrix(i)
(4)
T * matrix_data = matrix.data();
for (size_t i = 0, size = xs.size(); i < size; i++)
//do something with matrix_data[i]
Here's the output of how much time each of them took:
$ g++ main.cpp pkg-config eigen3 --cflags --libs
$ ./a.out
start
25
49
25
7
The first run mat(i, j)
$ g++ main.cpp -DEIGEN_NO_DEBUG pkg-config eigen3 --cflags --libs
$ ./a.out
start
13
35
12
7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, with -O3, this
for (size_t i = 0; i < xs.rows(); ++i)
for (size_t j = 0; j < xs.cols(); ++j)
doesn't seem to be slower than this:
for (size_t i = 0, nRows = xs.rows(), nCols = xs.cols(); i <
nRows; ++i)
for (size_t j = 0; j < nCols; ++j)
even when xs is not const.
(Sorry if it sounds rather obvious that the compiler's able to optimize this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, maybe write out the time taken in more precision or run more loops in the micro-benchmark. It's hard to see what's going on when they all print out zero!
Eigen (and Stan) leans very very heavily on having a good optimizing compiler to inline function calls statically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit off topic for this issue, but it's never obvious what a C++ compiler's going to do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On 03/03/2016 20:35, Bob Carpenter wrote:
In stan/math/prim/mat/fun/value_of_rec.hpp
#250 (comment):
Eigen::Matrix<double, R, C> Md(M.rows(), M.cols());
for (int j = 0; j < M.cols(); j++)
for (int i = 0; i < M.rows(); i++)
Md(i, j) = value_of_rec(M(i, j));
return Md;
Eigen::Matrix<double, R, C>
result(M.rows(), M.cols());
double \* data_r = result.data();
const T \* data_m = M.data();
int S = M.size();
for (int i=0; i < S; i++)
data_r[i] = value_of_rec(data_m[i]);
return result;
- }
Well, maybe write out the time taken in more precision
Do you know how to do this with std::cout and std::time??
cout << time(0) - now << endl;
(I don't think it'll be necessary thought, see below)
or run more loops in the micro-benchmark.
If I try to double the number of elements in the matrix, I get
std::bad_alloc.
But anyway, the 0's were coming up because the compiler optim realized
that I wasn't using the output of the function at all, so it wasn't even
running it!
So I changed;
now = time(0);
sum_kahan1t(test);
cout << time(0) - now << endl;
to:
now = time(0);
cout << sum_kahan1t(test) << endl;
cout << time(0) - now << endl;
And after running this a lot of times, I don't think there's any
difference between them, and even if there is, it's probably negligible
(iff -DEIGEN_NO_DEBUG and -O3 are used together).
$ g++ main.cpp -DEIGEN_NO_DEBUG pkg-config eigen3 --cflags --libs
-O3
$ ./a.out
start
7697.42
3
7697.42
9
7697.42
4
7697.42
3
$ ./a.out
start
7697.42
3
7697.42
9
7697.42
3
7697.42
4
10 times run summed up:
double te = 0;
now = time(0);
for (int i = 0; i < 10; i++)
te += sum_kahan1t(test);
cout << time(0) - now << endl;
now = time(0);
for (int i = 0; i < 10; i++)
te += sum_kahan2t(test);
cout << time(0) - now << endl;
now = time(0);
for (int i = 0; i < 10; i++)
te += sum_kahan3t(test);
cout << time(0) - now << endl;
now = time(0);
for (int i = 0; i < 10; i++)
te += sum_kahan4t(test);
cout << time(0) - now << endl;
cout << te << endl;
$ g++ main.cpp -DEIGEN_NO_DEBUG pkg-config eigen3 --cflags --libs
-O3
$ ./a.out
start
33
86
32
33
307897
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. That's what we wanted to see. The only thing I wouldn't have expected was the double-indexing to be just as fast as single indexing, but that must just be getting parallelized or at least not taking up measurable amounts of time compared to the floating point operations.
@randommm based on all the results above, shall we close this PR? |
We still want the direct return on the double instantiations.
|
This is all ready to go. @sycklik, is this OK to merge (I'm a bit shy to merge things given all the ramifications upstream on Stan). |
Summary:
Fix issue #249
Intended Effect:
Write value_of and value_of_rec for matrix in a more efficient way.
value_of
andvalue_of_rec
for matrix are written in a inefficient way that callsmat(i, j)
,mat.rows()
andmat.cols()
on each iteration of the loop.Sizes should be stored in a temporary
int
and the matrix data pointer in a temporaryT*
.How to Verify:
Code review.
Side Effects:
None.
Documentation:
Not applicable.
Reviewer Suggestions:
@rtrangucci