-
-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimization #2551
optimization #2551
Conversation
…4.1 (tags/RELEASE_600/final)
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable { | ||
assignment_impl(std::tuple_cat(make_assignment_pair( | ||
std::get<Is>(results_), | ||
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...)); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input is && can't you always move instead of forward? And I think this can be made a lot simpler with for_each()
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable { | |
assignment_impl(std::tuple_cat(make_assignment_pair( | |
std::get<Is>(results_), | |
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...)); | |
}); | |
for_each([](auto& result, auto&& expr) mutable { | |
assignment_impl(std::tuple_cat(make_assignment_pair(result, std::move(expr.expressions_)))); | |
}, this->results_, std::move(exprs)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also can make assignment pair just return back an std::pair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some make_assignment_pair overloads need to return nothing, which can not be achieved with std::pair. Right now they return an empty tuple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the input is && can't you always move instead of forward?
No. The tuple itself might be a rvalue reference, but the elements in it can still be lvalue references, which must not be moved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to implement this with for_each()
, but I can't expand the T_expressions
parameter pack, so I can not call the std::forward
. This will need to stay as it is (also in this case for_each()
would not really simplify the code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But for that pair, since the rhs is an rvalue doesn't it use the make_assignment_pair()
that doesn't call a kernel that you wrote in this PR here? If that test should throw we should check that the result and matrix have the same size before doing the move
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right about this not being implemented within a kernel. The test, however operates on lvalues, so the new make_assignment_pair()
from this PR is not used and kernel would be used if both operations had the same size. As they do not, it correctly throws.
I agree it not exactly nice to have different error handling behavior between l- and rvalues. However, changing the new (rvalue) case to also throw would be a bit hard, as this check would have to be made before the make_assignment_pair()
call. That would be a bit more work than I am willing to go into for this PR. Anyway, as it is now in this PR, it works correctly, even if a bit unintuitively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test, however operates on lvalues, so the new make_assignment_pair() from this PR is not used and kernel would be used if both operations had the same size. As they do not, it correctly throws.
m1_cl - m2_cl
in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<>
that is uninitialized.
EXPECT_THROW(stan::math::results(res1_cl, res2_cl)
= stan::math::expressions(m1_cl - m2_cl, m3_cl - m4_cl),
std::invalid_argument);
Do we want to always be throwing in the case of incorrect size assignment here? In that case I think it would just be changing the matrix_cl
rvalue assign operator to check sizes before doing anything. But imo it feels like it makes sense to allow this since the rhs is replacing the buffer of the matrix_cl.
The other test where we have an expression of a certain size on the lhs still throws
// mismatch in size between an expression and a result that can not be resized
auto block1 = stan::math::block_zero_based(m1_cl, 0, 0, 3, 3);
EXPECT_THROW(stan::math::results(res1_cl, block1)
= stan::math::expressions(m3_cl + m4_cl, m3_cl - m4_cl),
std::invalid_argument);
I agree it not exactly nice to have different error handling behavior between l- and rvalues. However, changing the new (rvalue) case to also throw would be a bit hard, as this check would have to be made before the make_assignment_pair() call.
Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?
What's the intuition behind not wanting to allow resizing in the second test case? tmk Eigen only stops resizing in the case of an Eigen Map or fixed size matrices of mismatched sizes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m1_cl - m2_cl in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<> that is uninitialized.
Oh, right, the reason for not using the new code in that test is the fact that the right hand side expression is not a plain matrix. Reminder of my previous post is still true.
Do we want to always be throwing in the case of incorrect size assignment here?
That is debatable. If we have two assignments of different sizes we can not do them in the same kernel so we throw. If one of these can be handled by matrix_cl move assignment (the new code in this PR) than we don't need to. Although we still could to be consistent with throwing when right hand side is a rvalue and we can not use move assignment.
Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?
We already do that and that is not the issue here. The issue is if we want to assign two expressions in the same call and they have different size (the sizes of results match expressions or can be resized so they do).
What's the intuition behind not wanting to allow resizing in the second test case?
It is not about resizing. See my previous answer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m1_cl - m2_cl in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<> that is uninitialized.
Oh, right, the reason for not using the new code in that test is the fact that the right hand side expression is not a plain matrix. Reminder of my previous post is still true.
Sorry which post? And if what I wrote should not be using the new make_assignment_pair()
can you add a require to guard against that? Though tbh I'm still confused why it can't use the new one with the move assignment since we are assigning an rvalue expression to the res1/2_cl
Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?
We already do that and that is not the issue here. The issue is if we want to assign two expressions in the same call and they have different size (the sizes of results match expressions or can be resized so they do).
It sounds like we should have another test that is something like
auto m1_m2_tmp = m1_cl - m2_cl;
auto m3_m4_tmp = m1_cl - m2_cl;
EXPECT_THROW(stan::math::results(res1_cl, res2_cl)
= stan::math::expressions(m1_m2_tmp, m2_m4_tmp),
std::invalid_argument);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly doc questions and the for_each thing
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable { | ||
assignment_impl(std::tuple_cat(make_assignment_pair( | ||
std::get<Is>(results_), | ||
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...)); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I got this to work with the below, I think it's doing the same thing? Are the T_expressions
in expressions_cl
the same value type in the underlying tuple?
stan::math::for_each([](auto&& result, auto&& expr) mutable {
using expr_t = decltype(expr);
assignment_impl(std::tuple_cat(make_assignment_pair(
result,
std::forward<expr_t>(expr))));
}, results_, std::forward<decltype(exprs.expressions_)>(exprs.expressions_));
This does throw an error for one of the tests in multi_result_kernel
matrix_cl<double> m1_cl(3, 3);
matrix_cl<double> m2_cl(3, 3);
matrix_cl<double> m3_cl(4, 3);
matrix_cl<double> m4_cl(4, 3);
matrix_cl<double> res1_cl;
matrix_cl<double> res2_cl;
// mismatch in size between different (expression, result) pairs
/**
* This currently throws but with my above it does not
*/
EXPECT_THROW(stan::math::results(res1_cl, res2_cl)
= stan::math::expressions(m1_cl - m2_cl, m3_cl - m4_cl),
std::invalid_argument);
Should that test throw though? The result matrix is not filled so we move the resulting rhs into res*_cl
so I think that's fine?
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable { | ||
assignment_impl(std::tuple_cat(make_assignment_pair( | ||
std::get<Is>(results_), | ||
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...)); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If your good with the above I can open up a PR to change the other ones that use index_apply.
std::get<Is>(results_), | ||
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...)); | ||
}); | ||
} | ||
|
||
/** | ||
* Incrementing \c results_ object by \c expressions_cl object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing docs for the templates. Those are pretty complicated requires so please write out what's going on there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Template is doced on line 339. The require is also relatively simple - just the number of results and expressions must match.
conjunction< | ||
is_matrix_cl<T_result>, is_matrix_cl<T_expression>, | ||
std::is_same<value_type_t<T_expression>, value_type_t<T_result>>, | ||
std::is_rvalue_reference<T_expression&&>>>* = nullptr> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is strictly for rvalue references why are we forwarding in the below? And why does this one need to be an rvalue reference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two things that can be r- or l-value references. One is tuple (which in this case must be a rvalue reference and than there are tuple elements that can be either.
|
||
/** | ||
* Optimized move assignment of a `matrix_cl` into another `matrix_cl`. | ||
* @param result result - check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does result - check
mean?
As much as I'd prefer not to lose good work, there were outstanding comments that weren't addressed, so I'm closing this. |
Summary
Optimize kernel generator so it can use
matrix_cl
's move assignment where possible instead of copying the data.Tests
Added a test to check that the new optimization is used.
Side Effects
None.
Release notes
OpenCL: Optimized kernel generator so it can use
matrix_cl
's move assignment where possible instead of copying the data.Checklist
Math issue #(issue number)
Copyright holder: Tadej Ciglarič
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested