Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimization #2551

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 36 additions & 5 deletions stan/math/opencl/kernel_generator/multi_result_kernel.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -321,6 +321,16 @@ class results_cl {
std::get<Is>(results_), std::get<Is>(exprs.expressions_))...));
});
}
template <typename... T_expressions,
typename = std::enable_if_t<sizeof...(T_results)
== sizeof...(T_expressions)>>
void operator=(expressions_cl<T_expressions...>&& exprs) {
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable {
assignment_impl(std::tuple_cat(make_assignment_pair(
std::get<Is>(results_),
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...));
});
Comment on lines +328 to +332
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input is && can't you always move instead of forward? And I think this can be made a lot simpler with for_each()

Suggested change
index_apply<sizeof...(T_expressions)>([this, &exprs](auto... Is) mutable {
assignment_impl(std::tuple_cat(make_assignment_pair(
std::get<Is>(results_),
std::forward<T_expressions>(std::get<Is>(exprs.expressions_)))...));
});
for_each([](auto& result, auto&& expr) mutable {
assignment_impl(std::tuple_cat(make_assignment_pair(result, std::move(expr.expressions_))));
}, this->results_, std::move(exprs));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can make assignment pair just return back an std::pair?

Copy link
Contributor Author

@t4c1 t4c1 Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some make_assignment_pair overloads need to return nothing, which can not be achieved with std::pair. Right now they return an empty tuple.

Copy link
Contributor Author

@t4c1 t4c1 Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the input is && can't you always move instead of forward?

No. The tuple itself might be a rvalue reference, but the elements in it can still be lvalue references, which must not be moved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to implement this with for_each(), but I can't expand the T_expressions parameter pack, so I can not call the std::forward. This will need to stay as it is (also in this case for_each() would not really simplify the code).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But for that pair, since the rhs is an rvalue doesn't it use the make_assignment_pair() that doesn't call a kernel that you wrote in this PR here? If that test should throw we should check that the result and matrix have the same size before doing the move

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right about this not being implemented within a kernel. The test, however operates on lvalues, so the new make_assignment_pair() from this PR is not used and kernel would be used if both operations had the same size. As they do not, it correctly throws.

I agree it not exactly nice to have different error handling behavior between l- and rvalues. However, changing the new (rvalue) case to also throw would be a bit hard, as this check would have to be made before the make_assignment_pair() call. That would be a bit more work than I am willing to go into for this PR. Anyway, as it is now in this PR, it works correctly, even if a bit unintuitively.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test, however operates on lvalues, so the new make_assignment_pair() from this PR is not used and kernel would be used if both operations had the same size. As they do not, it correctly throws.

m1_cl - m2_cl in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<> that is uninitialized.

  EXPECT_THROW(stan::math::results(res1_cl, res2_cl)
               = stan::math::expressions(m1_cl - m2_cl, m3_cl - m4_cl),
               std::invalid_argument);

Do we want to always be throwing in the case of incorrect size assignment here? In that case I think it would just be changing the matrix_cl rvalue assign operator to check sizes before doing anything. But imo it feels like it makes sense to allow this since the rhs is replacing the buffer of the matrix_cl.

The other test where we have an expression of a certain size on the lhs still throws

  // mismatch in size between an expression and a result that can not be resized
  auto block1 = stan::math::block_zero_based(m1_cl, 0, 0, 3, 3);
  EXPECT_THROW(stan::math::results(res1_cl, block1)
               = stan::math::expressions(m3_cl + m4_cl, m3_cl - m4_cl),
               std::invalid_argument);

I agree it not exactly nice to have different error handling behavior between l- and rvalues. However, changing the new (rvalue) case to also throw would be a bit hard, as this check would have to be made before the make_assignment_pair() call.

Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?

What's the intuition behind not wanting to allow resizing in the second test case? tmk Eigen only stops resizing in the case of an Eigen Map or fixed size matrices of mismatched sizes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m1_cl - m2_cl in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<> that is uninitialized.

Oh, right, the reason for not using the new code in that test is the fact that the right hand side expression is not a plain matrix. Reminder of my previous post is still true.

Do we want to always be throwing in the case of incorrect size assignment here?

That is debatable. If we have two assignments of different sizes we can not do them in the same kernel so we throw. If one of these can be handled by matrix_cl move assignment (the new code in this PR) than we don't need to. Although we still could to be consistent with throwing when right hand side is a rvalue and we can not use move assignment.

Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?

We already do that and that is not the issue here. The issue is if we want to assign two expressions in the same call and they have different size (the sizes of results match expressions or can be resized so they do).

What's the intuition behind not wanting to allow resizing in the second test case?

It is not about resizing. See my previous answer.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m1_cl - m2_cl in the test we are talking about is an rvalue though and we are assigning it to a matrix_cl<> that is uninitialized.

Oh, right, the reason for not using the new code in that test is the fact that the right hand side expression is not a plain matrix. Reminder of my previous post is still true.

Sorry which post? And if what I wrote should not be using the new make_assignment_pair() can you add a require to guard against that? Though tbh I'm still confused why it can't use the new one with the move assignment since we are assigning an rvalue expression to the res1/2_cl

Is there a reason we can't just check the rows and columns of the assigner and assignee and if they mismatch throw an error?

We already do that and that is not the issue here. The issue is if we want to assign two expressions in the same call and they have different size (the sizes of results match expressions or can be resized so they do).

It sounds like we should have another test that is something like

  auto m1_m2_tmp = m1_cl - m2_cl;
  auto m3_m4_tmp = m1_cl - m2_cl;
  EXPECT_THROW(stan::math::results(res1_cl, res2_cl)
               = stan::math::expressions(m1_m2_tmp, m2_m4_tmp),
               std::invalid_argument);

}

/**
* Incrementing \c results_ object by \c expressions_cl object
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing docs for the templates. Those are pretty complicated requires so please write out what's going on there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template is doced on line 339. The require is also relatively simple - just the number of results and expressions must match.

Expand Down Expand Up @@ -531,11 +541,16 @@ class results_cl {
* @param expression expression
* @return a tuple of pair of result and expression
*/
template <typename T_result, typename T_expression,
require_all_not_t<is_without_output<T_expression>,
conjunction<internal::is_scalar_check<T_result>,
std::is_arithmetic<std::decay_t<
T_expression>>>>* = nullptr>
template <
typename T_result, typename T_expression,
require_all_not_t<
is_without_output<T_expression>,
conjunction<internal::is_scalar_check<T_result>,
std::is_arithmetic<std::decay_t<T_expression>>>,
conjunction<
is_matrix_cl<T_result>, is_matrix_cl<T_expression>,
std::is_same<value_type_t<T_expression>, value_type_t<T_result>>,
std::is_rvalue_reference<T_expression&&>>>* = nullptr>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is strictly for rvalue references why are we forwarding in the below? And why does this one need to be an rvalue reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two things that can be r- or l-value references. One is tuple (which in this case must be a rvalue reference and than there are tuple elements that can be either.

static auto make_assignment_pair(T_result&& result,
T_expression&& expression) {
return std::make_tuple(
Expand Down Expand Up @@ -576,6 +591,22 @@ class results_cl {
}
return std::make_tuple();
}

/**
* Optimized move assignment of a `matrix_cl` into another `matrix_cl`.
* @param result result - check
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does result - check mean?

* @param pass bool scalar
* @return an empty tuple
*/
template <typename T_result, typename T_matrix,
require_all_matrix_cl_t<T_result, T_matrix>* = nullptr,
require_all_st_same<T_result, T_matrix>* = nullptr,
require_t<std::is_rvalue_reference<T_matrix&&>>* = nullptr>
static std::tuple<> make_assignment_pair(T_result&& result,
T_matrix&& matrix) {
result = std::move(matrix);
return std::make_tuple();
}
SteveBronder marked this conversation as resolved.
Show resolved Hide resolved
};

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -131,4 +131,21 @@ TEST(KernelGenerator, multi_result_kernel_reuse_kernel) {
EXPECT_MATRIX_NEAR(res_diff, correct_diff, 1e-9);
}

TEST(KernelGenerator, multi_result_kernel_matrix_cl_move) {
MatrixXd m1(3, 3);
m1 << 1, 2, 3, 4, 5, 6, 7, 8, 9;
MatrixXd m2(3, 3);
m2 << 10, 100, 1000, 0, -10, -12, 2, 4, 8;

matrix_cl<double> m1_cl(m1);
matrix_cl<double> m2_cl(m2);

cl_mem m2_buf = m2_cl.buffer()();

stan::math::results(m1_cl) = stan::math::expressions(m2_cl);
EXPECT_NE(m1_cl.buffer()(), m2_buf);
stan::math::results(m1_cl) = stan::math::expressions(std::move(m2_cl));
EXPECT_EQ(m1_cl.buffer()(), m2_buf);
}

#endif