Vectorised log_mix with gradients #664
Conversation
Can one of the admins verify this patch?
Jenkins, retest this please.
This has been sitting here a while, so I figured I'd leave some comments. I'll leave the official review for one of the Columbia folks. Other than the inline comments, I think this function should have rev/fwd/mix tests as well.
This'll be cool to get in. Anything that makes mixture models easier to read is a plus :D.
We'll see if Sean's link on Discourse worked and I manage to communicate this stuff without sounding like a jerk :P.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
const size_t N = theta.size();

for (size_t n = 0; n < N; ++n) {
  check_not_nan(function, "lambda", lambda[n]);
I think all these checks have vectorized versions, so check_not_nan(function, "lambda", lambda) etc. should just work.
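As a rough illustration of why the vectorized form is preferable, here is a hypothetical stand-in (NOT Stan's actual check_not_nan, just a sketch with plain std::vector): the loop lives inside the check, so the error message can name the offending index.

```cpp
#include <cassert>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a vectorized NaN check; Stan's real
// check_not_nan has more machinery, but the idea is the same.
void check_not_nan_vec(const std::string& function, const std::string& name,
                       const std::vector<double>& x) {
  for (std::size_t n = 0; n < x.size(); ++n) {
    if (std::isnan(x[n])) {
      std::ostringstream msg;
      // Reporting the (1-based) failing index is the point of vectorizing.
      msg << function << ": " << name << "[" << (n + 1) << "] is NaN";
      throw std::domain_error(msg.str());
    }
  }
}
```

With the per-element loop in the caller, the check only ever sees a scalar and cannot report which element failed.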
Yes, you want to use vectorized versions so that the error messages can report indexes.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_dbl(N, 1);
scalar_seq_view<T_theta> theta_vec(theta);
for (size_t n = 0; n < N; ++n)
  theta_dbl[n] = value_of(theta_vec[n]);
Can you get rid of this extra indentation on the for loop and theta_dbl? And elsewhere in this pull?
Statements should left-align under previous statements unless nested in a block.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
lam_deriv[n] = theta_deriv[n] * theta_dbl[n];

operands_and_partials<T_theta, T_lam> ops_partials(theta, lambda);
if (!(is_constant_struct<T_theta>::value
Does the outer if need to be here? Won't the inner ifs do the job?
@bbbales2 is right that the outer if doesn't need to be written. I also think it's wrong as written, because you won't match both is_constant_struct<T_lam> and !is_constant_struct<T_lam>. So this needs a test that will catch this error.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
theta_dbl);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv = exp_lambda_dbl / dot_exp_lam_theta;
Can this not be computed like exp(lambda_dbl - log_sum_exp(logp_tmp)), to the same effect? That way, log_sum_exp handles the overflow trick, and it simplifies the code a bit.
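To sanity-check that suggestion in isolation, here is a standalone sketch (plain std::vector<double>, not the Stan types): the two ways of forming the theta derivatives are algebraically the same quantity, since log_sum_exp hides the same max-shift trick.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double log_sum_exp(const std::vector<double>& x) {
  double m = *std::max_element(x.begin(), x.end());
  double s = 0.0;
  for (double v : x) s += std::exp(v - m);  // max-shift avoids overflow
  return m + std::log(s);
}

// Suggested form: exp(lambda - log_sum_exp(log(theta) + lambda)).
std::vector<double> deriv_via_lse(const std::vector<double>& theta,
                                  const std::vector<double>& lambda) {
  std::vector<double> lp(theta.size());
  for (std::size_t n = 0; n < theta.size(); ++n)
    lp[n] = std::log(theta[n]) + lambda[n];
  double logp = log_sum_exp(lp);
  std::vector<double> d(theta.size());
  for (std::size_t n = 0; n < theta.size(); ++n)
    d[n] = std::exp(lambda[n] - logp);
  return d;
}

// Original form: exp(lambda - max) divided by its dot product with theta.
std::vector<double> deriv_via_ratio(const std::vector<double>& theta,
                                    const std::vector<double>& lambda) {
  double max_val = *std::max_element(lambda.begin(), lambda.end());
  std::vector<double> e(lambda.size());
  double dot = 0.0;
  for (std::size_t n = 0; n < lambda.size(); ++n) {
    e[n] = std::exp(lambda[n] - max_val);
    dot += e[n] * theta[n];
  }
  for (std::size_t n = 0; n < lambda.size(); ++n) e[n] /= dot;
  return e;
}
```

Both return exp(lambda_n - max) / sum_k theta_k exp(lambda_k - max); the log_sum_exp version just keeps the stabilization in one place.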
stan/math/prim/mat/fun/log_mix.hpp
Outdated
typedef typename stan::partials_return_type<T_theta, T_lam>::type
    T_partials_return;

const size_t N = theta.size();
I'd use https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/meta/length.hpp here. I think Eigen types use ints and std::vectors use size_ts, so this sort of thing sometimes gives compiler warnings.
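A minimal sketch of the motivation, with a hypothetical length() helper and a made-up FakeEigenVec stand-in (the real helper is the length.hpp linked above; real Eigen types report a signed size):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// std::vector reports size() as std::size_t, while Eigen types report a
// signed integer, so mixing them in loops and comparisons triggers
// sign-conversion warnings. A single length() helper normalizes the type.
template <typename T>
std::size_t length(const std::vector<T>& v) {
  return v.size();
}

// Hypothetical stand-in for an Eigen vector, whose size() is signed.
struct FakeEigenVec {
  int size() const { return 3; }
};

std::size_t length(const FakeEigenVec& v) {
  return static_cast<std::size_t>(v.size());  // normalize to size_t
}
```

Callers then compare length(a) with length(b) without any signed/unsigned mismatch.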
Thanks for that, Ben, much appreciated. I'm just in the middle of extending this to an array version (so that you can pass an array of N density vectors, rather than looping over the function N times), so I'll get your comments in after that.
The array version for this will take a bit more work, so I'll make a separate pull request when that's ready. The only thing I haven't been able to get working is the vectorised error checking. It seems that when the error checking functions call stan::get to return a value from the vector to be checked, that get is returning the full vector instead. The correct template is being called, so the function recognises that it's a vector input, but only the
@andrjohns Would you mind creating an issue for the vectorized error checking? I don't quite understand what you're saying is a problem or how you're trying to instantiate it. Or just reply here with a link to the file you're having trouble getting to compile.
Cool beans. So this function gives the log density for one mixture. Is the array version for doing a bunch of mixtures at once? With regards to the checks, is it that x can be a If that's the case, I think it should be fine to just use loops for the checks. Just expound more on the possible input types in the docs for the function.
Yep, it's to add a lot of flexibility to how the mixture is specified. The end goal is to allow all combinations of
Should let me do a lot less looping in Stan!
That was just for the regular NaN/bounded/finite checks. But I found where my mistake was, so the checks are working fine without loops now.
There's a third argument for the mixture simplex (or simplexes if you really go crazy).
For the mixture proportions? Having the function take a vector of densities was the easiest way of allowing an arbitrary number, but I could possibly recreate the structure of the scalar log_mix with a variadic template:
Speaking of simplexes, when calculating the derivatives, should I be assuming that the input simplex was constructed via stick-breaking, and not calculating the partial derivative for the Kth value? I was originally allowing for the fact that some users might be passing the softmax of a vector as the probability argument, but I hadn't considered whether this would affect use with a default (stick-breaking) simplex.
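For reference, writing f for the log mixture density, the partials under discussion (treating every component of theta as free, which matches the softmax'ed-vector case) are:

```latex
f(\theta,\lambda) = \log \sum_{k=1}^{K} \theta_k e^{\lambda_k},
\qquad
\frac{\partial f}{\partial \theta_k}
  = \frac{e^{\lambda_k}}{\sum_{j=1}^{K} \theta_j e^{\lambda_j}}
  = e^{\lambda_k - f},
\qquad
\frac{\partial f}{\partial \lambda_k}
  = \theta_k\, e^{\lambda_k - f}.
```

A plausible reading (not confirmed in this thread) is that the function can supply these unconstrained partials with respect to its direct arguments and let autodiff's chain rule compose them with the Jacobian of whatever simplex transform produced theta.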
OK. I think it helps to have arguments. I thought you were doing a different kind of vectorization. I'd write it as

real log_mix(vector lambda, vector log_prob);

with implementation equivalent to:

vector[size(lambda)] lp;
for (k in 1:size(lambda))
  lp[k] = log(lambda[k]) + log_prob[k];
return log_sum_exp(lp);

it's not vector[] and vector[] --- the [] aren't used for vectors in function args as no sizes are given. There's a description in the manual.

- Bob
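Bob's pseudocode above translates almost line for line; as a self-contained sketch (std::vector<double> standing in for Stan's vector type, not the actual implementation), the log_sum_exp formulation agrees with a naive log of the weighted sum:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double log_sum_exp(const std::vector<double>& x) {
  double m = *std::max_element(x.begin(), x.end());
  double s = 0.0;
  for (double v : x) s += std::exp(v - m);  // shift by max for stability
  return m + std::log(s);
}

// Sketch of real log_mix(vector lambda, vector log_prob):
// lp[k] = log(lambda[k]) + log_prob[k]; return log_sum_exp(lp).
double log_mix(const std::vector<double>& lambda,
               const std::vector<double>& log_prob) {
  std::vector<double> lp(lambda.size());
  for (std::size_t k = 0; k < lambda.size(); ++k)
    lp[k] = std::log(lambda[k]) + log_prob[k];
  return log_sum_exp(lp);
}
```

The log_sum_exp route matters once the log_prob values are large and negative, where exponentiating directly would underflow.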
This looks good to me! Someone else will have to do the final review and handle the actual accept (I don't have repo permissions). @syclik maybe?
The only thing I see that's out of place is that the higher-order autodiff tests are all in terms of Eigen types (not std::vectors). Not sure it matters. std::vectors are tested in prim.
One question, this is for the signature
log_mix(vector theta, vector lambda)
Right?
The fancier signatures you mentioned:
log_mix(vectors theta, vector lambda)
log_mix(vector theta, vectors lambda)
log_mix(vectors theta, vectors lambda)
are gonna wait?
stan/math/prim/mat/fun/log_mix.hpp
Outdated
T_partials_return logp = log_sum_exp(logp_tmp);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv.array() = (lam_dbl.array() - logp).exp();
Fix indent
stan/math/prim/mat/fun/log_mix.hpp
Outdated
theta_deriv.array() = (lam_dbl.array() - logp).exp();

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> lam_deriv(N, 1);
lam_deriv.array() = theta_deriv.array() * theta_dbl.array();
Fix indent
TEST(MatrixFunctions, LogMix_Values) {
  stan::math::vector_d prob(5, 1);
  prob << 0.1, 0.3, 0.25, 0.15, 0.2;
Fix indents in this file
using stan::math::vector_v;

vector_v prob(4), dens(4);
prob << 0.13, 0.22, 0.38, 0.27;
Indents
using stan::math::row_vector_v;

row_vector_v prob(4), dens(4);
prob << 0.03, 0.21, 0.63, 0.13;
Indents
Thanks, Ben, good catch! I've added in the tests for std::vectors as well.
Yep, this is just for the log_mix(vector theta, vector lambda) signature.
Could someone start the testing for this on Jenkins? Thanks!
Jenkins, OK to test.
I'll review this one.
Hope these changes are minor.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
*/
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> exp_lambda_dbl(N, 1);
double max_val = max(lambda_dbl);
for (size_t n = 0; n < N; ++n)
Align each statement directly under the last except in nested blocks.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_dbl(N, 1);
scalar_seq_view<T_theta> theta_vec(theta);
for (size_t n = 0; n < N; ++n)
You should create theta_dbl to be the same size as theta to cut down on memory allocation and assignments. The scalar sequence view will automatically broadcast when you use it.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
for (size_t n = 0; n < N; ++n)
  exp_lambda_dbl[n] = exp(lambda_dbl[n] - max_val);

T_partials_return dot_exp_lam_theta = dot_product(exp_lambda_dbl,
This doesn't need to be a dot product unless both are vectors; otherwise, it's more efficient as a scalar-vector operation. I'm not sure how to deal with this efficiently.
You should also be using auto as the type on the left here. That'll avoid over-promotion, and it's fair game now that we've opened up C++11 constructs.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
lam_dbl[n] = value_of(lam_vec[n]);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> logp_tmp(N, 1);
logp_tmp = log(theta_dbl) + lam_dbl;
I think this one can be a declare-define:
Eigen::Matrix<T_partials_return, -1, 1> logp_tmp = log(theta_dbl) + lam_dbl;
You might want to typedef the type here, as this is the second usage.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> logp_tmp(N, 1);
logp_tmp = log(theta_dbl) + lam_dbl;

T_partials_return logp = log_sum_exp(logp_tmp);
I don't see another usage of logp_tmp, so I think this can all be replaced with a one-liner:
T_partials_return logp = log_sum_exp(log(theta_dbl) + lam_dbl);
The suffix _tmp should be avoided; here, the variable could just be called logp. All variables are "temp" in some sense, and shorter is better, all else being equal.
I had to split this into two steps because the return type of log(theta_dbl) + lam_dbl is a CwiseBinaryOp, so log_sum_exp can't deal with it until Eigen assigns the result to a vector. I'm not sure if there's a way of using the matrix/array wrappers to avoid this, but I haven't had any luck figuring it out.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv.array() = (lam_dbl.array() - logp).exp();

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> lam_deriv(N, 1);
You don't need to convert to arrays for this. I think this can just be
Eigen::Matrix<T_partials_return, -1, 1> lam_deriv = theta_deriv + theta_dbl;
I used the array wrappers here because I needed an elementwise multiplication, not an addition. But you were right in the end: I didn't need to convert to arrays, because Eigen has the .cwiseProduct() operator.
stan/math/prim/mat/fun/log_mix.hpp
Outdated
operands_and_partials<T_theta, T_lam> ops_partials(theta, lambda);
if (!is_constant_struct<T_theta>::value) {
  for (int n = 0; n < N; ++n)
Can't this just be
ops_partials.edge1_ = theta_deriv;
if (!is_constant_struct<T_lam>::value) {
  for (int n = 0; n < N; ++n)
    ops_partials.edge2_.partials_[n] = lam_deriv[n];
same comment
@andrjohns Even though I approved this, I had a bunch of suggestions on how to make it better. Would you mind if I made the changes then pushed?
That sounds great to me (if you don't mind). I'm currently abroad with very sporadic internet access, so I won't be able to submit any changes until mid-Jan.
@bob-carpenter, want to make the fixes you suggested so we can get this merged?
Yes, I can make the fixes.
Sorry about the delay with this one, I'm catching up on my backlog now. I can make these changes and push them tonight (which would be tomorrow morning, Columbia time).
That'd be fantastic. I was going to do it, but obviously never got around to it. We'll happily take contributions on whatever timeline they come in.
OK, if there's no way to generalize that, we'll have to wait until the underlying operands_and_partials code gets generalized. Let me know if that's everything and I can merge.
Yep, that's the lot; all ready to merge.
Sorry for all the false alarms and thanks for clarifying. I'll merge when the tests pass. Thanks.
Merging! Re: the operands_and_partials generalization: if this was waiting for multivariate / containers-of-containers support, Sebastian got that in a couple of weeks ago, I think.
It's not necessary, just an improvement if some of the operations could be done with vectors rather than loops.
Submission Checklist
./runTests.py test/unit
make cpplint
Summary:
Introduced a prim/mat version of log_mix with analytic gradients that takes a vector of mixture probabilities and a vector of densities (of arbitrary, but equal, length). So the signature would be log_mix(vector theta, vector lambda).
Intended Effect:
Make specifying mixtures much simpler and more efficient.
This has been on my own wishlist for a while, so I figured I might as well try and tackle it myself!
How to Verify:
Unit test is included that tests equality with the equivalent log_sum_exp() parameterisation.
Side Effects:
Documentation:
Inline
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Andrew Johnson
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: