
Vectorised log_mix with gradients #664

Merged (23 commits) on Jan 31, 2018

Conversation

andrjohns (Collaborator)

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

Introduced a prim/mat version of log_mix with analytic gradients that takes a vector of mixture probabilities and a vector of densities (of arbitrary, but equal, length).

So the signature would be:

real = log_mix(vector, vector);
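For context, here is a sketch of the quantity being computed and its analytic gradients, written in the same informal notation used elsewhere in this thread (inferred from the derivative expressions quoted in the review below):

log_mix(theta, lambda) = log( sum_k theta[k] * exp(lambda[k]) )
                       = log_sum_exp( log(theta) + lambda )

d/d theta[k]  : exp(lambda[k] - log_mix(theta, lambda))
d/d lambda[k] : theta[k] * exp(lambda[k] - log_mix(theta, lambda))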

Intended Effect:

Make specifying mixtures much simpler and more efficient.

This has been on my own wishlist for a while, figured I might as well try and tackle it myself!

How to Verify:

A unit test is included that checks equality against the equivalent log_sum_exp() parameterisation.
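As an illustration only (a minimal sketch, not the actual test file; the test name and values here are made up), the kind of equality being checked looks like this:

#include <stan/math/prim/mat.hpp>
#include <gtest/gtest.h>

TEST(MathMatrix, log_mix_matches_log_sum_exp_sketch) {
  Eigen::VectorXd theta(3), lambda(3);
  theta << 0.2, 0.5, 0.3;      // mixing proportions (a simplex)
  lambda << -1.3, -2.1, -0.4;  // component log densities

  // log_mix(theta, lambda) should equal log_sum_exp(log(theta) + lambda)
  Eigen::VectorXd lp = (theta.array().log() + lambda.array()).matrix();
  EXPECT_FLOAT_EQ(stan::math::log_sum_exp(lp),
                  stan::math::log_mix(theta, lambda));
}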

Side Effects:

Documentation:

Inline

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): Andrew Johnson

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot (Contributor)

Can one of the admins verify this patch?

@seantalts (Member)

Jenkins, retest this please.

@bbbales2 (Member) left a comment

This has been sitting here a while, so I figured I'd leave some comments. I'll leave the Official Review for one of the Columbia folks. Other than the inline comments, I think this function should have rev/fwd/mix tests as well.

This'll be cool to get in. Anything that makes mixture models easier to read is a plus :D.

We'll see if Sean's link on Discourse worked and I manage to communicate this stuff without sounding like a jerk :P.

const size_t N = theta.size();

for (size_t n = 0; n < N; ++n) {
check_not_nan(function, "lambda", lambda[n]);
Member:

I think all these checks have vectorized versions, so check_not_nan(function, "lambda", lambda) etc. should just work.

Contributor:

Yes, you want to use vectorized versions so that the error messages can report indexes.

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_dbl(N, 1);
scalar_seq_view<T_theta> theta_vec(theta);
for (size_t n = 0; n < N; ++n)
theta_dbl[n] = value_of(theta_vec[n]);
Member:

Can you get rid of this extra indentation on the for loop and theta_dbl? And elsewhere in this pull?

Contributor:

Statements should left-align under previous statements unless nested in a block.

lam_deriv[n] = theta_deriv[n] * theta_dbl[n];

operands_and_partials<T_theta, T_lam> ops_partials(theta, lambda);
if (!(is_constant_struct<T_theta>::value
Member:

Does the outer if need to be here? Won't the inner ifs do the job?

Contributor:

@bbbales2 is right that the outer if doesn't need to be written.

I also think it's wrong as written because you won't match both is_constant_struct<T_lam> and !is_constant_struct<T_lam>. So this needs a test that will catch this error.

theta_dbl);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv = exp_lambda_dbl / dot_exp_lam_theta;
Member:

Can this not be computed like:

exp(lambda_dbl - log_sum_exp(logp_tmp))

To the same effect? In this way the log_sum_exp handles the overflow trick, and it simplifies the code a bit.
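For reference, a sketch of the identity behind this suggestion:

exp(lambda[n]) / sum_k( theta[k] * exp(lambda[k]) )
  = exp( lambda[n] - log_sum_exp(log(theta) + lambda) )

and log_sum_exp itself applies the max-shift (log_sum_exp(x) = max(x) + log(sum(exp(x - max(x))))), so the explicit max_val subtraction elsewhere in the function becomes unnecessary.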

typedef typename stan::partials_return_type<T_theta, T_lam>::type
T_partials_return;

const size_t N = theta.size();
Member:

I'd use https://github.com/stan-dev/math/blob/develop/stan/math/prim/scal/meta/length.hpp here. I think Eigen types use ints and std::vectors use size_ts (so this sorta thing gives compiler warnings sometimes).
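For example (a sketch; length here is the metafunction from that header, referred to later in this thread as stan::length):

const size_t N = length(theta);  // handles Eigen vectors (int sizes) and
                                 // std::vectors (size_t sizes) uniformly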

@andrjohns (Collaborator, Author)

Thanks for that, Ben, much appreciated. I'm just in the middle of extending this to an array version (so that you can pass an array of N density vectors rather than looping over the function N times), so I'll get your comments in after that.

@bob-carpenter (Contributor)

@bbbales2: It's not just Columbia reviewers. @syclik is in charge of this repo and he's at Generable. I hope you'll feel comfortable reviewing pull requests soon.

@andrjohns (Collaborator, Author)

The array version for this will take a bit more work, so I'll make a separate pull request when that's ready.

The only thing I haven't been able to get working is the vectorised error checking. It seems that when the error checking functions call stan::get to return a value from the vector to be checked, that get is returning the full vector instead. The correct template is being called, so the function recognises that it's a vector input, but only the prim/scal version of stan::get or stan::length is being called (from what I can see).

@bob-carpenter (Contributor)

@andrjohns Would you mind creating an issue for the vectorized error checking? I don't quite understand what you're saying is a problem or how you're trying to instantiate it. Or just reply here with a link to the file you're having trouble getting to compile.

@bbbales2 (Member)

The array version for this will take a bit more work

Cool beans. So this function gives the log density for one mixture. Is the array version for doing a bunch of mixtures at once?

With regards to the checks, is it that x can be a std::vector<std::vector<var>> or something (which isn't handled by the checks now)?

If that's the case, I think it should be fine to just use loops for the checks. Just expound more on the possible input types in the docs for the function.

@andrjohns (Collaborator, Author)

Cool beans. So this function gives the log density for one mixture. Is the array version for doing a bunch of mixtures at once?

Yep, it's to add a lot of flexibility to how the mixture is specified. The end goal is to allow all combinations of vector and vector[].

  • log_mix(vector, vector[]) (e.g. N density vectors, all with the same mixture proportions)
  • log_mix(vector[], vector) (e.g. single density vector applied to N different mixture proportions)
  • log_mix(vector[], vector[]) (e.g. N density vectors, each with different mixture proportions)

Should let me do a lot less looping in Stan!

With regards to the checks, is it that x can be a std::vector<std::vector<var>> or something (which isn't handled by the checks now)?

That was just for the regular NaN/bounded/finite checks. But I found where my mistake was, so the checks are working fine without loops now.

@bob-carpenter (Contributor)

There's a third argument for the mixture simplex (or simplexes if you really go crazy).

@andrjohns (Collaborator, Author)

For the mixture proportions? Where the scalar log_mix is log_mix(prob(dens_1), dens_1, dens_2), this version has the mixture simplex as the first argument, and then a vector of the densities as the second argument: log_mix(simplex, densities).

Having the function take a vector of densities was the easiest way of allowing an arbitrary number, but I could possibly recreate the structure of the scalar log_mix with a variadic template: log_mix(simplex, dens_1, dens_2, ..., dens_N) if that's preferred.

@andrjohns (Collaborator, Author)

Speaking of simplexes, when calculating the derivatives should I be assuming that the input simplex was constructed via stick-breaking, and not calculating the partial derivative for the Kth value? I was originally allowing for the fact that some users might be passing the softmax of a vector as the probability argument, but I hadn't considered whether this would affect use with a default (stick-breaking) simplex.

@bob-carpenter (Contributor)

bob-carpenter commented Nov 17, 2017 via email

@bbbales2 (Member) left a comment

This looks good to me! Someone else will have to do the final review and handle the actual accept (I don't have repo permissions). @syclik maybe?

The only thing I see that's out of place is that the higher-order autodiff tests are all in terms of Eigen types (not std::vectors). Not sure it matters; std::vectors are tested in prim.

One question, this is for the signature

log_mix(vector theta, vector lambda)

Right?

The fancier signatures you mentioned:

log_mix(vectors theta, vector lambda)
log_mix(vector theta, vectors lambda)
log_mix(vectors theta, vectors lambda)

are gonna wait?

T_partials_return logp = log_sum_exp(logp_tmp);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv.array() = (lam_dbl.array() - logp).exp();
Member:

Fix indent

theta_deriv.array() = (lam_dbl.array() - logp).exp();

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> lam_deriv(N, 1);
lam_deriv.array() = theta_deriv.array() * theta_dbl.array();
Member:

Fix indent


TEST(MatrixFunctions, LogMix_Values) {
stan::math::vector_d prob(5, 1);
prob << 0.1, 0.3, 0.25, 0.15, 0.2;
Member:

Fix indents in this file

using stan::math::vector_v;

vector_v prob(4), dens(4);
prob << 0.13, 0.22, 0.38, 0.27;
Member:

Indents

using stan::math::row_vector_v;

row_vector_v prob(4), dens(4);
prob << 0.03, 0.21, 0.63, 0.13;
Member:

Indents

@andrjohns (Collaborator, Author)

Thanks Ben, good catch! I've added in the tests for std::vectors as well.

The fancier signatures you mentioned... are gonna wait?

Yep, this is just for the log_mix(vector theta, vector lambda) signature. I'm still working out the kinks with the gradients for the different combinations so they'll take a while longer.

@andrjohns (Collaborator, Author)

Could someone start the testing for this on Jenkins? Thanks!

@bob-carpenter (Contributor)

Jenkins, OK to test.

@bob-carpenter (Contributor)

I'll review this one.

@bob-carpenter (Contributor) left a comment

Hope these changes are minor.


*/
Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> exp_lambda_dbl(N, 1);
double max_val = max(lambda_dbl);
for (size_t n = 0; n < N; ++n)
Contributor:

Align each statement directly under the last except in nested blocks.


Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_dbl(N, 1);
scalar_seq_view<T_theta> theta_vec(theta);
for (size_t n = 0; n < N; ++n)
Contributor:

You should create theta_dbl to be the same size as theta to cut down on memory allocation and assignments. The scalar sequence view will automatically broadcast when you use it.

for (size_t n = 0; n < N; ++n)
exp_lambda_dbl[n] = exp((lambda_dbl[n] - max_val));

T_partials_return dot_exp_lam_theta = dot_product(exp_lambda_dbl,
Contributor:

This doesn't need to be a dot product unless both are vectors; otherwise, it's more efficient as a scalar-vector operation. I'm not sure how to deal with this efficiently.

You should also be using auto as the type on the left here. That'll avoid over-promotion and is fair game now that we've opened up C++11 constructs.

lam_dbl[n] = value_of(lam_vec[n]);

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> logp_tmp(N, 1);
logp_tmp = log(theta_dbl) + lam_dbl;
Contributor:

I think this one can be a declare-define

Eigen::Matrix<T_partials_return, -1, 1> logp_tmp = log(theta_dbl) + lam_dbl;

You might want to typedef the type here as this is the second usage.

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> logp_tmp(N, 1);
logp_tmp = log(theta_dbl) + lam_dbl;

T_partials_return logp = log_sum_exp(logp_tmp);
Contributor:

I don't see another usage of logp_tmp, so I think this can all be replaced with a one-liner:

T_partials_return logp = log_sum_exp(log(theta_dbl) + lam_dbl);

The suffix _tmp should be avoided. Here, the variable could just be called logp. All variables are "temp" in some sense and shorter is better all else being equal.

Collaborator Author:

I had to split this into two steps because the return type of log(theta_dbl) + lam_dbl is a 'CwiseBinaryOp', so log_sum_exp can't deal with it until Eigen assigns the result to a vector.

I'm not sure if there's a way of using the matrix/array wrappers to avoid this, but I haven't had any luck with figuring it out.
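One possible workaround, as an untested sketch (assuming log_sum_exp is templated on Eigen::Matrix, so forcing evaluation of the expression is enough for deduction to succeed):

// .eval() materialises the CwiseBinaryOp into an Eigen vector, so the
// whole thing can stay a one-liner
T_partials_return logp = log_sum_exp((log(theta_dbl) + lam_dbl).eval());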

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> theta_deriv(N, 1);
theta_deriv.array() = (lam_dbl.array() - logp).exp();

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> lam_deriv(N, 1);
Contributor:

You don't need to convert to arrays for this. I think this can just be

Eigen::Matrix<T_partials_return, -1, 1> lam_deriv = theta_deriv + theta_dbl;

Collaborator Author:

I used the array wrappers here because I needed an elementwise multiplication, not addition. But you were right in the end, I didn't need to convert to arrays because Eigen has the .cwiseProduct() operator
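i.e., roughly (a sketch of the declare-define form using cwiseProduct):

Eigen::Matrix<T_partials_return, Eigen::Dynamic, 1> lam_deriv
    = theta_deriv.cwiseProduct(theta_dbl);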


operands_and_partials<T_theta, T_lam> ops_partials(theta, lambda);
if (!is_constant_struct<T_theta>::value) {
for (int n = 0; n < N; ++n)
Contributor:

Can't this just be

ops_partials.edge1_ = theta_deriv;


if (!is_constant_struct<T_lam>::value) {
for (int n = 0; n < N; ++n)
ops_partials.edge2_.partials_[n] = lam_deriv[n];
Contributor:

same comment

@bob-carpenter (Contributor)

@andrjohns Even though I approved this, I had a bunch of suggestions on how to make it better. Would you mind if I made the changes then pushed?

@andrjohns (Collaborator, Author)

andrjohns commented Dec 30, 2017 via email

@syclik (Member)

syclik commented Jan 20, 2018

@bob-carpenter, want to make the fixes you suggested so we can get this merged?

@bob-carpenter (Contributor)

bob-carpenter commented Jan 20, 2018

Yes, I can make the fixes.

@andrjohns (Collaborator, Author)

Sorry about the delay with this one, I'm catching up on my backlog now. I can make these changes and push them tonight (which would be tomorrow morning Columbia-time)

@bob-carpenter (Contributor)

bob-carpenter commented Jan 30, 2018 via email

@bob-carpenter (Contributor)

bob-carpenter commented Jan 30, 2018 via email

@andrjohns (Collaborator, Author)

andrjohns commented Jan 30, 2018 via email

@bob-carpenter (Contributor)

Sorry for all the false alarms and thanks for clarifying. I'll merge when the tests pass. Thanks.

@seantalts (Member)

Merging!

re: operands_and_partials generalization - if this was waiting for multivariate / containers of containers, Sebastian got that in a couple of weeks ago I think.

@seantalts merged commit 4f382fb into stan-dev:develop on Jan 31, 2018
@bob-carpenter (Contributor)

bob-carpenter commented Jan 31, 2018 via email
