Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds lb/ub/lub_constrain specializations #2373

Merged
merged 91 commits into from
Feb 25, 2021
Merged

Conversation

SteveBronder
Copy link
Collaborator

Summary

This PR adds the specializations for lub_constrain for matrices and scalars with all their combinations of inputs. The branch I was working on to make lb/ub/lub constrain work for matvar was getting pretty big so I'm breaking it up into 3 seperate ones.

Tests

Tests were updated for matvar and to the calculations when computing the gradient of both lp and the input.

Side Effects

There's a few weirds I can't completely sort out.

@t4c1 Is make_holder able to deal with plain references? I was trying to use it for the functions that need to calculate lp like

template <typename T, typename L, require_all_eigen_t<T, L>* = nullptr,
          require_all_not_st_var<T, L>* = nullptr>
inline auto lb_constrain(T&& x, L&& lb) {
  return make_holder([](const auto& x, const auto& lb, auto& lp) {
    return x.binaryExpr(lb, [&lp](auto&& x, auto&& lb) {
       return lb_constrain(x, lb, lp);
    });
  }, std::forward<T>(x), std::forward<L>(lb), lp);
}

But I was getting seg faults when trying.

@andrjohns is it possible to use the apply meta stuff to have this work for std::vectors of containers? I couldn't figure out how to have it work whenever I have an x like std::vector<Eigen::MatrixXd> and an lb of double

Release notes

Replace this text with a short note on what will change if this pull request is merged in which case this will be included in the release notes.

Checklist

  • Math issue How to add static matrix? #1805

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

Comment on lines 254 to 263
auto is_inf_lb = to_arena(
(arena_lb.val().array() != NEGATIVE_INFTY).template cast<double>());
auto precomp_x_exp = to_arena((arena_x.val().array() * is_inf_lb).exp());
arena_t<ret_type> ret = (is_inf_lb).select(
precomp_x_exp + arena_lb.val().array(), arena_x.val().array());
reverse_pass_callback(
[arena_x, arena_lb, ret, is_inf_lb, precomp_x_exp]() mutable {
arena_x.adj().array() += ret.adj().array() * precomp_x_exp;
arena_lb.adj().array() += ret.adj().array() * is_inf_lb;
});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of a dirty trick, but since exp(0) = 1 We can cast all of these checks to double and then in the forward pass multiply x by the check array. When we make ret we iterate over is_inf_lb and when it's true it uses the values we want and when false just returns arena_x at that value.

But the nice thing is that we can then have no checks in the reverse pass and get nice SIMD instructions since we are just doing an elementwise multiplication

arena_x.adj().array() += ret.adj().array() * precomp_x_exp;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRs introducing such dirty tricks should be accompanied by benchmarks proving they are actually faster than the alternative implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem like a dirty trick. Maybe just a trick lol. I'm happy with it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this ended being best case the same speed and worst case a bit slower so I reverted it to just do the checks

@t4c1
Copy link
Contributor

t4c1 commented Feb 17, 2021

@t4c1 Is make_holder able to deal with plain references?

Yes, it should be. Do you even need holders for constrain/free functions? I think probably not.

@bbbales2
Copy link
Member

I was trying to use it for the functions that need to calculate lp like

I don't like the expressions there since lp is an output in this code. Even if it weren't segfaulting, that would mean an accidental double eval of the expression would make a math bug rather than just a performance bug (updating lp twice).

I couldn't figure out how to have it work whenever I have an x like std::vectorEigen::MatrixXd and an lb of double

I thought x was always either a scalar or a matrix here.

@bbbales2
Copy link
Member

@SteveBronder I made a comparison branch where I removed a lot of stuff: feature/lb_constrain-matvar...feature/lb_constrain-matvar_shrunk

I deleted a bunch of stuff and I think this will work okay too. I don't think we need to use identity constrain in here.

I think we only need reverse mode specializations when both types are matrices, otherwise prim can handle the varmat autodiff.

Ofc. this would have more temporaries, but the implementation is way simpler. Can also go verbose for the performance -- it's just a metric ton of code.

@SteveBronder
Copy link
Collaborator Author

idt those will work because we need to check each cell individually for infinity. Doing it the way you wrote it above you end up just checking if all of them are positive infinity and if not then do the constrain.

@bbbales2
Copy link
Member

Doing it the way you wrote it above you end up just checking if all of them are positive infinity and if not then do the constrain.

The signatures are:

lb_constrain(real, real);
lb_constrain(vector, real);
lb_constrain(vector, vector);

(and then the ones for lp)

I'm pretty sure the code I wrote covers that. Is there a signature I'm missing here?

@SteveBronder
Copy link
Collaborator Author

I think that could work but I already wrote out the complicated bits lol so I'd rather just use the specializations since they would be more efficient

@bbbales2
Copy link
Member

Makes sense. Lemme revieeeeew

Copy link
Member

@bbbales2 bbbales2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review!

stan/math/prim/fun/lb_constrain.hpp Show resolved Hide resolved
stan/math/prim/fun/identity_constrain.hpp Show resolved Hide resolved
if (unlikely(lb == NEGATIVE_INFTY)) {
return identity_constrain(x, lb);
} else {
// check_less("lb_constrain", "lb", value_of(x), value_of(lb));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete comment

if (lb == NEGATIVE_INFTY) {
return identity_constrain(x, lb);
} else {
// check_less("lb_constrain", "lb", value_of(x), value_of(lb));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete comment

[](const auto& x, const auto& lb) {
return x.unaryExpr([lb](auto&& x) { return lb_constrain(x, lb); });
},
std::forward<T>(x), std::forward<L>(lb));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] I'd rather just evaluate expressions here because I'm pretty sure we're always assigning to l-values somewhere in generated C++. I think it does cost a malloc or something to make a holder? @t4c1 can confirm if they're free or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a holder is constructed from lvalues only it is virtually free. We need a malloc for each rvalue. But let me ask again: Do we even need holders here?


stan::test::expect_ad(f1, x, lb);
// stan::test::expect_ad(f1, x, lb);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turn these tests back on

test/unit/math/prim/fun/lb_transform_test.cpp Show resolved Hide resolved
test/unit/math/mix/fun/lb_constrain_test.cpp Show resolved Hide resolved
test/unit/math/mix/fun/lb_constrain_test.cpp Show resolved Hide resolved
using stan::math::lb_constrain;
using stan::math::promote_scalar_t;
Eigen::MatrixXd A(2, 2);
// Doesn't work for values near zero fail for fvar hessian?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still true?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope!

@SteveBronder
Copy link
Collaborator Author

@rok-cesnovar hey sorry but can we do the reverse of the last constrain pr up in Stan? We just need to revert the changes to the lb/ub constrain checks

Copy link
Member

@bbbales2 bbbales2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple comments

stan/math/rev/fun/lb_constrain.hpp Outdated Show resolved Hide resolved
test/unit/math/prim/fun/lb_transform_test.cpp Show resolved Hide resolved
test/unit/math/mix/fun/lb_constrain_test.cpp Show resolved Hide resolved
@SteveBronder
Copy link
Collaborator Author

Actually @rok-cesnovar we are going to merge all the changes for the constraints into this PR with sub PRs then once all those are here we can revert the Stan PR

@SteveBronder SteveBronder changed the title Adds lb_constrain specializations Adds lb/ub/lub_constrain specializations Feb 23, 2021
@SteveBronder
Copy link
Collaborator Author

I put up #3012 in the Stan repo. @rok-cesnovar how should we do the double mere thing?

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.44 3.44 1.0 -0.1% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.04 4.03% faster
eight_schools/eight_schools.stan 0.11 0.12 0.95 -5.27% slower
gp_regr/gp_regr.stan 0.15 0.16 0.96 -4.21% slower
irt_2pl/irt_2pl.stan 5.22 5.28 0.99 -1.31% slower
performance.compilation 90.51 88.69 1.02 2.01% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.82 8.63 0.91 -10.31% slower
pkpd/one_comp_mm_elim_abs.stan 28.69 30.81 0.93 -7.39% slower
sir/sir.stan 133.6 131.05 1.02 1.91% faster
gp_regr/gen_gp_data.stan 0.04 0.04 0.99 -1.29% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.04 2.97 1.02 2.24% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.38 0.98 -2.33% slower
arK/arK.stan 2.51 1.8 1.39 28.16% faster
arma/arma.stan 0.71 0.73 0.97 -3.37% slower
garch/garch.stan 0.66 0.64 1.03 2.61% faster
Mean result: 1.01258767627

Jenkins Console Log
Blue Ocean
Commit hash: 77fde65759e291dd6c84eb5089a322aab3105414


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bbbales2
Copy link
Member

bbbales2 commented Feb 24, 2021

Okay this should be good to go @rok-cesnovar . It is failing in the upstream tests where we expect. The matching Stan pull is: stan-dev/stan#3012

Nevermind, problem, don't merge yet

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.39 3.42 0.99 -0.97% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.05 5.01% faster
eight_schools/eight_schools.stan 0.11 0.11 1.04 3.93% faster
gp_regr/gp_regr.stan 0.15 0.16 0.97 -3.53% slower
irt_2pl/irt_2pl.stan 5.24 5.21 1.01 0.69% faster
performance.compilation 90.72 88.74 1.02 2.18% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.79 8.6 0.91 -10.39% slower
pkpd/one_comp_mm_elim_abs.stan 29.42 30.03 0.98 -2.08% slower
sir/sir.stan 128.72 129.05 1.0 -0.26% slower
gp_regr/gen_gp_data.stan 0.04 0.04 0.99 -0.79% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.04 2.97 1.02 2.28% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.44 0.89 -12.54% slower
arK/arK.stan 2.52 1.77 1.42 29.69% faster
arma/arma.stan 0.7 0.74 0.95 -5.63% slower
garch/garch.stan 0.66 0.57 1.14 12.62% faster
Mean result: 1.02529879931

Jenkins Console Log
Blue Ocean
Commit hash: 77fde65759e291dd6c84eb5089a322aab3105414


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Copy link
Collaborator Author

No sorry this is ready for merge I made a false alarm. I thought this was valid Stan code

parameters {
  real xx[N];
  real<lower=xx,upper=1> theta[N, N];
}

but it's not so we r good

10:49
--- Translating Stan model to C++ code ---
bin/stanc  --o=examples/bernoulli/bernoulli.hpp examples/bernoulli/bernoulli.stan
Semantic error in 'examples/bernoulli/bernoulli.stan', line 7, column 13 to column 15:
   -------------------------------------------------
     5:  parameters {
     6:    real xx[N];
     7:    real<lower=xx,upper=1> theta[N, N];
                      ^
     8:  }
     9:  model {
   -------------------------------------------------
Lower bound must be a scalar or of type array[,] real. Instead found type array[] real.make: *** [make/program:47: examples/bernoulli/bernoulli.hpp] Error 1

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.52 3.42 1.03 2.76% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.56% slower
eight_schools/eight_schools.stan 0.11 0.11 0.99 -0.86% slower
gp_regr/gp_regr.stan 0.15 0.16 0.97 -3.28% slower
irt_2pl/irt_2pl.stan 5.21 5.27 0.99 -0.98% slower
performance.compilation 90.73 88.85 1.02 2.08% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.99 8.6 0.93 -7.54% slower
pkpd/one_comp_mm_elim_abs.stan 28.31 29.53 0.96 -4.29% slower
sir/sir.stan 130.42 127.88 1.02 1.94% faster
gp_regr/gen_gp_data.stan 0.04 0.05 0.94 -6.29% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.02 2.99 1.01 1.27% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.4 0.41 0.98 -2.42% slower
arK/arK.stan 2.5 1.79 1.39 28.24% faster
arma/arma.stan 0.74 0.75 0.99 -1.38% slower
garch/garch.stan 0.65 0.57 1.15 13.09% faster
Mean result: 1.02291233324

Jenkins Console Log
Blue Ocean
Commit hash: 77fde65759e291dd6c84eb5089a322aab3105414


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants