Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Eigen Dense and Sparse vari_value types #1952

Merged
merged 224 commits into from
Aug 7, 2020

Conversation

SteveBronder
Copy link
Collaborator

Summary

Adds a dense and sparse matrix specialization to vari_value. The dense and sparse vari_value matrix types use an Eigen::Map to hold stack allocated memory for the val_ and adj_.

Tests

This needs more testing, but I could use some help figuring out what needs done. Right now I'm testing that

  1. Eigen matrix expressions are allowed to be passed into the constructor
  2. Standard var_value(var_value()) and var_value(vari_value()) construction works
  3. That member vals are the same as the input vals for the above cases.

Side Effects

Yes, we need to be careful when constructing vari_value and var_value types that the template type is a plain eigen type (aka not an Expression just an Matrix, Array, or SparseMatrix, etc.). I think I'll add an is_plain_type type trait to check that templates for vari_value are actually plain types.

Release notes

Allows vari to hold an Eigen type

Checklist

  • Math issue Meta Issue for Static Matrices #1875

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

serban-nicusor-toptal and others added 30 commits July 18, 2019 23:24
…d::numeric_limits so that it conforms to the C++11 definition
…_vari, and avoiding copies for integral type vari_value
@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.01 4.01 1.0 0.16% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.37% slower
eight_schools/eight_schools.stan 0.09 0.09 1.02 1.68% faster
gp_regr/gp_regr.stan 0.19 0.19 0.98 -1.97% slower
irt_2pl/irt_2pl.stan 5.33 5.41 0.98 -1.54% slower
performance.compilation 87.91 85.88 1.02 2.31% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.57 7.83 0.97 -3.38% slower
pkpd/one_comp_mm_elim_abs.stan 27.55 27.24 1.01 1.14% faster
sir/sir.stan 110.93 133.16 0.83 -20.04% slower
gp_regr/gen_gp_data.stan 0.04 0.04 0.99 -0.83% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 3.01 0.99 -0.52% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.41 0.38 1.07 6.92% faster
arK/arK.stan 1.72 1.8 0.96 -4.2% slower
arma/arma.stan 0.59 0.59 1.01 0.67% faster
garch/garch.stan 0.59 0.53 1.11 10.25% faster
Mean result: 0.995881756989

Jenkins Console Log
Blue Ocean
Commit hash: e290f12


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

* will be called in the reverse order of construction.
*
* @tparam S A `matrix_cl` or kernel generator expression type that is
* convertible to `value_type`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_type isn't here anymore

* will be called in the reverse order of construction.
*
* @tparam S A `matrix_cl` or kernel generator expression type that is
* convertible to `value_type`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value_type isn't here anymore

SteveBronder and others added 3 commits August 5, 2020 18:34
…var_value's grad so that it is only available when the value_type of the var_value is not a container
@SteveBronder
Copy link
Collaborator Author

@bbbales2 as we discussed I added a grad() method that just starts at the bottom of the stack and calls each chain without setting the adjoints to anything first. Also added the stuff so that the grad() method in var_value only exist if the inner type is a scalar

@t4c1 do you want to get the operands_and_partials branch approved/merged in here first before approving/merging this branch to develop? I'm fine either way

@SteveBronder
Copy link
Collaborator Author

@serban-nicusor-toptal getting a Jenkins error I haven't seen before

+ git pull
Already up to date.
+ git checkout develop
error: pathspec 'develop' did not match any file(s) known to git
script returned exit code 1

Do you know what's up with that?

@serban-nicusor-toptal
Copy link
Contributor

Hey @SteveBronder everything is back to normal, sorry for the trouble!
Jenkins was only fetching this PR from the origin so I had to make sure it checks out all of them.

* <p>This function does not recover any memory from the computation.
*
*/
static void grad() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The duplicate code here can be eliminated pretty easily.

Either call this grad inside the other grad or do a default nullptr argument in the other thing:

template <typename Vari>
static void grad(Vari* vi) {
  vi->init_dependent();
  grad();
}

or

template <typename Vari>
static void grad(Vari* vi = nullptr) {
  if(!vi)
    vi->init_dependent();
  size_t end = ChainableStack::instance_->var_stack_.size();
  size_t beginning = empty_nested() ? 0 : end - nested_size();
  for (size_t i = end; i-- > beginning;) {
    ChainableStack::instance_->var_stack_[i]->chain();
  }
}

and delete the second definition.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done deal!

bbbales2
bbbales2 previously approved these changes Aug 6, 2020
@SteveBronder
Copy link
Collaborator Author

@serban-nicusor-toptal much appreciated!

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.18 4.0 1.05 4.31% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.05 4.33% faster
eight_schools/eight_schools.stan 0.09 0.09 0.98 -2.49% slower
gp_regr/gp_regr.stan 0.2 0.21 0.96 -3.88% slower
irt_2pl/irt_2pl.stan 5.41 5.46 0.99 -0.9% slower
performance.compilation 86.97 85.53 1.02 1.66% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.82 7.85 1.0 -0.33% slower
pkpd/one_comp_mm_elim_abs.stan 27.06 28.97 0.93 -7.08% slower
sir/sir.stan 132.12 130.02 1.02 1.6% faster
gp_regr/gen_gp_data.stan 0.05 0.05 0.96 -3.83% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.01 3.0 1.0 0.46% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.39 1.0 0.06% faster
arK/arK.stan 1.78 1.8 0.99 -1.37% slower
arma/arma.stan 0.59 0.73 0.81 -23.67% slower
garch/garch.stan 0.53 0.58 0.93 -8.02% slower
Mean result: 0.978176087394

Jenkins Console Log
Blue Ocean
Commit hash: c6af0b0


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@t4c1
Copy link
Contributor

t4c1 commented Aug 6, 2020

@t4c1 do you want to get the operands_and_partials branch approved/merged in here first before approving/merging this branch to develop? I'm fine either way

I think you don't need to wait for me. If this gets ready before ops_partials go ahead and merge it.

Copy link
Contributor

@t4c1 t4c1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more details and this is done.

@@ -76,14 +86,13 @@ class var_value<T, require_floating_point_t<T>> {
* @param x Value of the variable.
*/
template <typename S, require_convertible_t<S&, value_type>* = nullptr>
var_value(const S& x) : vi_(new vari_type(x, false)) {} // NOLINT
var_value(S&& x) : vi_(new vari_type(x, false)) {} // NOLINT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var_value(S&& x) : vi_(new vari_type(x, false)) {} // NOLINT
var_value(const S& x) : vi_(new vari_type(x, false)) {} // NOLINT

Comment on lines 210 to 216
static constexpr Eigen::Index RowsAtCompileTime
= PlainObject::RowsAtCompileTime;
/**
* Number of columns known at compile time
*/
static constexpr Eigen::Index ColsAtCompileTime = Scalar::ColsAtCompileTime;
static constexpr Eigen::Index ColsAtCompileTime
= PlainObject::ColsAtCompileTime;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eigen uses enums to store these. I like our approach with constexprs better, but I think the type should be int.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that enums were preferred since the static constexpr doesn't always operate as purely compile-time (based on this writeup):

Static constant members are lvalues (see Appendix B). So, if we have a declaration such as
void foo(int const&);
and we pass it the result of a metaprogram:
foo(Pow3<7>::value);
a compiler must pass the address of Pow3<7>::value, and that forces the compiler to instantiate and allocate the definition for the static member. As a result, the computation is no longer limited to a pure “compile-time” effect.
Enumeration values aren’t lvalues (i.e., they don’t have an address). So, when we pass them by reference, no static memory is used. It’s almost exactly as if you passed the computed value as a literal. The first edition of this book therefore preferred the use of enumerator constants for this kind of applications.
C++11, however, introduced constexpr static data members, and those are not limited to integral types. They do not solve the address issue raised above, but in spite of that shortcoming they are now a common way to produce results of metaprograms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not know that and I have to say I don't fully understand it. How can enums be passed around without taking memory?

Anyway I am fine with changing this to an enum if that is prefered.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don't know enough to have preference here, just something I stumbled on recently that I thought was interesting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading the section I think the only concern is when the static constexpr is used by a function taking a reference? imo I like the static constexpr version and idt the performance difference here would be drastic with either/or

adj_(x),
chainable_alloc() {
template <typename S, require_convertible_t<S&, T>* = nullptr>
explicit vari_value(S&& x) : val_(x), adj_(x), chainable_alloc() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
explicit vari_value(S&& x) : val_(x), adj_(x), chainable_alloc() {
explicit vari_value(const S& x) : val_(x), adj_(x), chainable_alloc() {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually should these be forwarding because sparse holds an actual sparse matrix while the dense matrix one should use const S& since it always allocated memory?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed them all to be const S&

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought sparse vari used maps too. Than you are right and forwarding can be taken advantage of. In that case you also need to call forward on the assignment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done deal!

adj_(x),
chainable_alloc() {
template <typename S, require_convertible_t<S&, T>* = nullptr>
vari_value(S&& x, bool stacked) : val_(x), adj_(x), chainable_alloc() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
vari_value(S&& x, bool stacked) : val_(x), adj_(x), chainable_alloc() {
vari_value(const S& x, bool stacked) : val_(x), adj_(x), chainable_alloc() {

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.05 4.13 0.98 -1.97% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.96 -4.04% slower
eight_schools/eight_schools.stan 0.09 0.09 0.95 -5.77% slower
gp_regr/gp_regr.stan 0.19 0.19 1.0 0.31% faster
irt_2pl/irt_2pl.stan 5.39 5.41 1.0 -0.39% slower
performance.compilation 87.26 85.93 1.02 1.52% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.82 7.66 1.02 2.02% faster
pkpd/one_comp_mm_elim_abs.stan 27.8 27.16 1.02 2.3% faster
sir/sir.stan 130.29 131.59 0.99 -1.0% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.99 -1.19% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.02 2.99 1.01 1.04% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.41 0.39 1.06 5.9% faster
arK/arK.stan 1.78 1.8 0.99 -1.37% slower
arma/arma.stan 0.59 0.73 0.81 -23.5% slower
garch/garch.stan 0.53 0.57 0.93 -7.55% slower
Mean result: 0.981587372336

Jenkins Console Log
Blue Ocean
Commit hash: 32045cc


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Copy link
Collaborator Author

Cool cool so I think this is ready to go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet