
Allow the var views to compound multiple expressions and remove var.coeff() callbacks #2064

Merged: 81 commits into develop, Oct 7, 2020

Conversation

@SteveBronder (Collaborator) commented Sep 7, 2020

Summary

This PR allows var<matrix> types to chain together multiple slicing methods, such as

auto A_flip = A_v.rowwise().reverse().colwise().reverse();

It also adds a specialization of vari_value<T> for when T is a reference to a scalar type. This specialization removes the callbacks that were needed in the previous implementation of the var_value.coeff() methods.

While I was working on the var<Matrix> indexing methods in this stan branch, I realized I needed to be able to write compound expressions such as a_var_matrix.rowwise().reverse(). reverse() was needed in particular because Stan's indexing for code such as

matrix[K, K] A = Other[5:K, 5:K]

uses an index_min_max(5, K) when performing the indexing, but this name is dubious! The Stan language will use index_min_max(5, K) even when we don't know whether K is greater than the starting index. In the case of a decreasing index we need to be able to call Eigen's .reverse() function so that the min_max index can grab the appropriate slice and then reverse its order.
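
To make that concrete, here is a small standalone Eigen example (illustration only, not the actual Stan indexing code) showing how .reverse() turns the block covered by an ascending range into a descending slice:

#include <Eigen/Dense>
#include <iostream>

int main() {
  Eigen::MatrixXd A(4, 4);
  A <<  1,  2,  3,  4,
        5,  6,  7,  8,
        9, 10, 11, 12,
       13, 14, 15, 16;
  // A decreasing Stan index such as Other[3:2, 3:2] covers the same
  // coefficients as the ascending block below; reversing that block yields
  // the rows and columns in descending order.
  Eigen::MatrixXd descending = A.block(1, 1, 2, 2).reverse();
  std::cout << descending << "\n";  // prints: 11 10 / 7 6
}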

This PR adds those methods and allows them to be chained together. To do that I also had to change is_eigen and plain_type_t. Currently is_eigen checks whether a pointer to a type T is convertible to Eigen::EigenBase<T>*, but some Eigen expressions such as VectorwiseOp don't actually inherit from EigenBase, so is_eigen<VectorwiseOp<T>> fails (see line 156 of the VectorwiseOp code link). VectorwiseOp also does not have a PlainObject trait, so plain_type_t fails for these types as well. This means I had to change both definitions a bit: is_eigen now checks whether the type T is pointer-convertible to EigenBase<T>* or whether T has a trait named ExpressionTypeNestedCleaned. A similar scheme is used for plain_type_t, where we detect whether the class has either a PlainObject trait or an ExpressionTypeNestedCleaned trait and report back whichever exists. (@t4c1 this feels a bit ad-hoc, so if you have a nicer way to deduce these I'm all ears!)
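
As a rough sketch of that detection (illustrative only, using C++17's std::void_t and an overload-resolution test rather than the actual stan::math require machinery), the two checks might look like:

#include <Eigen/Dense>
#include <type_traits>
#include <utility>

// Detect "inherits from some EigenBase<Derived>" via overload resolution,
// which simply yields false (instead of failing) for pseudo-expressions.
template <typename Derived>
std::true_type eigen_base_test(const Eigen::EigenBase<Derived>*);
std::false_type eigen_base_test(const void*);

// Detect the ExpressionTypeNestedCleaned typedef that VectorwiseOp exposes.
template <typename T, typename = void>
struct has_nested_cleaned : std::false_type {};
template <typename T>
struct has_nested_cleaned<
    T, std::void_t<typename std::decay_t<T>::ExpressionTypeNestedCleaned>>
    : std::true_type {};

template <typename T>
struct is_eigen_like
    : std::integral_constant<
          bool,
          decltype(eigen_base_test(std::declval<std::decay_t<T>*>()))::value
              || has_nested_cleaned<T>::value> {};

static_assert(is_eigen_like<Eigen::MatrixXd>::value, "plain matrix");
static_assert(
    is_eigen_like<decltype(std::declval<Eigen::MatrixXd>().rowwise())>::value,
    "VectorwiseOp expression");
static_assert(!is_eigen_like<double>::value, "not an Eigen type");

int main() {}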

Another notable change in this PR is that there is now a vari_value<T> specialization for when T is a reference type. Having this specialization allows us to remove the callbacks that were needed when slicing a vari<matrix> type to get back a single coefficient.
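
The gist of the reference specialization, as a toy illustration (not the actual vari_value code), is that the scalar node's value and adjoint simply alias coefficients of the matrix node, so adjoint updates flow back without any reverse-pass callback:

#include <Eigen/Dense>
#include <iostream>

// Toy stand-ins for a matrix vari and a coefficient "view" vari; the real
// classes live in stan::math and manage arena memory, chain(), etc.
struct toy_matrix_node {
  Eigen::MatrixXd val;
  Eigen::MatrixXd adj;
};

struct toy_coeff_node {
  double& val;  // aliases one coefficient of the matrix value
  double& adj;  // aliases the matching coefficient of the matrix adjoint
};

int main() {
  toy_matrix_node m{Eigen::MatrixXd::Constant(2, 2, 1.5),
                    Eigen::MatrixXd::Zero(2, 2)};
  toy_coeff_node c{m.val(0, 1), m.adj(0, 1)};
  c.adj += 2.0;                      // incrementing through the reference ...
  std::cout << m.adj(0, 1) << "\n";  // ... already updated the matrix: prints 2
}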

Tests

This PR adds additional tests in var_test to check that compound slicing expressions work and that their values and adjoints match those of the original matrix.

Side Effects

Yes, I worry that the vari_value<T> specialization that accepts references is not defensive enough. Right now I have the specialization marked as final, which should stop anyone from deriving a class from it. This class should really only be used in var_value; is there some way to enforce that?

Release notes

Adds views to vari_value<Matrix> types that allow for compound slicing expressions.

Checklist

  • Math issue How to add static matrix? #1805

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.33 4.2 1.03 2.98% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 0.29% faster
eight_schools/eight_schools.stan 0.09 0.09 0.97 -2.56% slower
gp_regr/gp_regr.stan 0.2 0.19 1.01 1.06% faster
irt_2pl/irt_2pl.stan 5.21 5.24 0.99 -0.75% slower
performance.compilation 88.96 87.39 1.02 1.76% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.19 8.31 0.99 -1.37% slower
pkpd/one_comp_mm_elim_abs.stan 26.53 28.94 0.92 -9.07% slower
sir/sir.stan 128.71 134.19 0.96 -4.26% slower
gp_regr/gen_gp_data.stan 0.05 0.05 1.02 1.52% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.33 3.4 0.98 -2.07% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.38 0.99 -1.45% slower
arK/arK.stan 2.54 2.53 1.0 0.14% faster
arma/arma.stan 0.74 0.73 1.01 1.11% faster
garch/garch.stan 0.73 0.73 1.0 -0.38% slower
Mean result: 0.992124942235

Jenkins Console Log
Blue Ocean
Commit hash: a8f88aa


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder changed the title from "Allow the var views to compound multiple expressions and remove var.coeff() callbacks" to "[WIP] Allow the var views to compound multiple expressions and remove var.coeff() callbacks" on Sep 8, 2020
@SteveBronder (Collaborator Author)

Marking this as WIP because I also need to sort out how to assign a var_value that holds an expression to a var_value with a plain type.

@t4c1 (Contributor) commented Sep 8, 2020

is_eigen<VectorwiseOp> fails

In my opinion that is fine. We don't want to (and can't) support VectorwiseOp everywhere we support all other Eigen expressions. So if we do need to support it somewhere, we should add some special handling (even if that is just an extended require).

Another notable change in this PR is that there is now a vari_value specialization for when T is a reference type.

Nice. However I believe a number of open questions will appear regarding this implementation.

@SteveBronder (Collaborator Author)

In my opinion that is fine. We don't want to (and can't) support VectorwiseOp everywhere we support all other Eigen expressions. So if we do need to support it somewhere, we should add some special handling (even if that is just an extended require).

I agree with the is_eigen stuff and can revert that, though I'd like to leave the plain_type_t change because I think that does make sense.

Another notable change in this PR is that there is now a vari_value specialization for when T is a reference type.

Nice. However I believe a number of open questions will appear regarding this implementation.

Yeah. I think one solution could be to make all of its constructors private. Then, since var is a private friend, it should be the only one able to access them (I think?). If not, I can take another crack at the var_view type.
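
To illustrate that idea, a minimal sketch of the private-constructor-plus-friend pattern with hypothetical names (not the actual var/vari classes):

class scalar_wrapper;  // stands in for var_value

class scalar_ref_impl {  // stands in for the vari_value<T&> specialization
  double& slot_;
  explicit scalar_ref_impl(double& slot) : slot_(slot) {}  // private
  friend class scalar_wrapper;  // only the wrapper may construct it

 public:
  double& value() { return slot_; }
};

class scalar_wrapper {
  double storage_ = 0.0;

 public:
  scalar_ref_impl view() { return scalar_ref_impl(storage_); }
};

int main() {
  scalar_wrapper w;
  w.view().value() = 3.0;
  // scalar_ref_impl bad(w.view().value());  // would not compile: ctor is private
  return 0;
}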

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.18 4.3 0.97 -2.87% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -3.05% slower
eight_schools/eight_schools.stan 0.09 0.09 0.98 -1.76% slower
gp_regr/gp_regr.stan 0.2 0.2 1.01 1.18% faster
irt_2pl/irt_2pl.stan 5.26 5.2 1.01 1.31% faster
performance.compilation 88.73 87.7 1.01 1.17% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.21 8.26 0.99 -0.63% slower
pkpd/one_comp_mm_elim_abs.stan 28.02 27.4 1.02 2.23% faster
sir/sir.stan 141.09 137.84 1.02 2.31% faster
gp_regr/gen_gp_data.stan 0.05 0.05 0.99 -1.45% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.36 3.45 0.97 -2.73% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.42 0.93 -7.51% slower
arK/arK.stan 2.56 2.58 0.99 -0.86% slower
arma/arma.stan 0.73 0.72 1.01 1.19% faster
garch/garch.stan 0.73 0.73 1.0 0.11% faster
Mean result: 0.993104219883

Jenkins Console Log
Blue Ocean
Commit hash: 93d58a5


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder changed the title from "[WIP] Allow the var views to compound multiple expressions and remove var.coeff() callbacks" to "Allow the var views to compound multiple expressions and remove var.coeff() callbacks" on Sep 9, 2020
@SteveBronder mentioned this pull request on Sep 9, 2020
@t4c1 (Contributor) commented Sep 9, 2020

Though I'd like to leave the plain_type_t change because I think that does make sense

I am a bit hesitant, but let's say that is fine. To avoid the ad-hoc method, I think it might be possible to do something similar to what we have in is_eigen: using overload resolution.

Yeah. I think one solution could be to make all of its constructors private. Then, since var is a private friend, it should be the only one able to access them (I think?). If not, I can take another crack at the var_view type.

I don't see making things sufficiently hidden as a problem. We can always declare member functions private or move things into an internal namespace.

What I mean is things like: how do we shape the interface for varis containing references? There are two main groups of accesses to a vari (that I see):

a) through the var (calls to .adj() and .val(), var constructors and assignment operators)
b) from other varis representing operations that use the particular vari (the chain function updating the arguments' adjoints)

In both of these cases the accessing class either needs to restrict itself to only calling virtual functions or needs to know the exact type in the vari (is it double or double&, a plain matrix or a block or a rowwise-reversed view, ...).

For the Eigen operations we already have rev implementations templated on the argument types. I think we need to do the same for the scalar ones in order to use the references.

I am not so sure about accesses through var. It already knows the exact type of the vari, but ... does it need to? Right now that makes calls to adj() and val() efficient, but assigning a var<double&> to a var<double> needs to do a conversion, including copying data and setting up a reverse-pass callback.

Instead we could have assignments just copy the vari pointer, but adj() and val() would have to be implemented through a call to a virtual function on the vari, although this probably complicates the implementation of templated operations.
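
A stripped-down sketch of the two access patterns being contrasted here (illustrative names only, not stan::math types): an exact-type wrapper with direct member access versus a type-erased wrapper whose val()/adj() go through virtual calls.

struct node_base {
  virtual double val() const = 0;
  virtual double& adj() = 0;
  virtual ~node_base() = default;
};

struct plain_node : node_base {
  double val_ = 0.0, adj_ = 0.0;
  double val() const override { return val_; }
  double& adj() override { return adj_; }
};

// (a) Wrapper that knows the exact node type: val()/adj() are direct member
// accesses, but assigning across different node types needs a conversion.
template <typename Node>
struct typed_wrapper {
  Node* node;
  double val() const { return node->val_; }
  double& adj() { return node->adj_; }
};

// (b) Type-erased wrapper: assignment is just a pointer copy, but every
// val()/adj() call is virtual.
struct erased_wrapper {
  node_base* node;
  double val() const { return node->val(); }
  double& adj() { return node->adj(); }
};

int main() {
  plain_node n;
  typed_wrapper<plain_node> a{&n};
  erased_wrapper b{&n};
  a.adj() = 1.0;                  // direct write through the exact type
  return b.adj() == 1.0 ? 0 : 1;  // same storage, reached via a virtual call
}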

@SteveBronder (Collaborator Author)

I had to read your comment a few times to get a full grasp of the issue, and those are good points.

For the Eigen operations we already have rev implementations templated on the argument types. I think we need to do the same for the scalar ones in order to use the references.

I think this is very possible. By the Eigen operations do you mean the math ops that just take in an Eigen type? Like how we went from Matrix<T, R, C> to just a template parameter EigMat.

Instead we could have assignments just copy the vari pointer, but adj() and val() would have to be implemented through a call to a virtual function on the vari, although this probably complicates the implementation of templated operations.

This one kind of worries me. I'd have to sit down and play around with this to get a better grasp of it.

For coefficient access, if we want to think about this more, why don't we just do the hard copy for now and get rid of the var_value<double&> specialization? Then, once we can write some tests and have a better understanding of the problem, we can come back to this. If we go forward with the templated argument types, I think we would want to do that first, before we have var_value<double&> types floating around.

For the vari_views I'm not super worried at the moment. My main thing with those was just to get them up and running so we could write the indexing stuff the compiler needs, and in the end they will be copied into a var_value<plain_type_t<T>>.

@SteveBronder (Collaborator Author) commented Sep 10, 2020

If we go forward with the templated argument types, I think we would want to do that first, before we have var_value<double&> types floating around.

Oh, actually, saying that out loud, I think we essentially need to do that before we have a vari_value<double&> specialization, or something like A[1, 1] * B[2, 2] in a Stan program would fail to compile if A and B were var_value<Matrix> types.

@t4c1 (Contributor) commented Sep 11, 2020

By the Eigen operations do you mean the math ops that just take in an Eigen type?

Yeah, since we are talking about reverse mode, either the scalar is var or the type is var_value.

This one kind of worries me. I'd have to sit down and play around with this to get a better grasp of it.

I think we can start with templating before trying more virtual calls. Then we can see what needs to be made virtual for this to work and benchmark to see whether the virtual version is faster or slower.

For coefficient access, if we want to think about this more, why don't we just do the hard copy for now and get rid of the var_value<double&> specialization? Then, once we can write some tests and have a better understanding of the problem, we can come back to this.

Well, just using them in expressions, var_value<double&> will surely be faster. If you want to leave this change for another PR, or for me, that is fine.

If we go forward with the templated argument types, I think we would want to do that first, before we have var_value<double&> types floating around.

I think it is not that large a change, so both can be done in the same PR. If you end up doing it, you can structure the PRs as you want.

Oh, actually, saying that out loud, I think we essentially need to do that before we have a vari_value<double&> specialization, or something like A[1, 1] * B[2, 2] in a Stan program would fail to compile if A and B were var_value<Matrix> types.

Yes.

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.22 4.28 0.99 -1.44% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 0.09% faster
eight_schools/eight_schools.stan 0.09 0.09 1.01 0.61% faster
gp_regr/gp_regr.stan 0.2 0.2 0.97 -3.54% slower
irt_2pl/irt_2pl.stan 5.28 5.28 1.0 -0.03% slower
performance.compilation 88.81 87.57 1.01 1.39% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 8.19 8.2 1.0 -0.12% slower
pkpd/one_comp_mm_elim_abs.stan 27.8 27.43 1.01 1.33% faster
sir/sir.stan 140.53 133.76 1.05 4.82% faster
gp_regr/gen_gp_data.stan 0.05 0.05 1.01 0.68% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.41 3.3 1.04 3.45% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.39 0.41 0.95 -5.75% slower
arK/arK.stan 2.56 1.83 1.4 28.47% faster
arma/arma.stan 0.73 0.72 1.02 1.61% faster
garch/garch.stan 0.73 0.73 1.0 0.01% faster
Mean result: 1.02920860571

Jenkins Console Log
Blue Ocean
Commit hash: 57217ab


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@t4c1 (Contributor) commented Oct 2, 2020

This looks mostly good now, although I think we should handle the arena matrix by adding an Eigen::internal::traits specialization for it instead of the changes to ref_type_t.

You can derive it from, or reuse the values of, the traits specialization for the Eigen::Map used in arena_matrix. Maybe you can even typedef it to be the same type as the traits specialization for the Map.
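
For illustration, such a traits specialization could simply inherit from the Map's traits; a bare-bones sketch with a hypothetical arena-matrix stand-in (not the real stan::math::arena_matrix):

#include <Eigen/Dense>

// Hypothetical stand-in; the real arena_matrix derives from an Eigen::Map over
// arena-allocated memory.
template <typename MatrixType>
struct toy_arena_matrix;

namespace Eigen {
namespace internal {
// Reuse the traits of the underlying Map instead of writing them from scratch.
template <typename MatrixType>
struct traits<toy_arena_matrix<MatrixType>> : traits<Eigen::Map<MatrixType>> {};
}  // namespace internal
}  // namespace Eigen

static_assert(
    Eigen::internal::traits<toy_arena_matrix<Eigen::MatrixXd>>::RowsAtCompileTime
        == Eigen::Dynamic,
    "traits are inherited from the Map specialization");

int main() {}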

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.58 3.54 1.01 1.3% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.04% slower
eight_schools/eight_schools.stan 0.11 0.12 0.93 -8.11% slower
gp_regr/gp_regr.stan 0.18 0.18 1.0 0.26% faster
irt_2pl/irt_2pl.stan 5.65 5.73 0.99 -1.48% slower
performance.compilation 90.42 88.72 1.02 1.88% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 9.07 9.27 0.98 -2.23% slower
pkpd/one_comp_mm_elim_abs.stan 28.84 28.72 1.0 0.4% faster
sir/sir.stan 134.96 138.91 0.97 -2.93% slower
gp_regr/gen_gp_data.stan 0.05 0.04 1.01 0.62% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.18 3.22 0.99 -1.07% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.37 1.03 2.93% faster
arK/arK.stan 2.56 1.84 1.39 27.99% faster
arma/arma.stan 0.74 0.75 0.98 -1.54% slower
garch/garch.stan 0.66 0.63 1.05 5.01% faster
Mean result: 1.02207932907

Jenkins Console Log
Blue Ocean
Commit hash: 9d70799


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder (Collaborator Author)

This looks mostly good now, although I think we should handle the arena matrix by adding an Eigen::internal::traits specialization for it instead of the changes to ref_type_t.

So I had a go at this before I did the ref_type specialization and ended up going down quite a rabbit hole! Once I defined traits for arena_matrix I then had to add a bunch of other trait-like specializations for things like unary_evaluator, and that started making things super complicated. Since arena_matrix just holds a map to data allocated in our arena, I prefer this approach for now. I know you know more about Eigen's internals; would you have time to do this in another PR? Though in my opinion just saying "arena_matrix types can be treated like they are plain objects" is fine at the moment, since we don't have arenas holding expressions.
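
For readers following along, the "arena_matrix is just a map over arena memory" picture corresponds to something like the plain Eigen snippet below (with a std::vector standing in for the arena allocation):

#include <Eigen/Dense>
#include <iostream>
#include <vector>

int main() {
  std::vector<double> arena_storage(6, 0.0);  // stand-in for the memory arena
  // The map does not own its data; it just views the arena allocation, which
  // is why treating it like a plain object is reasonable here.
  Eigen::Map<Eigen::MatrixXd> m(arena_storage.data(), 2, 3);
  m.setOnes();
  std::cout << m.sum() << "\n";  // prints 6
}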

@SteveBronder mentioned this pull request on Oct 3, 2020
@t4c1 (Contributor) commented Oct 5, 2020

If that is the case, there is no need to specialize all the Eigen internal structures. However, the current implementation could be buggy in some edge cases. Can you instead make ref_type and ref_type_for_opencl specializations that are just equivalent to the specializations for the Map that arena_matrix is derived from?

Review comment on the following excerpt:

* @ingroup type_trait
*/
template <typename T>
struct is_eigen_map_base

Contributor

This is not used anywhere. Remove it.

Collaborator Author

Gah! lol I was doing something else and slammed this in there

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.19 3.25 0.98 -1.9% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.9% slower
eight_schools/eight_schools.stan 0.12 0.12 1.0 0.45% faster
gp_regr/gp_regr.stan 0.18 0.18 1.01 0.66% faster
irt_2pl/irt_2pl.stan 5.69 5.67 1.0 0.4% faster
performance.compilation 90.97 89.64 1.01 1.46% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 9.24 9.32 0.99 -0.89% slower
pkpd/one_comp_mm_elim_abs.stan 29.5 29.24 1.01 0.9% faster
sir/sir.stan 126.13 128.15 0.98 -1.61% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.13% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.25 3.2 1.02 1.5% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.41 0.91 -10.18% slower
arK/arK.stan 2.57 1.85 1.39 27.86% faster
arma/arma.stan 0.75 0.75 1.0 -0.22% slower
garch/garch.stan 0.65 0.63 1.03 3.11% faster
Mean result: 1.02115765382

Jenkins Console Log
Blue Ocean
Commit hash: 4920bce


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot (Contributor)


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 3.14 3.15 1.0 -0.44% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.93 -7.29% slower
eight_schools/eight_schools.stan 0.12 0.13 0.87 -14.63% slower
gp_regr/gp_regr.stan 0.17 0.18 0.97 -2.62% slower
irt_2pl/irt_2pl.stan 5.72 5.61 1.02 1.89% faster
performance.compilation 91.6 89.35 1.03 2.46% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 9.08 9.18 0.99 -1.16% slower
pkpd/one_comp_mm_elim_abs.stan 29.04 29.08 1.0 -0.14% slower
sir/sir.stan 125.87 132.97 0.95 -5.64% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.37% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.18 3.24 0.98 -1.69% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.38 0.39 0.98 -2.14% slower
arK/arK.stan 2.56 1.82 1.4 28.76% faster
arma/arma.stan 0.75 0.76 0.99 -1.3% slower
garch/garch.stan 0.66 0.63 1.06 5.43% faster
Mean result: 1.01113651891

Jenkins Console Log
Blue Ocean
Commit hash: 4837a06


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder merged commit 50d56cc into develop on Oct 7, 2020