-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Static matrices #21
Static matrices #21
Conversation
designs/0005-static-matrices.md
Outdated
|
||
```stan | ||
static_matrix[N, M] A = // Fill in A... | ||
A[10, 10] = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] You can improve clarity by adding a comment that this operation is illegal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "subsets of the matrix" is confusing. There are blocks, elements, etc., but "subset" isn't well defined here. I think what you mean is that we'll be able to assign whole matrices, but won't be able to have any indexing or slicing/blocking on the LHS of an assignment. I'm not sure how best to say that concisely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fill in A" is confusing. We can have static_matrix[N, M] A = expression
wherever expression
is of type static_matrix
or of type matrix
. That is, we should allow the assignment of a regular matrix to a static matrix and vice-versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fill in A" is confusing
How about "Construct A"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "subsets of the matrix" is confusing. There are blocks, elements, etc., but "subset" isn't well defined here.
I can't think of working here that's better than subset. A subset of a matrix is just a slice, block. element, etc.
I think what you mean is that we'll be able to assign whole matrices, but won't be able to have any indexing or slicing/blocking on the LHS of an assignment. I'm not sure how best to say that concisely.
I'd actually rather not allow assignment of whole matrices after first declaration specifically because of the naming issues we've had on this design doc. imo I'd rather keep it very simple and then we can be fine calling it const matrix
or fixed_matrix
. Saying, "This is constant except when you assign to the entire matrix" also feels a bit odd. It also fixes the above definition to
At the language level, a
const matrix
can replace amatrix
type if the matrix is only constructed once and never reassigned to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what is great about 'whole matrix' and 'whole vector' as types. The error message is, 'you can only assign the whole matrix/whole vector!'.
I think submatrix is the thing. I guess a subset of a matrix could be element (1, 1), element (7, 5) and any values < 3. There's a definition on Wolfram so good enough!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with "submatrix" but not with "subset" because these objects are not subsets.
Saying, "This is constant except when you assign to the entire matrix" also feels a bit odd.
The C distinction between a constant pointer and a pointer to a constant has been confusing programmers for years and I can see how trying to make this kind of subtle distinction about const
would be confusing. That's why I don't want to call it "const"!
designs/0005-static-matrices.md
Outdated
|
||
Besides that they can be thought of as standard matrix. | ||
|
||
At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>`*. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accept a `var_type<T>` will support static matrices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] I would prefere var_template
and vari_template
to var_type
and vari_type
. All these are types anyway. However they differ from var
and vari
in the fact that they are templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same issue with template
---everything's a template. I'm not sure what to call this, though. You probably need the design for var_whatevs
in here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming things, I'm gonna rattle off a whole bunch here and see what sticks
var_expr
ad_type
ad_pack
var_value
ad_value
var_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
everything's a template
Not really. var
and vari
as they are implemented in develop are not templates. Even with the proposed changes they would not be templates - they would be template instatinations. So I still think var_template
is the best way to differentiate that from existing var
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about just var_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already use _t
for metaprograms returning a type ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with the proposed changes they would not be templates
Is there any downside to just making them templates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see: stan-dev/math#1805 (comment)
designs/0005-static-matrices.md
Outdated
With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as | ||
|
||
```stan | ||
@(sparse_type, static_type) // can't have static here because of C++ keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// can't have static here because of C++ keyword
Can you explain why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't get in the way of our code---we just don't want conflicting things code generated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my b I thought any C++ reserved keywords are banned. Not sure how we currently look for them though it may be good to avoid them so that the compiler doesn't have to check for "No C++ reserved keywords except for in these specific places"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like const
more than static
. const
just implies you shouldn't be assigning. static
is more like defining the scope of a variable (in Stan, it seems like static
data would be the stuff that lives for the whole length of a chain, not just a gradient evaluation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh whoops I see this has already been said my bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
We already use
for
andif
andint
andwhile
andreturn
andcontinue
. It's really just identifiers. That's the only place users can define variables and we don't want users defining variables that overlap with C++ or Stan keywords. -
I agree that
const
means you shouldn't be assigning. But that's not what's going on here. We can assign, just not to individual elements.static_matrix a = b * c; a = a * d; // OK to reassign a[1, 2] = 3; // ILLEGAL to assign to elements
Reassigning still maintains memory locality. Assigning to elements does not.
designs/0005-static-matrices.md
Outdated
# Summary | ||
[summary]: #summary | ||
|
||
This proposes a `static_matrix` and `static_vector` type in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, 1, 1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is also extending stan language? That was not previously clear form the issue. In that case I would propoese to add const
keyword instead. That would be consistent with C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they're quite const
, because you could reassign a static_matrix
(or static_vector
) variable, as long as the thing being assignedis itself a static matrix (or vector).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about static_row_vector
?
Also, this is going to get confusing when we have complex_matrix
, because then we'll have matrix
, static_matrix
, complex_matrix
, and static_complex_matrix
. So I'm also wondering like @t4c1 if there's some way we can make this more combinatorial at the syntax level. Maybe something like static matrix
. Then we could apply the same concept to arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is also extending stan language? That was not previously clear form the issue. In that case I would propoese to add const keyword instead. That would be consistent with C++.
I'm half and half on extending the language for this. I wouldn't mind having this as something that strictly the compiler would look for and automate
What about static_row_vector?
Yes I cover this below. Maybe something like
fixed complex matrix[N, M] A = // var<Eigen::Matrix<complex<double>, -1, -1>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they're quite const, because you could reassign a static_matrix (or static_vector) variable, as long as the thing being assigned is itself a static matrix (or vector).
How would you feel about making them const tho? We could change the definition I wrote above so that you can't assign after the first declaration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be just an arbitrary restriction. In other words no reason to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words no reason to do it
Well there is a reason to do it -- it makes the naming simple :P.
But I think this breaks with arrays.
const matrix[N, M] matrices[P];
Is there an autodiff-themed name for this? We're really doing it for autodiff purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like fixed_matrix
, but that implies a new type rather than a new annotation. I think that'll be simpler in the end, even if it may seem like more combinatorics to us (I say "seem" because I'm not sure it will be more combinatorics in the implementation given that "const" is very peculiar in meaning here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe there needs to be an annotation/attribute design doc before we decide on this piece. We can use that for complex and sparse. That would probably be a big enough language feature to do the official stanc3
designs/0005-static-matrices.md
Outdated
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we finalize the static matrix design doc I think we should discuss a bit the part of the optimizations that also apply to non-static matrix. Namely what I suggested here: stan-dev/math#1805 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean
I think the first suggestion could be simplified a bit by writing a specialization for Eigen::Matrix. Also we can make non-static version with second member Eigen::Matrix<vari*, -1, -1> adj_;. This is not as efficient as what you suggested, but can be modified.
I can dig the route of a specialization for Eigen::Matrix<var>
. Though what I'm confused about is, if someone wants to dynamically assign parts of a matrix why wouldn't they just use the current matrix type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@t4c1 if we add Eigen::Matrix<var>
what would the internal Eigen representation be? And would we have to write specializations for other Eigen types? I kind of like the var_t<T>
approach since that would also work for sparse matrices (and probably with some other stuff matrix_cl<>`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can dig the route of a specialization for Eigen::Matrix
Sorry, that was not the part I wanted to point out here.
if someone wants to dynamically assign parts of a matrix why wouldn't they just use the current matrix type?
I think we may be able to remake it in a way that would be as efficient as static matrix you are proposing here.
if we add Eigen::Matrix what would the internal Eigen representation be?
It would not actually be Eigen type. I was just thinking we could specialize it this way and still implement it the same way as you propose for var<MatrixXd>
to allow us to keep most functions accepting it the same. Am not sure anymore if that was such a great idea.
Also we can make non-static version with second member
Eigen::Matrix<vari*, -1, -1> adj_
That was the part I actually wanted to point out. Although after some more thinking I am not so sure whether this is the best way.
Maybe we could instead implement changing a part of what you are suggesting as a static matrix to have it completely replace Matrix<var>
? I think we would have to change some bits of how chain stack works ...
Just some brainstorming. I need to think about this some more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may be able to remake it in a way that would be as efficient as static matrix you are proposing here
Oh like replacing the definition of matrix[N, M]
in the language? imo I think I'd rather leave that to another proposal. Though if it's something simple then I'd be for adding it here. I was kind of confused by the dynamic case you and Bob were talking about. Would we be able to do something simple like a specialization of var_type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh like replacing the definition of matrix[N, M] in the language?
No. Just replacing Eigen::Matrix<var>
in Stan Math. Language needs not to change.
imo I think I'd rather leave that to another proposal.
Sure. I was just thinking that if we can do that there is no need for changes in language in this design doc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@t4c1 --- I didn't undestand the Matrix<vari*, -1, -1>
plan. That's equivalent to Matrix<var, -1, -1>
in terms of memory and how it executes (var
has no virtual methods and a single member vari*
, so it's memory-equivalent to vari*
).
There's no way to completely replace Matrix<var, -1, -1>
with fixed matrices without giving up the ability to assign (that's what I believe JAX and PyTorch do---give up on matrix assignment). As soon as you assign to an element, you need a much more flexible data structure that loses memory locality. Now we do want to be able to assign with something like a comprehension or function (say for a covariance matrix for a GP).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't undestand the Matrix<vari*, -1, -1> plan.
It was just an idea. Not the best one for exactley the reasons you are pointing out.
There's no way to completely replace Matrix<var, -1, -1> with fixed matrices without giving up the ability to assign
Maybe we don't have to give it up. I haven't made any prototypes, so I'm not 100% sure, but I think we may be able to implement assignment for the proposed change to the matrix. It would be more complicated and slower than the assignment currently is - but we are getting speedups elswhere anyway. So if it it turns out it is possible, there is no reason for any changes to the language - we just replace the matrix in Stan Math.
Need to digest this a bit more, but a quick question for now. Is it necessary to add a new Stan language type or could this be just a stanc optimization step? I am not opposing this, just wondering. That would simplify the implementation I guess? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up. Here are some initial thoughts.
designs/0005-static-matrices.md
Outdated
# Summary | ||
[summary]: #summary | ||
|
||
This proposes a `static_matrix` and `static_vector` type in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, 1, 1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think they're quite const
, because you could reassign a static_matrix
(or static_vector
) variable, as long as the thing being assignedis itself a static matrix (or vector).
designs/0005-static-matrices.md
Outdated
# Summary | ||
[summary]: #summary | ||
|
||
This proposes a `static_matrix` and `static_vector` type in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, 1, 1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about static_row_vector
?
Also, this is going to get confusing when we have complex_matrix
, because then we'll have matrix
, static_matrix
, complex_matrix
, and static_complex_matrix
. So I'm also wondering like @t4c1 if there's some way we can make this more combinatorial at the syntax level. Maybe something like static matrix
. Then we could apply the same concept to arrays.
designs/0005-static-matrices.md
Outdated
|
||
```stan | ||
static_matrix[N, M] A = // Fill in A... | ||
A[10, 10] = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "subsets of the matrix" is confusing. There are blocks, elements, etc., but "subset" isn't well defined here. I think what you mean is that we'll be able to assign whole matrices, but won't be able to have any indexing or slicing/blocking on the LHS of an assignment. I'm not sure how best to say that concisely.
designs/0005-static-matrices.md
Outdated
|
||
```stan | ||
static_matrix[N, M] A = // Fill in A... | ||
A[10, 10] = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Fill in A" is confusing. We can have static_matrix[N, M] A = expression
wherever expression
is of type static_matrix
or of type matrix
. That is, we should allow the assignment of a regular matrix to a static matrix and vice-versa.
designs/0005-static-matrices.md
Outdated
A[10, 10] = 5; | ||
``` | ||
|
||
Besides that they can be thought of as standard matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this mean? In a spec, you want to be more concrete. I think the notion we want is assignability. If A
is assignable to B
, then we want to make sure that anywhere B
can be used, A
can also be used.
designs/0005-static-matrices.md
Outdated
With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as | ||
|
||
```stan | ||
@(sparse_type, static_type) // can't have static here because of C++ keyword |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That won't get in the way of our code---we just don't want conflicting things code generated.
designs/0005-static-matrices.md
Outdated
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
Besides the tech burden I'm rather happy with this implementation, though I'm open to any criticisms/alternatives and will add them here. You can see some of the discussion on this as well in issue [#1805](https://github.com/stan-dev/math/issues/1805) and the prior art below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to include your personal feelings about it. Presumably you're OK with this if you're proposing it.
If there's relevant discussion, include it here in summary form. Nobody's going to click through issues ot figure out what you mean here.
designs/0005-static-matrices.md
Outdated
|
||
- What parts of the design do you expect to resolve through the RFC process before this gets merged? | ||
|
||
I'm interested in hearing about Stan language semantics and any difficulties I may have missed during the implementation section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should all be taken out of the first person. I don't know why the guidelines are written that way!
It goes without saying tha tyou want to hear about any issues, so you can drop this. If you don't have something specific to say in a spec, don't say it.
designs/0005-static-matrices.md
Outdated
|
||
- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC? | ||
|
||
Any intricacies of using the GPU via `var_type<matrix_cl<double>>` should be deferred to a separate discussions or can happen during the implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great example of a clear and precise issue that's written in third person!
designs/0005-static-matrices.md
Outdated
- [A New Continuation Based Autodiff By Refactoring](https://discourse.mc-stan.org/t/a-new-continuation-based-autodiff-by-refactoring/5037/2) | ||
|
||
[Enoki](https://github.com/mitsuba-renderer/enoki) is a very nice C++17 library for automatic differentiation which under the hood can transform their autodiff type from an arrays of structs to structs of arrays. It's pretty neat! Though implementing something like their methods would require some very large changes to the way we handle automatic differentiation. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JAX is the major system that's built like this, though I think PyTorch may also use a static design. As far as I know, they only have static matrices.
Thanks for the pointer to Enoki. I don't know that one (though I know the mushroom).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call I'll link to these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bob-carpenter can you point me to any docs from jax / pytorch that talk about the sort of static matrix type they have? It looks like in pytorch they mainly use tensors and allow for subsetting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For JAX:
https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#🔪-In-Place-Updates
I love that they literally put a knife in the URL for the "sharp bits" of the doc.
There's no reason not to use a slice, block, or element of a matrix as an rvalue---the problems start when you assign to a matrix element. That's why it's OK to assign whole matrices.
Not allowing assignment of whole matrices means that transformed data, transformed parameters, and generated quantities need to be declare-defined in one line, which means they can't depend on any other calculations in the blocks they are declared in. For example, the following code pattern becomes impossible.
transformed parameters {
static_matrix a[3, 2] x;
T1 b;
T2 c;
... some calculations involving b and c ...
x = ... some function of those calculations
}
Instead, b and c need to be declare-defined before x and a function defined to call it,
matrix b = foo();
matrix c = bar(b);
matrix x = f(b, c);
and often it means we can't share computations that we'd like to share for calculating c or x. At least that's been the problem in the past with defining sizes, which have this constant character.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added JAX doc to the prior art!
We are pushing the language discussion out of this design for now but seeing this I'm get this now
Yes we spoke about that in the issue. Personally I'm fine with this just being a compiler optimization. Though Bob suggested allowing users to declare them and I'm also fine with that. Though yes it adds complexity to the spec |
This all looks really cool...my only comment would be to really make sure with prototypes that this is worth the effort which appears to be huge. (But I would think that this is on the right track) One Q: Will the AD tape change or will that still be the same and we just put different stuff there? |
Yeah I think we can make an easy prototype just by doing steps (1) and (2), then just taking a stan program that does matrix stuff now and changing stuff to use the static matrix version. There's def work here, but with the backwards compatibility part it can be done pretty iteratively
Since that |
The stack is definitely going to need to change. Right now, it assumes vari objects with values and adjoints. There's no way to build a single stack that'll hold either vari or vari because that's two different types. The current target for specialiing There's no doubt there's a huge speed improvement to be had from dealing with large matrices in matrix form and not having to walk over them to put them together and take them apart. I agree it'd be nice to know how big that speedup is. But that'd be easy to profile. For instance, in autodiff now, we can run Stan, and for the new thing, it's just going to be some matrix multiplies of Edit: Specifically this:
We need a single type there, not a template or we'll have multiple stacks. So we need to go up a level in abstraction to something like |
@bob-carpenter @wds15 What if we had all using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>; Then it would just do the polymorphic thing we currently do |
Hmm, there's already one layer of polymorphism here. Things on the stacks already inherit from vari (https://github.com/stan-dev/math/blob/develop/stan/math/rev/core/dv_vari.hpp). So there would be two levels of indirection there. If the things that inherit from vari also become responsible for:
Does that also work? Now I'm just confused thinking about how I'd write an operator that accepts a matrix of vars. I'd have to write that along with the version that accepts the var matrix, and the code would be different. A side comment, I'm not sure values and adjoints should be the same type. It seems entirely plausible we'd use float calculations in the process of computing a double result, and I think we'd want to keep the double precision in the adjoint as we propagate it back through. |
We can assign, just not to individual elements.
I’ve been uncomfortable with the names being thrown around because none
of them capture this defining behavior. “Monolithic” is often used to describe
the whole rather than the elements, but “encapsulated” is more closer to the
element -> whole construction. I like “encapsulated” but it is a big ungainly,
although “encap” is a pretty unambiguous shortening.
|
The answer to @bbbales2 last question is @SteveBronder's last response :-)
That's right.
Why? The place we get into trouble with
|
I think there's some confusion, I fixed up this goldbolt example to be closer to what I'm talking about. There will still only be one level of polymorphism
I'd like to talk about this more but it should be not on this design doc. In short that's a thing we can test when we get the base implementation of this up and running
idk about Ugh, naming things. I would be so fine with having a silent auction where users put money on what they want the name to be and we end up with a transformed parameters {
const matrix[N, M] A = B * C;
A = B * transpose(B); // NOPE!
} @bob-carpenter that second assignment of |
Right now in Stan, when we do
we get a new matrix allocated for But what I was saying is that we could save the rows and columns spec, then we can re-use the LHS memory as you are suggesting. Attributes of the Java variety (things decorated with |
Oh yeah that looks fine. So now base vari does absolutely nothing other than polymorph.
Makes sense. This is useful regardless of whether we try to do floats at some point. |
|
I got a version of this up for addition and multiplication today that passes all the tests for https://github.com/stan-dev/math/compare/feature/var-template git checkout feature/var-template
./runTests.py ./test/unit/math/rev/core/static_matrix_checking_test.cpp There's a couple weirds I'm doing here and there since this is an mvp. But it seems to work with the stack allocator nicely. Shout out to @bbbales2 for the help! |
Any idea what the speed improvement is for these operations?
…On Tue, May 5, 2020 at 8:31 PM Steve Bronder ***@***.***> wrote:
I got a version of this up for addition and multiplication today that
passes all the tests for var, works for var_type<Eigen::Matrix<double,
-1, -1>>, and has the static and dynamic version gradient calculations
matching up for a simple example. I think it could work for
var_type<matrix_cl<double>> if we add += operator and some other
functions for matrix_cl types that we don't have rn
https://github.com/stan-dev/math/compare/feature/var-template
git checkout feature/var-template
./runTests.py ./test/unit/math/rev/core/static_matrix_checking_test.cpp
There's a couple weirds I'm doing here and there since this is an mvp. But
it seems to work with the stack allocator nicely. Shout out to @bbbales2
<https://github.com/bbbales2> for the help!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#21 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAZ2D76JXTEKRQLDUCIG323RQCVXRANCNFSM4MW2J4AA>
.
|
Yes just finished and they are pretty cool! The benchmarks here are for static ( Trying to parse out why it's so much faster I have a few ideas
The only bad side I need to add here is that for each of the matrix operators like multiply we need all their matrix/vector/scalar derivatives which I forgot about and will add more work here. Though I'd argue with the speedups above the pain of that is very worth the gain! |
This is awesome! This looks like it is finally the solution that will speed up the simple vectorized operations that we haven't managed to get before! What's the next step in getting this finished up? Do we need to finalize the Stan interface or do you want comments on the current C++?
I don't think anything changes here compared to the old code, at least for reverse mode. What is the source of this worry? |
Ty! Yeah this made me very excited. Once we add this no one is ever going to benchmark against us with a logistic regression lol
I think we need to finish the stan interface and a quick review of the basic implementation would be good.
I was getting the wrong answers when just applying the old code. I had to go to matrixcalculus.org and add the |
Lol, well, it's a compliment to be the reference.
I'll look over the C++ later tonight. Re-interface, I liked the 'whole' naming scheme Bob proposed. I wonder what the implications of a decorator would be though (if that would be somehow easier to work with in Stan code).
Oooh, okay, well whatever we write goes through the autodiff tester, so I don't see any reason to worry. We'll catch these things. |
I think the plot's right, but the x-axis labels are wrong---those can't be log N because they go up to 1000. In general, I prefer the way you did it---log scale axes, plain scale labels. I would guess the decreasing gains from larger N with mulitply it's because the copy is always O(N^2) whereas matrix multiply is O(N^3), so when N gets big, the O(N^3) starts dominating. Matrix addition is still O(N^2), which is presumably why you see much bigger gains there. GLMs involve a double-based matrix multiplied by a vector of autodiff variables, so that's an O(N) copy operation for the vector and O(N^2) multiplication. |
Yes apologies it's supposed to be N not log N. It's just spaced out on a log10 scale.
Nice! |
Yes I think we got lapped on how to speed up GLMs in Stan/handle the memory bottleneck stuff lol. I'd basically given up writing custom autodiff for these things -- it seemed like the virtual function calls just weren't that expensive. Apparently the pointer chasing was the baddie! |
Memory is a second reason to write custom autodiff for matrices. Just throwing matrices into scalar autodiff is a real memory hog. |
Word! I'll update this design doc with the comments from the discussion today and we can put it up for review. |
The optimizations for the GLM functions could also be applied with a static matrix type. It's win-win. |
Is this all in reference to the |
There's a third stack for storing objects whose constructors need to be called after autodiff. That's where |
@SteveBronder reached out and asked me to comment. My initial thought (at the version that's up now) is that it looks great and sensible. For the Math library, I think it'll come down to the details. This deep down in the autodiff stack it's ok to be complicated, but it does need to be clear and maintainable by the team. I think this proposal falls into that category. The one thing I'd suggest before getting too deep: check the timing of non-matrix operations. In the benchmarks posted, there wasn't a comparison of how things behaved outside of this change. It's something I'd do sooner rather than later, just to make sure we're not pursuing something that isn't viable. If this change is equal in performance or faster than the existing autodiff implementation, then it's easy to justify change. If it's a lot slower, then it's time to look for a different implementation. If it's a bit slower, then it's a hard decision where we have to weigh the tradeoff. (I'm intentionally being vague here. We don't have hard and fast rules about how much slower we're willing to get as we make improvements. Ideally we're always in the faster branch, but that isn't always realistic.) |
We can't. Probably. Why would we need to? Unless the matrices are small, memory locality is not a problem. The only downside is the need to call those destructors. |
Right. But I think almost all of the non-data matrices we'll use will be small. At least outside of a GP context. |
I guess back to this, why can't we move it? If we have a move constructor for our |
I think it would only be an issue if their size was comparable or smaller to cache line size. I believe most matrices in stan models will be significantly larger than that.
In this case we could. I was, however, thinking about a different scenario - constructing a |
They're certainly not usually that small---that'd only be 2 x 3 max! I don't think replacing everything with a |
Okay it looks like I need to change my impl to use the object stack that gets deleted. ftr I think we are all good with the concept of static matrices here but I want to wait to ask for approval till we have a base implementation done and tested a bit |
It doesn't get deleted, it just calls the destructors. For an RAII object like an |
Whups yes that's what I meant |
+1, the optimization you describe should be easy once the math signatures are in place. |
Is this okay to merge? I think we have everything sorted out |
The doc is still written in terms of Are there even going to be any language level indicators on these variables at this point? Or is it all happening magically under the hood? I'd also really like the name to change from |
The doc is just for the math library so we are not finalizing anything in the language here (that's going to be separate design doc)
Until we sort out what to name these they should just be a compiler thing |
I'm not attached to any name at all ftr |
Okay makes sense. I can give this a review tomorrow. I think you're right that the picture in math is complete enough we should wrap this up. |
Ty! Before you review let me read this over once more tmrw to make sure everything is lined up with how we are doing stuff in math and I'll ping you when it's ready for another look |
@bbbales2 Just updated if you have a minute feel free to look over! |
Nice! Will do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs updated. It is quite behind what we're working on.
designs/0005-static-matrices.md
Outdated
# Summary | ||
[summary]: #summary | ||
|
||
This proposes a constant matrix type in stan that will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, R, C>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler the conditions for a `matrix` to be a constant matrix can be detected and applied to Stan programs automatically for users. The implementation of this proposal suggests a staged approach where a stan language level feature is delayed until type attributes are allowed in the language and constant matrices have support for the same set of methods that the current `matrix` type has. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var<T>
-> var_value<T>
"With optimizations from the compiler the conditions for a matrix
to be a constant matrix can be detected and applied to Stan programs automatically for users." -> what we've ended up working on is purely the math implementation stuff. I wouldn't mind if this design-doc shrunk to just the math stuff. At the language we'll either handle stuff automatically or introduce new types -- and likely both, but there will be some other solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only mentions of the compiler/language here are to say that this design doc is just focused on the math portion. The only other example is to show what things could be deduced as static by the compiler. Should I change that one just to show how to rewrite that particular model for static matrices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lemme just open up a pull request to this with the changes I'm talking about and you can tell me if it's overboard. I think that'll be easier.
designs/0005-static-matrices.md
Outdated
# Motivation | ||
[motivation]: #motivation | ||
|
||
Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. See [here](https://github.com/stan-dev/design-docs/pull/21#issuecomment-625352581) for performance tests on a small subset of gradient evaluations using matrices vs. constant matrices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a differentiable matrix" -> "a MxN matrix" or NxM or whichever you prefer.
designs/0005-static-matrices.md
Outdated
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that accessors
We define the meaning. We're writing the code. "The value and adjoint of the matrix are stored separately in memory pointed to by the vari_value<...>"
Any function that accepts a
var_value<T>
will support static matrices
"Functions that currently support Eigen::Matrix<var, R, C>
types will be extended to support var_value<Eigen>
types"
designs/0005-static-matrices.md
Outdated
|
||
At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices. | ||
|
||
At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and it's subslices are never assigned to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bit we can ditch for now imo. I think it was fair to worry about the language design at first, but I think there's a clear enough interface between how math and the language will work for this that we don't need to worry about the language decisions now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified this just to talk about the compiler which I think is fair and a thing we are shooting for. I think the important thing is to make sure this doesn't cover anything we want in the language which will be a separate design doc
|
||
This feature will be added in the following stages. | ||
|
||
1. Replace `var` and `vari` with templated types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part needs updated.
Vaguely we have:
-
var has been replaced by var_value in some branch
-
var_value and var_value types exist
-
We agreed in a math meeting to develop this in pieces. What this means is pull requests will merge into develop before everything that will be developed is finished (https://discourse.mc-stan.org/t/monthly-math-development-meeting-06-25-2020-10-am-edt/16012/23).
It might be helpful to lay out what these pieces look like (what currently exists and what is likely to come in the next 3 months).
We'll basically be rewriting all of reverse mode to support this. Hopefully that can all go through adj_jac_apply.
It's also probably worth discussing return types. I'm still a fan of f(A, B, C) -> D
, D is a var_value<Eigen>
type if any of A, B, or C are Eigen<var>
or var_value<Eigen>
types.
Should also talk about the status of the LTO Windows/g++/clang++/Boost whatever it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also probably worth discussing return types. I'm still a fan of
f(A, B, C) -> D
,D
is avar_value<Eigen>
type if any of A, B, or C areEigen<var>
orvar_value<Eigen>
types.
Right now I have
Functions which can take in multiple matrices will only return a constant matrix if all of the other matrix inputs as well as the return type are static matrices.
I'd rather be conservative on this for now aka having constant matrices stricly happen if all input types and the output type are going to be a constant. We can parse through a program to understand whether that's true or not. Once we run more performance tests to see if the copy of static->dynamic is expensive or not we can relax this
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a substantial drawback. Such is life with templates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to list it tho. I'm going to include the flto thing here as well
|
||
More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times. | ||
|
||
Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deferring this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to leave this as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think I read it wrong.
More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times. | ||
|
||
Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on stan language naming semantics. This is both a drawback and feature. While delaying a `constant_matrix` type as a user facing feature, it avoids the combinatorial problem that comes with additional type names for matrices proposed in the language such as `sparse` and `complex`. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A drawback that cost a lot of time was the LTO thing.
You should write down the LTO problem and a quick summary of all the things you did you try to work around this issue.
I remember you tried at least:
boost::variants
instead of polymorphism- A different stack for every vari_base subclass
- Don't use the chaining stack for zeroing. Instead of having a chaining/non-chaining stack have a chaining/zeroing stack. Everything goes in the zeroing stack. Only chaining things go in the chaining stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated this section to list out the problem and the attempted solutions
This is currently a Work In Progess design document for adding
static_matrix
andstatic_vector
types to the Stan language.You can see a full rendered version here. See the section on Prior art for references to past discussions on this.
@bob-carpenter and @t4c1 can you take a look at this? Feel free to make changes as you see appropriate