
Static matrices #21

Closed
wants to merge 7 commits into from

Conversation

@SteveBronder SteveBronder (Collaborator) commented May 1, 2020

This is currently a Work In Progess design document for adding static_matrix and static_vector types to the Stan language.

You can see a full rendered version here. See the section on Prior art for references to past discussions on this.

@bob-carpenter and @t4c1 can you take a look at this? Feel free to make changes as you see appropriate


```stan
static_matrix[N, M] A = // Fill in A...
A[10, 10] = 5;
```
Contributor

[optional] You can improve clarity by adding a comment that this operation is illegal.

Collaborator

The "subsets of the matrix" is confusing. There are blocks, elements, etc., but "subset" isn't well defined here. I think what you mean is that we'll be able to assign whole matrices, but won't be able to have any indexing or slicing/blocking on the LHS of an assignment. I'm not sure how best to say that concisely.

Collaborator

"Fill in A" is confusing. We can have static_matrix[N, M] A = expression wherever expression is of type static_matrix or of type matrix. That is, we should allow the assignment of a regular matrix to a static matrix and vice-versa.

Contributor

"Fill in A" is confusing

How about "Construct A"?

@SteveBronder SteveBronder (Collaborator Author) May 7, 2020

The "subsets of the matrix" is confusing. There are blocks, elements, etc., but "subset" isn't well defined here.

I can't think of wording here that's better than subset. A subset of a matrix is just a slice, block, element, etc.

I think what you mean is that we'll be able to assign whole matrices, but won't be able to have any indexing or slicing/blocking on the LHS of an assignment. I'm not sure how best to say that concisely.

I'd actually rather not allow assignment of whole matrices after the first declaration, specifically because of the naming issues we've had on this design doc. IMO I'd rather keep it very simple; then we can be fine calling it const matrix or fixed_matrix. Saying, "This is constant except when you assign to the entire matrix" also feels a bit odd. It also fixes the above definition to

At the language level, a const matrix can replace a matrix type if the matrix is only constructed once and never reassigned to.
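
A minimal Stan sketch of that stricter rule (hypothetical syntax: `const` is only a candidate keyword here, and the covariance example is made up for illustration):

```stan
transformed parameters {
  // constructed exactly once: OK under the stricter definition
  const matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
  // K = K + diag_matrix(rep_vector(1e-8, N));  // reassignment: disallowed
  // K[1, 1] = 1.0;                             // element assignment: disallowed
}
```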

Member

That's what is great about 'whole matrix' and 'whole vector' as types. The error message is, 'you can only assign the whole matrix/whole vector!'.

I think submatrix is the thing. I guess a subset of a matrix could be element (1, 1), element (7, 5) and any values < 3. There's a definition on Wolfram so good enough!

Collaborator

I'm OK with "submatrix" but not with "subset" because these objects are not subsets.

Saying, "This is constant except when you assign to the entire matrix" also feels a bit odd.

The C distinction between a constant pointer and a pointer to a constant has been confusing programmers for years and I can see how trying to make this kind of subtle distinction about const would be confusing. That's why I don't want to call it "const"!


Besides that they can be thought of as standard matrix.

At the Stan Math level a `static_matrix` is a `var_type<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_type<Eigen::Matrix<double, -1, -1>>*`. This means that accessors for the value and adjoint of `var_type` objects can access contiguous chunks of each part of the `var_type`. Any function that accepts a `var_type<T>` will support static matrices.
@t4c1 t4c1 (Contributor) May 1, 2020

[optional] I would prefer var_template and vari_template to var_type and vari_type. All of these are types anyway. However, they differ from var and vari in that they are templates.

Collaborator

I have the same issue with template---everything's a template. I'm not sure what to call this, though. You probably need the design for var_whatevs in here too.

Collaborator Author

Naming things, I'm gonna rattle off a whole bunch here and see what sticks

  • var_expr
  • ad_type
  • ad_pack
  • var_value
  • ad_value
  • var_t

Contributor

everything's a template

Not really. var and vari as they are implemented in develop are not templates. Even with the proposed changes they would not be templates - they would be template instantiations. So I still think var_template is the best way to differentiate that from the existing var.

Collaborator Author

What about just var_t?

Contributor

We already use _t for metaprograms returning a type ...

Member

Even with the proposed changes they would not be templates

Is there any downside to just making them templates?

Member

With `sparse_matrix` and `complex_matrix` this will now add _another_ `*_matrix` type proposal to the language. There's no way to get around this in the Stan language currently, though some small side discussions exist to support attributes on types such as

```stan
@(sparse_type, static_type) // can't have static here because of C++ keyword
```
Contributor

// can't have static here because of C++ keyword

Can you explain why not?

Collaborator

That won't get in the way of our code---we just don't want conflicting things code generated.

Collaborator Author

Oh, my bad, I thought all C++ reserved keywords were banned. I'm not sure how we currently look for them, though it may be good to avoid them so that the compiler doesn't have to check for "no C++ reserved keywords except in these specific places".

Member

I like const more than static. const just implies you shouldn't be assigning. static is more like defining the scope of a variable (in Stan, it seems like static data would be the stuff that lives for the whole length of a chain, not just a gradient evaluation).

Member

Oh whoops I see this has already been said my bad.

Collaborator

  1. We already use for and if and int and while and return and continue. It's really just identifiers. That's the only place users can define variables and we don't want users defining variables that overlap with C++ or Stan keywords.

  2. I agree that const means you shouldn't be assigning. But that's not what's going on here. We can assign, just not to individual elements.

    ```stan
    static_matrix a = b * c;
    a = a * d;    // OK to reassign
    a[1, 2] = 3;  // ILLEGAL to assign to elements
    ```

    Reassigning still maintains memory locality. Assigning to elements does not.

# Summary
[summary]: #summary

This proposes `static_matrix` and `static_vector` types in the Stan language which on the backend will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, -1, -1>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler, the conditions for a `matrix` to be a `static_matrix` can be detected and applied to Stan programs automatically for users.
Contributor

So this is also extending the Stan language? That was not previously clear from the issue. In that case I would propose to add a const keyword instead. That would be consistent with C++.

Collaborator

I don't think they're quite const, because you could reassign a static_matrix (or static_vector) variable, as long as the thing being assigned is itself a static matrix (or vector).

Collaborator

What about static_row_vector?

Also, this is going to get confusing when we have complex_matrix, because then we'll have matrix, static_matrix, complex_matrix, and static_complex_matrix. So I'm also wondering, like @t4c1, if there's some way we can make this more combinatorial at the syntax level. Maybe something like `static matrix`. Then we could apply the same concept to arrays.

Collaborator Author

So this is also extending the Stan language? That was not previously clear from the issue. In that case I would propose to add a const keyword instead. That would be consistent with C++.

I'm half and half on extending the language for this. I wouldn't mind having this as something that strictly the compiler would look for and automate

What about static_row_vector?

Yes I cover this below. Maybe something like

```stan
fixed complex matrix[N, M] A = // var<Eigen::Matrix<complex<double>, -1, -1>>
```

Collaborator Author

I don't think they're quite const, because you could reassign a static_matrix (or static_vector) variable, as long as the thing being assigned is itself a static matrix (or vector).

How would you feel about making them const tho? We could change the definition I wrote above so that you can't assign after the first declaration.

Contributor

I think that would be just an arbitrary restriction. In other words no reason to do it.

Member

In other words no reason to do it

Well there is a reason to do it -- it makes the naming simple :P.

But I think this breaks with arrays.

```stan
const matrix[N, M] matrices[P];
```

Is there an autodiff-themed name for this? We're really doing it for autodiff purposes.

Collaborator

I like fixed_matrix, but that implies a new type rather than a new annotation. I think that'll be simpler in the end, even if it may seem like more combinatorics to us (I say "seem" because I'm not sure it will be more combinatorics in the implementation given that "const" is very peculiar in meaning here).

@SteveBronder SteveBronder (Collaborator Author) May 4, 2020

Hmm, maybe there needs to be an annotation/attribute design doc before we decide on this piece. We can use that for complex and sparse. That would probably be a big enough language feature to do the official stanc3

# Motivation
[motivation]: #motivation

Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix.
Contributor

Before we finalize the static matrix design doc I think we should discuss a bit the part of the optimizations that also apply to non-static matrix. Namely what I suggested here: stan-dev/math#1805 (comment)

Collaborator Author

Do you mean

I think the first suggestion could be simplified a bit by writing a specialization for Eigen::Matrix. Also we can make non-static version with second member Eigen::Matrix<vari*, -1, -1> adj_;. This is not as efficient as what you suggested, but can be modified.

I can dig the route of a specialization for Eigen::Matrix<var>. Though what I'm confused about is, if someone wants to dynamically assign parts of a matrix why wouldn't they just use the current matrix type?

Collaborator Author

@t4c1 if we add Eigen::Matrix<var> what would the internal Eigen representation be? And would we have to write specializations for other Eigen types? I kind of like the var_t<T> approach since that would also work for sparse matrices (and probably with some other stuff like `matrix_cl<>`).

Contributor

I can dig the route of a specialization for Eigen::Matrix

Sorry, that was not the part I wanted to point out here.

if someone wants to dynamically assign parts of a matrix why wouldn't they just use the current matrix type?

I think we may be able to remake it in a way that would be as efficient as static matrix you are proposing here.

if we add Eigen::Matrix<var> what would the internal Eigen representation be?

It would not actually be an Eigen type. I was just thinking we could specialize it this way and still implement it the same way as you propose for var<MatrixXd>, to allow us to keep most functions accepting it the same. I am not sure anymore if that was such a great idea.

Also we can make non-static version with second member Eigen::Matrix<vari*, -1, -1> adj_

That was the part I actually wanted to point out. Although after some more thinking I am not so sure whether this is the best way.

Maybe we could instead implement changing a part of what you are suggesting as a static matrix to have it completely replace Matrix<var>? I think we would have to change some bits of how chain stack works ...

Just some brainstorming. I need to think about this some more.

Collaborator Author

I think we may be able to remake it in a way that would be as efficient as static matrix you are proposing here

Oh like replacing the definition of matrix[N, M] in the language? imo I think I'd rather leave that to another proposal. Though if it's something simple then I'd be for adding it here. I was kind of confused by the dynamic case you and Bob were talking about. Would we be able to do something simple like a specialization of var_type?

Contributor

Oh like replacing the definition of matrix[N, M] in the language?

No. Just replacing Eigen::Matrix<var> in Stan Math. The language need not change.

imo I think I'd rather leave that to another proposal.

Sure. I was just thinking that if we can do that there is no need for changes in language in this design doc.

Collaborator

@t4c1 --- I didn't understand the Matrix<vari*, -1, -1> plan. That's equivalent to Matrix<var, -1, -1> in terms of memory and how it executes (var has no virtual methods and a single member vari*, so it's memory-equivalent to vari*).

There's no way to completely replace Matrix<var, -1, -1> with fixed matrices without giving up the ability to assign (that's what I believe JAX and PyTorch do---give up on matrix assignment). As soon as you assign to an element, you need a much more flexible data structure that loses memory locality. Now we do want to be able to assign with something like a comprehension or function (say for a covariance matrix for a GP).

Contributor

I didn't understand the Matrix<vari*, -1, -1> plan.

It was just an idea. Not the best one, for exactly the reasons you are pointing out.

There's no way to completely replace Matrix<var, -1, -1> with fixed matrices without giving up the ability to assign

Maybe we don't have to give it up. I haven't made any prototypes, so I'm not 100% sure, but I think we may be able to implement assignment for the proposed change to the matrix. It would be more complicated and slower than assignment currently is - but we are getting speedups elsewhere anyway. So if it turns out to be possible, there is no reason for any changes to the language - we just replace the matrix in Stan Math.

@rok-cesnovar (Member)

Need to digest this a bit more, but a quick question for now. Is it necessary to add a new Stan language type or could this be just a stanc optimization step? I am not opposing this, just wondering.

That would simplify the implementation I guess?

@bob-carpenter bob-carpenter (Collaborator) left a comment

Thanks for writing this up. Here are some initial thoughts.


```stan
A[10, 10] = 5;
```

Besides that they can be thought of as standard matrix.
Collaborator

What does this mean? In a spec, you want to be more concrete. I think the notion we want is assignability. If A is assignable to B, then we want to make sure that anywhere B can be used, A can also be used.


# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

Besides the tech burden I'm rather happy with this implementation, though I'm open to any criticisms/alternatives and will add them here. You can see some of the discussion on this as well in issue [#1805](https://github.com/stan-dev/math/issues/1805) and the prior art below.
Collaborator

You don't need to include your personal feelings about it. Presumably you're OK with this if you're proposing it.

If there's relevant discussion, include it here in summary form. Nobody's going to click through issues to figure out what you mean here.


- What parts of the design do you expect to resolve through the RFC process before this gets merged?

I'm interested in hearing about Stan language semantics and any difficulties I may have missed during the implementation section.
Collaborator

This should all be taken out of the first person. I don't know why the guidelines are written that way!

It goes without saying that you want to hear about any issues, so you can drop this. If you don't have something specific to say in a spec, don't say it.


- What related issues do you consider out of scope for this RFC that could be addressed in the future independently of the solution that comes out of this RFC?

Any intricacies of using the GPU via `var_type<matrix_cl<double>>` should be deferred to a separate discussion or can happen during the implementation.
Collaborator

This is a great example of a clear and precise issue that's written in third person!

- [A New Continuation Based Autodiff By Refactoring](https://discourse.mc-stan.org/t/a-new-continuation-based-autodiff-by-refactoring/5037/2)

[Enoki](https://github.com/mitsuba-renderer/enoki) is a very nice C++17 library for automatic differentiation which under the hood can transform its autodiff type from an array of structs to a struct of arrays. It's pretty neat! Though implementing something like their methods would require some very large changes to the way we handle automatic differentiation.

Collaborator

JAX is the major system that's built like this, though I think PyTorch may also use a static design. As far as I know, they only have static matrices.

Thanks for the pointer to Enoki. I don't know that one (though I know the mushroom).

Collaborator Author

Good call I'll link to these

Collaborator Author

@bob-carpenter can you point me to any docs from jax / pytorch that talk about the sort of static matrix type they have? It looks like in pytorch they mainly use tensors and allow for subsetting

https://pytorch.org/docs/stable/tensor_view.html

Collaborator

For JAX:

https://jax.readthedocs.io/en/latest/notebooks/Common_Gotchas_in_JAX.html#🔪-In-Place-Updates

I love that they literally put a knife in the URL for the "sharp bits" of the doc.
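
For reference, JAX exposes exactly this trade-off through a functional update API instead of element assignment (a small sketch; assumes the `jax` package is available):

```python
import jax.numpy as jnp

x = jnp.zeros((3, 3))
# x[0, 0] = 5.0  # would raise TypeError: JAX arrays are immutable
y = x.at[0, 0].set(5.0)  # returns a new array; x is untouched
```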

There's no reason not to use a slice, block, or element of a matrix as an rvalue---the problems start when you assign to a matrix element. That's why it's OK to assign whole matrices.

Not allowing assignment of whole matrices means that transformed data, transformed parameters, and generated quantities need to be declare-defined in one line, which means they can't depend on any other calculations in the blocks they are declared in. For example, the following code pattern becomes impossible.

```stan
transformed parameters {
  static_matrix[3, 2] x;
  T1 b;
  T2 c;
  ... some calculations involving b and c ...
  x = ... some function of those calculations
}
```

Instead, b and c need to be declare-defined before x, with a function defined to compute it,

```stan
matrix b = foo();
matrix c = bar(b);
matrix x = f(b, c);
```

and often it means we can't share computations that we'd like to share for calculating c or x. At least that's been the problem in the past with defining sizes, which have this constant character.

Collaborator Author

Added JAX doc to the prior art!

We are pushing the language discussion out of this design for now, but seeing this, I get it now.

@SteveBronder (Collaborator Author)

Need to digest this a bit more, but a quick question for now. Is it necessary to add a new Stan language type or could this be just a stanc optimization step? I am not opposing this, just wondering.

That would simplify the implementation I guess?

Yes we spoke about that in the issue. Personally I'm fine with this just being a compiler optimization. Though Bob suggested allowing users to declare them and I'm also fine with that. Though yes it adds complexity to the spec

@wds15 (Contributor) commented May 1, 2020

This all looks really cool...my only comment would be to really make sure with prototypes that this is worth the effort which appears to be huge.

(But I would think that this is on the right track)

One Q: Will the AD tape change or will that still be the same and we just put different stuff there?

@SteveBronder (Collaborator Author)

This all looks really cool...my only comment would be to really make sure with prototypes that this is worth the effort which appears to be huge.

Yeah I think we can make an easy prototype just by doing steps (1) and (2), then just taking a stan program that does matrix stuff now and changing stuff to use the static matrix version.

There's def work here, but with the backwards compatibility part it can be done pretty iteratively

One Q: Will the AD tape change or will that still be the same and we just put different stuff there?

Since the vari* will still just hold members as normal, I don't think there's anything significant to note. The stack allocator would need to allocate enough for a vari<T>.

@bob-carpenter (Collaborator) commented May 2, 2020

The stack is definitely going to need to change. Right now, it assumes vari objects with values and adjoints. There's no way to build a single stack that'll hold either vari or vari<T> because those are two different types. The current target for specializing operator new is the vari structure. This can maybe be generalized another level with something like a chainable object that only has a chain() method. But we can only hold pointers/references to those if we don't want to slice (that's what we do now).

There's no doubt there's a huge speed improvement to be had from dealing with large matrices in matrix form and not having to walk over them to put them together and take them apart. I agree it'd be nice to know how big that speedup is. But that'd be easy to profile. For instance, in autodiff now, we can run Stan, and for the new thing, it's just going to be some matrix multiplies of double matrices.

Edit: Specifically this:

```cpp
using ChainableStack = AutodiffStackSingleton<vari, chainable_alloc>;
```

We need a single type there, not a template or we'll have multiple stacks. So we need to go up a level in abstraction to something like chainable instead of vari to make this work.

@SteveBronder
Collaborator Author

@bob-carpenter @wds15 What if we had all vari_type<T> derive from a vari_base and then changed ChainableStack to

using ChainableStack = AutodiffStackSingleton<vari_base, chainable_alloc>;

Then it would just do the polymorphic thing we currently do

https://godbolt.org/z/GYzYwx

@bbbales2
Member

bbbales2 commented May 4, 2020

Then it would just do the polymorphic thing we currently do

Hmm, there's already one layer of polymorphism here.

Things on the stacks already inherit from vari (https://github.com/stan-dev/math/blob/develop/stan/math/rev/core/dv_vari.hpp).

So there would be two levels of indirection there.

If the things that inherit from vari also become responsible for:

  1. Storing data & adjoints
  2. And implementing set_zero_adjoint()

Does that also work?

Now I'm just confused thinking about how I'd write an operator that accepts a matrix of vars. I'd have to write that along with the version that accepts the var matrix, and the code would be different.


A side comment, I'm not sure values and adjoints should be the same type. It seems entirely plausible we'd use float calculations in the process of computing a double result, and I think we'd want to keep the double precision in the adjoint as we propagate it back through.

@betanalpha

betanalpha commented May 4, 2020 via email

@bob-carpenter
Collaborator

The answer to @bbbales2's last question is @SteveBronder's last response :-)

Now I'm just confused thinking about how I'd write an operator that accepts a matrix of vars. I'd have to write that along with the version that accepts the var matrix, and the code would be different.

That's right.

I'm not sure values and adjoints should be the same type. It seems entirely plausible we'd use float calculations in the process of computing a double result, and I think we'd want to keep the double precision in the adjoint as we propagate it back through.

Why? The place we get into trouble with float is with big matrix operations like Cholesky factorization and lots of involved arithmetic like in ODE solvers. Why do you think gradients need more accuracy?

I’ve been uncomfortable with the names being thrown around because none of them capture this defining behavior.

const and static are both wrong, and fixed shares problems with const. The word encapsulated is correct, but ambiguous in a CS setting, where it usually just means hiding implementation details. How about whole (for "holistic")?

@SteveBronder
Collaborator Author

Hmm, there's already one layer of polymorphism here.

Things on the stacks already inherit from vari (https://github.com/stan-dev/math/blob/develop/stan/math/rev/core/dv_vari.hpp).

I think there's some confusion, I fixed up this godbolt example to be closer to what I'm talking about. There will still only be one level of polymorphism

https://godbolt.org/z/nR62EU

A side comment, I'm not sure values and adjoints should be the same type. It seems entirely plausible we'd use float calculations in the process of computing a double result, and I think we'd want to keep the double precision in the adjoint as we propagate it back through.

I'd like to talk about this more but it should not be on this design doc. In short, that's a thing we can test when we get the base implementation of this up and running

I’ve been uncomfortable with the names being thrown around because none of them capture this defining behavior. “Monolithic” is often used to describe the whole rather than the elements, but “encapsulated” is closer to the element -> whole construction. I like “encapsulated” but it is a bit ungainly, although “encap” is a pretty unambiguous shortening.

idk about encap or mono. I'd like a name users can see and be like, "I have a general idea of what this is about". (also on this same note static doesn't satisfy this since it means something very different in C++!)

Ugh, naming things. I would be so fine with having a silent auction where users put money on what they want the name to be and we end up with a numbery_mcnumberface type attribute. Personally I like the idea of just stopping users from doing

transformed parameters {
  const matrix[N, M] A = B * C;
  A = B * transpose(B); // NOPE!
}

@bob-carpenter that second assignment of A still causes a full new matrix to be put on the autodiff stack right? If so then imo I think that's actually deceptive to users as they'll think they are reusing that A when it actually does trip a new copy

@bob-carpenter
Collaborator

I think that's actually deceptive to users as they'll think they are reusing that A when it actually does trip a new copy

Right now in Stan, when we do

matrix A = B;
A = C * C';

we get a new matrix allocated for C' and for C * C'. We don't have a lot of compilation/execution details in the doc, but it wasn't intended to be deceptive! Some more words in the optimization chapter around copying would be helpful.

But what I was saying is that we could save the rows and columns spec, then we can re-use the LHS memory as you are suggesting.

Attributes of the Java variety (things decorated with @(...)) are going to add a layer of syntactic complexity to the language that our users may not be used to. The furthest we have gotten is a single type qualifier for data.

@bbbales2
Member

bbbales2 commented May 4, 2020

There will still only be one level of polymorphism

Oh yeah that looks fine. So now base vari does absolutely nothing other than polymorph.

I'd like to talk about this more but it should be not on this design doc. In short that's a thing we can test when we get the base implementation of this up and running

Makes sense. This is useful regardless of whether we try to do floats at some point.

@bob-carpenter
Collaborator

base vari does absolutely nothing other than polymorph.

base_vari needs a chain() method if we're going to use vector<base_vari*> for the autodiff stack. My point was just that this has to be a superclass of vari, because vari includes a double value and adjoint which general versions of this won't have. I think chainable is the right level now if that's still there---I was lobbying to get rid of it because it wasn't being used anywhere!

@SteveBronder
Collaborator Author

I got a version of this up for addition and multiplication today that passes all the tests for var, works for var_type<Eigen::Matrix<double, -1, -1>>, and has the static and dynamic version gradient calculations matching up for a simple example. I think it could work for var_type<matrix_cl<double>> if we add += operator and some other functions for matrix_cl types that we don't have rn

https://github.com/stan-dev/math/compare/feature/var-template

git checkout feature/var-template
./runTests.py ./test/unit/math/rev/core/static_matrix_checking_test.cpp

There's a couple of weird things I'm doing here and there since this is an MVP. But it seems to work with the stack allocator nicely. Shout out to @bbbales2 for the help!

@bob-carpenter
Collaborator

bob-carpenter commented May 7, 2020 via email

@SteveBronder
Collaborator Author

Yes just finished and they are pretty cool! The benchmarks here are for static (var_type<Eigen::Matrix<double>>) vs dynamic (Eigen::Matrix<var>) matrices for the sum of adding two matrices, multiplying two matrices, and a multiply-and-add of the two matrices. I did a sum just because it was easier to take grad. The benchmark is for calling .grad() after the forward pass is all set up. I think that's reasonable since I wanted to check how much speedup we get from reducing the size of the call stack. I also checked each of these to make sure the static version's adjoints match the dynamic version's.

Trying to parse out why it's so much faster I have a few ideas

  1. Reduced number of .chain() calls
  2. Better memory locality
  3. Eigen with custom types doesn’t give us vectorized instructions, but the static stuff does, since at the end of the day it’s all Eigen matrices of doubles getting pumped around.

[Three benchmark plots omitted: grad() time over a range of N for static vs. dynamic matrices on addition, multiplication, and multiply-and-add]

The only downside I need to add here is that for each of the matrix operators like multiply we need all of their matrix/vector/scalar derivative combinations, which I forgot about and which will add more work here. Though I'd argue that with the speedups above the pain of that is very worth the gain!

@bbbales2
Member

bbbales2 commented May 7, 2020

This is awesome! This looks like it is finally the solution that will speed up the simple vectorized operations that we haven't managed to get before!

What's the next step in getting this finished up? Do we need to finalize the Stan interface or do you want comments on the current C++?

we need all their matrix/vector/scalar derivatives

I don't think anything changes here compared to the old code, at least for reverse mode. What is the source of this worry?

@SteveBronder
Collaborator Author

This is awesome! This looks like it is finally the solution that will speed up the simple vectorized operations that we haven't managed to get before!

Ty! Yeah this made me very excited. Once we add this no one is ever going to benchmark against us with a logistic regression lol

What's the next step in getting this finished up? Do we need to finalize the Stan interface or do you want comments on the current C++?

I think we need to finish the stan interface and a quick review of the basic implementation would be good.

we need all their matrix/vector/scalar derivatives

I don't think anything changes here compared to the old code, at least for reverse mode. What is the source of this worry?

I was getting the wrong answers when just applying the old code. I had to go to matrixcalculus.org and add the transpose() around the matrix version. Though if I'm wrong that would be cool!

@bbbales2
Member

bbbales2 commented May 7, 2020

Once we add this no one is ever going to benchmark against us with a logistic regression lol

Lol, well, it's a compliment to be the reference.

I think we need to finish the stan interface and a quick review of the basic implementation would be good.

I'll look over the C++ later tonight.

Re: the interface, I liked the 'whole' naming scheme Bob proposed. I wonder what the implications of a decorator would be though (if that would be somehow easier to work with in Stan code).

I was getting the wrong answers when just applying the old code. I had to go to matrixcalculus.org and add the transpose() around the matrix version

Oooh, okay, well whatever we write goes through the autodiff tester, so I don't see any reason to worry. We'll catch these things.

@bob-carpenter
Collaborator

bob-carpenter commented May 7, 2020

I think the plot's right, but the x-axis labels are wrong---those can't be log N because they go up to 1000. In general, I prefer the way you did it---log scale axes, plain scale labels.

I would guess the decreasing gains from larger N with multiply are because the copy is always O(N^2) whereas matrix multiply is O(N^3), so when N gets big, the O(N^3) term starts dominating. Matrix addition is still O(N^2), which is presumably why you see much bigger gains there.

GLMs involve a double-based matrix multiplied by a vector of autodiff variables, so that's an O(N) copy operation for the vector and O(N^2) multiplication.

@SteveBronder
Collaborator Author

I think the plot's right, but the x-axis labels are wrong---those can't be log N because they go up to 1000. In general, I prefer the way you did it---log scale axes, plain scale labels.

Yes apologies it's supposed to be N not log N. It's just spaced out on a log10 scale.

GLMs involve a double-based matrix multiplied by a vector of autodiff variables, so I'm guessing that'll look more like addition than multiplication because matrix-vector multiply is only O(N^2).

Nice!

@bbbales2
Member

bbbales2 commented May 7, 2020

GLMs involve a double-based matrix multiplied by a vector of autodiff variables, so that's an O(N) copy operation for the vector and O(N^2) multiplication.

Yes I think we got lapped on how to speed up GLMs in Stan/handle the memory bottleneck stuff lol. I'd basically given up writing custom autodiff for these things -- it seemed like the virtual function calls just weren't that expensive. Apparently the pointer chasing was the baddie!

@bob-carpenter
Collaborator

I'd basically given up writing custom autodiff for these things

Memory is a second reason to write custom autodiff for matrices. Just throwing matrices into scalar autodiff is a real memory hog.

@SteveBronder
Collaborator Author

Word! I'll update this design doc with the comments from the discussion today and we can put it up for review.

@bob-carpenter
Collaborator

Yes I think we got lapped on how to speed up GLMs in Stan/handle the memory bottleneck stuff

The optimizations for the GLM functions could also be applied with a static matrix type. It's win-win.

@SteveBronder
Collaborator Author

How does memory play into this? The Map approach uses memory from our arena, but a Matrix would allocate on the general heap and then hang onto that memory until freed at the end of autodiff. And yes, that means we hit the general heap even if we use Map whenever there's a copy.

Is this all in reference to the Eigen::Map inside of vari_value? tbh I'm not sure how we could have an Eigen matrix in there while still keeping the data on our stack. I'm not really sure how we can go about using rvalue semantics with the condition that the data stays on our stack

@bob-carpenter
Collaborator

There's a third stack for storing objects whose constructors need to be called after autodiff. That's where Matrix objects go. We got rid of almost all of those other than some solver objects.

@syclik
Member

syclik commented May 13, 2020

@SteveBronder reached out and asked me to comment.

My initial thought (at the version that's up now) is that it looks great and sensible. For the Math library, I think it'll come down to the details. This deep down in the autodiff stack it's ok to be complicated, but it does need to be clear and maintainable by the team. I think this proposal falls into that category.

The one thing I'd suggest before getting too deep: check the timing of non-matrix operations. In the benchmarks posted, there wasn't a comparison of how things behaved outside of this change. It's something I'd do sooner rather than later, just to make sure we're not pursuing something that isn't viable. If this change is equal in performance or faster than the existing autodiff implementation, then it's easy to justify change. If it's a lot slower, then it's time to look for a different implementation. If it's a bit slower, then it's a hard decision where we have to weigh the tradeoff. (I'm intentionally being vague here. We don't have hard and fast rules about how much slower we're willing to get as we make improvements. Ideally we're always in the faster branch, but that isn't always realistic.)

@t4c1
Contributor

t4c1 commented May 13, 2020

tbh I'm not sure how we could having an Eigen matrix in there while still keeping the data on our stack.

We can't. Probably.

Why would we need to? Unless the matrices are small, memory locality is not a problem. The only downside is the need to call those destructors.

@bob-carpenter
Collaborator

Unless the matrices are small, memory locality is not a problem.

Right. But I think almost all of the non-data matrices we'll use will be small. At least outside of a GP context.

@bbbales2
Member

The reason being that if we construct a var from a rvalue matrix we can just move it.

I guess back to this, why can't we move it? If we have a move constructor for our vari_type<MatrixXd> objects, can't they just steal the pointers from the rvalue?

@t4c1
Contributor

t4c1 commented May 13, 2020

But I think almost all of the non-data matrices we'll use will be small

I think it would only be an issue if their size was comparable to or smaller than the cache line size. I believe most matrices in Stan models will be significantly larger than that.

I guess back to this, why can't we move it? If we have a move constructor for our vari_type objects, can't they just steal the pointers from the rvalue?

In this case we could. I was, however, thinking about a different scenario - constructing a vari_type<MatrixXd> from two MatrixXds. In this case we cannot steal the pointers if we are using maps in vari_type (unless we replace every use of MatrixXd with a Map, which is a bit of an overkill in my opinion).

@bob-carpenter
Collaborator

I think it would only be an issue if their size was comparable or smaller to cache line size.

They're certainly not usually that small---that'd only be 2 x 3 max!

I don't think replacing everything with a Map is a good idea. It's low overhead, but there are still constructors everywhere. And it's confusing.

@SteveBronder
Collaborator Author

Okay it looks like I need to change my impl to use the object stack that gets deleted.

ftr I think we are all good with the concept of static matrices here but I want to wait to ask for approval till we have a base implementation done and tested a bit

@bob-carpenter
Collaborator

I need to change my impl to use the object stack that gets deleted.

It doesn't get deleted, it just calls the destructors. For an RAII object like an Eigen::Matrix or std::vector, that amounts to calling the destructors on the contained memory.

@SteveBronder
Collaborator Author

It doesn't get deleted, it just calls the destructors. For an RAII object like an Eigen::Matrix or std::vector, that amounts to calling the destructors on the contained memory.

Whups yes that's what I meant

@rybern

rybern commented May 18, 2020

@rybern @rok-cesnovar can you check that section out and lmk if I'm blowing smoke?

The rules written out should be easy to implement in stanc3. The "algorithm" is the same as with matrix_cl except that the rules on when something is a "static matrix" are different.

+1, the optimization you describe should be easy once the math signatures are in place.
(Sorry I haven't been around!)

@SteveBronder
Collaborator Author

Is this okay to merge? I think we have everything sorted out

@bbbales2
Member

bbbales2 commented Jul 2, 2020

The doc is still written in terms of const matrices. Correct me if I'm wrong but we axed the const idea already.

Are there even going to be any language level indicators on these variables at this point? Or is it all happening magically under the hood?

I'd also really like the name to change from static to something else. static in C++ is scope, and I'd like to keep that meaning here. If there's nothing attached to the language, matrix autodiff types makes sense to me (as compared to matrices of autodiff types).

@SteveBronder
Collaborator Author

The doc is still written in terms of const matrices. Correct me if I'm wrong but we axed the const idea already.

The doc is just for the math library so we are not finalizing anything in the language here (that's going to be a separate design doc)

Are there even going to be any language level indicators on these variables at this point? Or is it all happening magically under the hood?

Until we sort out what to name these they should just be a compiler thing

@SteveBronder
Collaborator Author

I'm not attached to any name at all ftr

@bbbales2
Member

bbbales2 commented Jul 2, 2020

Okay makes sense. I can give this a review tomorrow. I think you're right that the picture in math is complete enough we should wrap this up.

@SteveBronder
Collaborator Author

Ty! Before you review let me read this over once more tmrw to make sure everything is lined up with how we are doing stuff in math and I'll ping you when it's ready for another look

@SteveBronder
Collaborator Author

@bbbales2 Just updated if you have a minute feel free to look over!

@bbbales2
Member

bbbales2 commented Jul 7, 2020

Nice! Will do.

Member

@bbbales2 bbbales2 left a comment


This needs to be updated. It is quite behind what we're working on.

# Summary
[summary]: #summary

This proposes a constant matrix type in Stan that will allow for significant performance opportunities. In Stan Math this will be represented by `var<Eigen::Matrix<double, R, C>>` or `var<matrix_cl<double>>`. The implementation will be fully backwards compatible with current Stan code. With optimizations from the compiler, the conditions for a `matrix` to be a constant matrix can be detected and applied to Stan programs automatically for users. The implementation of this proposal suggests a staged approach where a Stan language level feature is delayed until type attributes are allowed in the language and constant matrices have support for the same set of methods that the current `matrix` type has.
Member

var<T> -> var_value<T>

"With optimizations from the compiler the conditions for a matrix to be a constant matrix can be detected and applied to Stan programs automatically for users." -> what we've ended up working on is purely the math implementation stuff. I wouldn't mind if this design-doc shrunk to just the math stuff. At the language we'll either handle stuff automatically or introduce new types -- and likely both, but there will be some other solution.

Collaborator Author

The only mentions of the compiler/language here are to say that this design doc is just focused on the math portion. The only other example is to show what things could be deduced as static by the compiler. Should I change that one just to show how to rewrite that particular model for static matrices?

Member

Lemme just open up a pull request to this with the changes I'm talking about and you can tell me if it's overboard. I think that'll be easier.

# Motivation
[motivation]: #motivation

Currently, a differentiable matrix in Stan is represented as an Eigen matrix holding a pointer to an underlying array of autodiff objects, each of those holding a pointer to the underlying autodiff object implementation. These `N*M` autodiff elements are expensive but necessary so the elements of a matrix can be assigned to without copying the entire matrix. However, in instances where the matrix is treated as a whole object such that underlying subsets of the matrix are never assigned to, we can store one autodiff object representing the entire matrix. See [here](https://github.com/stan-dev/design-docs/pull/21#issuecomment-625352581) for performance tests on a small subset of gradient evaluations using matrices vs. constant matrices.
Member

"a differentiable matrix" -> "a MxN matrix" or NxM or whichever you prefer.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.
Member

This means that accessors

We define the meaning. We're writing the code. "The value and adjoint of the matrix are stored separately in memory pointed to by the vari_value<...>"

Any function that accepts a var_value<T> will support static matrices

"Functions that currently support Eigen::Matrix<var, R, C> types will be extended to support var_value<Eigen> types"


At the Stan Math level a constant matrix is a `var_value<Eigen::Matrix<double, -1, -1>>` with an underlying pointer to a `vari_value<Eigen::Matrix<double, -1, -1>>`\*. This means that accessors for the value and adjoint of `var_value` objects can access contiguous chunks of each part of the `vari_value`. Any function that accepts a `var_value<T>` will support static matrices.

At the language and compiler level, a `matrix` can be substituted for a constant matrix if the matrix is only constructed once and its subslices are never assigned to.
Member

This is the bit we can ditch for now imo. I think it was fair to worry about the language design at first, but I think there's a clear enough interface between how math and the language will work for this that we don't need to worry about the language decisions now.

Collaborator Author

I modified this just to talk about the compiler which I think is fair and a thing we are shooting for. I think the important thing is to make sure this doesn't cover anything we want in the language which will be a separate design doc


This feature will be added in the following stages.

1. Replace `var` and `vari` with templated types.
Member

This part needs to be updated.

Vaguely we have:

  1. var has been replaced by var_value in some branch

  2. var_value<double> and var_value<MatrixXd> types exist

  3. We agreed in a math meeting to develop this in pieces. What this means is pull requests will merge into develop before everything that will be developed is finished (https://discourse.mc-stan.org/t/monthly-math-development-meeting-06-25-2020-10-am-edt/16012/23).

It might be helpful to lay out what these pieces look like (what currently exists and what is likely to come in the next 3 months).

We'll basically be rewriting all of reverse mode to support this. Hopefully that can all go through adj_jac_apply.

It's also probably worth discussing return types. I'm still a fan of f(A, B, C) -> D, D is a var_value<Eigen> type if any of A, B, or C are Eigen<var> or var_value<Eigen> types.

Should also talk about the status of the LTO Windows/g++/clang++/Boost whatever it is

Collaborator Author

It's also probably worth discussing return types. I'm still a fan of f(A, B, C) -> D, D is a var_value<Eigen> type if any of A, B, or C are Eigen<var> or var_value<Eigen> types.

Right now I have

Functions which can take in multiple matrices will only return a constant matrix if all of the other matrix inputs as well as the return type are static matrices.

I'd rather be conservative on this for now, aka having constant matrices strictly happen if all input types and the output type are going to be constant. We can parse through a program to understand whether that's true or not. Once we run more performance tests to see if the static->dynamic copy is expensive or not we can relax this

# Drawbacks
[drawbacks]: #drawbacks

More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.
Member

I don't think this is a substantial drawback. Such is life with templates.

Collaborator Author

I think it's fine to list it tho. I'm going to include the flto thing here as well


More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.

Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on Stan language naming semantics. This is both a drawback and a feature. While it delays a `constant_matrix` type as a user-facing feature, it avoids the combinatorial problem that comes with additional matrix type names proposed for the language such as `sparse` and `complex`.
Member

Deferring this one.

Collaborator Author

I think it's fine to leave this as is

Member

Yeah I think I read it wrong.

More templates can be confusing and will lead to longer compile times. The confusion of templates can be mitigated by adding additional documentation and guides for the Stan Math type system. Larger compile times can't be avoided with this implementation, though other forthcoming proposals can allow us to monitor increases in compilation times.

Waiting for type attributes in the Stan language means this feature will be a compiler optimization until both type attributes are allowed in the language and an agreement is made on Stan language naming semantics. This is both a drawback and a feature. While it delays a `constant_matrix` type as a user-facing feature, it avoids the combinatorial problem that comes with additional matrix type names proposed for the language such as `sparse` and `complex`.

Member

A drawback that cost a lot of time was the LTO thing.

You should write down the LTO problem and a quick summary of all the things you did to try to work around this issue.

I remember you tried at least:

  1. boost::variants instead of polymorphism
  2. A different stack for every vari_base subclass
  3. Don't use the chaining stack for zeroing. Instead of having a chaining/non-chaining stack have a chaining/zeroing stack. Everything goes in the zeroing stack. Only chaining things go in the chaining stack.

Collaborator Author

I updated this section to list out the problem and the attempted solutions

@SteveBronder SteveBronder changed the title [WIP] Static matrices Static matrices Aug 5, 2020
@bbbales2
Member

There's a new design doc for the new matrix autodiff type for Stan: #30 , so head over there and have a look. We're closing this pull and opening a fresh one since this one was getting very long.

We're trying to prepare this for a 2.26 release on January 25th (tracking progress for that here).

@bbbales2 bbbales2 closed this Dec 22, 2020