Eigen member function for reading fvar values and derivatives #1745
What you did not specify here is that assigning to this expression should modify the underlying matrices. That would be the advantage over `to_fvar`.
I'm not sure what you mean by assigning to the expression, can you give me an example?
I meant the example I posted in the linked issue. I'm making it a bit more complete here:
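Something like this (a reconstructed sketch; the element values are illustrative):

```cpp
#include <stan/math/fwd/core.hpp>
#include <Eigen/Dense>

int main() {
  using stan::math::fvar;
  // a matrix of fvars, each element carrying a value and a tangent
  Eigen::Matrix<fvar<double>, -1, -1> m(2, 2);
  m << fvar<double>(1, 4), fvar<double>(2, 3),
       fvar<double>(3, 2), fvar<double>(4, 1);

  // what we want Eigen to produce for us: the two underlying matrices
  // of doubles, currently built with an explicit loop
  Eigen::MatrixXd vals(2, 2), derivs(2, 2);
  for (Eigen::Index i = 0; i < m.size(); ++i) {
    vals(i) = m(i).val_;
    derivs(i) = m(i).d_;
  }
}
```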
Oh, I misunderstood what you meant in the issue: you want to create two matrices of doubles from a single matrix of fvars, not the other way around?
Exactly. Though I also would not mind a generalized `to_fvar` that would work the other way around.
That makes sense, I'll look into it.
The simplest way to add it (that I can see) would be a member function like the following:
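Presumably something along these lines, added to Eigen's dense expressions via the plugin mechanism (a sketch; the exact body and signature are assumptions):

```cpp
// Hypothetical sketch: copy values and derivatives out in a single pass,
// using linear indexing over the coefficients.
template <typename T_val, typename T_deriv>
void read_fvar(T_val& val, T_deriv& deriv) const {
  for (Eigen::Index i = 0; i < size(); ++i) {
    val.coeffRef(i) = coeff(i).val_;
    deriv.coeffRef(i) = coeff(i).d_;
  }
}
```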
Which, borrowing your example above, would be used like:
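A minimal usage sketch, reusing the matrix `m` from the example above:

```cpp
// one call fills both output matrices, replacing the manual loop
Eigen::MatrixXd vals(2, 2), derivs(2, 2);
m.read_fvar(vals, derivs);
```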
So it would just be a single loop on the backend copying the values and derivatives into separate matrices. How does that sound?
Yes, please. This'd be nice in the same way that adjoint-Jacobian-apply is nice for reverse mode. For forward-mode matrix derivatives, we'd take a matrix of fvars, split it into matrices of values and tangents, perform the usual matrix hijinks to calculate the result's values and tangents, then put the results back together into a new matrix of fvars.
I was hoping for a way that would let Eigen evaluate expressions in a vectorized fashion. But that might be too hard to achieve. Your suggestion has a few issues. First, not all expressions support linear indexing. For example, `Eigen::Ref` does not. Also, if you get some row-major expression and a col-major destination, linear indexing will address elements in different orders, producing a wrong result. So to be general we must index matrices in two dimensions. We can either use two loops or leave the order up to Eigen in a somewhat hacky solution (I did not test that. I hope it works):
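```cpp
template <typename T_val, typename T_deriv>
void read_fvar(T_val& val, T_deriv& deriv) {
  val = T_val::NullaryExpr(val.rows(), val.cols(),
      [this, &deriv](Eigen::Index i, Eigen::Index j) {
        deriv.coeffRef(i, j) = this->coeff(i, j).d_;
        return this->coeff(i, j).val_;
      });
}
```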
I got the idea from here: https://eigen.tuxfamily.org/dox/TopicCustomizing_NullaryExpr.html
> On Feb 25, 2020, at 11:45 AM, Tadej Ciglarič ***@***.***> wrote:
>
> I was hoping for a way that would let Eigen evaluate expressions in a vectorized fashion. But that might be too hard to achieve.

I think it's hard to do that without paying the copy penalty. You might be interested in how Autograd or JAX set this up, which is to have matrices which you can't assign into.

> Your suggestion has a few issues. First, not all expressions support linear indexing. For example, `Eigen::Ref` does not. Also, if you get some row-major expression and a col-major destination, linear indexing will address elements in different orders, producing a wrong result.

This isn't a problem with a top-level `Matrix<T, -1, -1>`, which has column-major order, but it can be a problem for template expressions.

> So to be general we must index matrices in two dimensions. We can either use two loops or leave the order up to Eigen in a somewhat hacky solution (I did not test that. I hope it works):
>
> ```cpp
> template <typename T_val, typename T_deriv>
> void read_fvar(T_val& val, T_deriv& deriv) {
>   val = T_val::NullaryExpr(val.rows(), val.cols(),
>       [this, &deriv](Eigen::Index i, Eigen::Index j) {
>         deriv.coeffRef(i, j) = this->coeff(i, j).d_;
>         return this->coeff(i, j).val_;
>       });
> }
> ```

If we can specialize for when linear indexing is available, it's a huge win over using two-dimensional indexing, which involves index arithmetic.
@t4c1 Got a version of this working with `NullaryExpr`.
I thought we need to return some value from the lambda in `NullaryExpr`. Can you give me an example of how you call that? Anyway, if it works, it's fine.
Doesn't seem to be an issue, which is nice. As for usage, given a matrix of `fvar<double>`:
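A minimal sketch of the kind of call meant here (the exact snippet is assumed):

```cpp
// hypothetical usage: read values and derivatives in a single pass
Eigen::Matrix<stan::math::fvar<double>, -1, -1> m(5, 5);
m.setConstant(stan::math::fvar<double>(1, 2));  // value 1, tangent 2

Eigen::MatrixXd vals(5, 5), derivs(5, 5);
m.read_fvar(vals, derivs);  // vals is now all 1s, derivs all 2s
```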
Interesting. I thought all Eigen expressions use lazy evaluation. Maybe they anticipated such a use case and made `NullaryExpr` evaluate eagerly if its functor returns void, even though that does not seem to be documented. Anyway, this makes the code easier to read than my suggestion.
Great, I'll write some tests and update some existing functions before I open a pull request with the added method.
I've put in an initial version for matrices of `fvar<double>` and benchmarked `EigMethod` (the current approach) against `EigMethod_New` (the new method). Overall it looks like the new method is on-par with plain looping (which is unsurprising, since that's what it's doing on the backend).
The added Eigen method is below, which checks whether the input is row- or column-major and loops appropriately:
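A sketch of that method (a reconstruction; it assumes the method lives in an Eigen plugin, so `rows()`, `cols()`, `coeff()`, and the `IsRowMajor` enum refer to the expression itself):

```cpp
// Hypothetical sketch: copy values and derivatives in one pass, iterating
// in the source's storage order so coefficient access stays sequential.
template <typename T_val, typename T_deriv>
void read_fvar(T_val& val, T_deriv& deriv) const {
  if (IsRowMajor) {
    for (Eigen::Index i = 0; i < rows(); ++i) {
      for (Eigen::Index j = 0; j < cols(); ++j) {
        val.coeffRef(i, j) = coeff(i, j).val_;
        deriv.coeffRef(i, j) = coeff(i, j).d_;
      }
    }
  } else {
    for (Eigen::Index j = 0; j < cols(); ++j) {
      for (Eigen::Index i = 0; i < rows(); ++i) {
        val.coeffRef(i, j) = coeff(i, j).val_;
        deriv.coeffRef(i, j) = coeff(i, j).d_;
      }
    }
  }
}
```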
Overall, it's looking good!
That looks nice! I think Eigen's `NullaryExpr` might do some more optimizations (a single loop with linear indexing, where possible), but the benefit of that is likely small.
Cool idea. The SIMD speedups can be really big if they have the right unrolling strides. Would it be possible to do the same kind of thing with methods for matrices of `var`?
This is not actually using SIMD. The speedup comes from more efficient use of cache. We only do a single pass over the matrix of `var`.
SSE and AVX on the chips can give you SIMD from unrolled loops that contain multiple copies of operations. Keeping the SSE and AVX pipelines full requires efficient use of cache and the right byte alignments. See the Wikipedia page on SIMD. Code written specifically to allow SSE or AVX optimizations often looks like this, say, to copy an array:
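An illustrative sketch (not the original snippet):

```cpp
// A plain copy loop written so the compiler can unroll it and emit
// SSE/AVX instructions: unit stride, no branches, independent iterations.
void copy(const double* x, double* y, int N) {
  for (int n = 0; n < N; ++n) {
    y[n] = x[n];
  }
}
```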
It doesn't look like it'd do anything special, but with SSE, AVX, etc., the compiler can unroll these into parallelized operations that happen 4 at a time (or even 8 at a time with later versions of AVX, I think). In some cases I think it can even do it automatically from a simple loop. See, for example, Alex Barnett's C++ benchmarks for n-body problems. There are examples of using loops, OpenCL, and OpenCL with loop unrolling and Agner Fog's vector class, plus further optimizations using specialized inverse sqrt functions (one of which is implemented in Eigen).
Our case is not that simple: we are not copying consecutive memory locations, and we are accessing values through different pointers. I doubt the compiler can deduce they point to consecutive locations. Even if it could, we have the values interleaved with the adjoints in memory.
Right. That's why I was wondering if it'd still be useful. In practice, the values are scattered around memory anyway.
Yeah. That is why this can never be as efficient as what we can do once we have those static matrices (#1805). |
@t4c1 Thanks, I gave that a try as well. @bob-carpenter I ran the program through g++ with its auto-vectorisation report flag, and the missed-optimisation output I think is indicating pointer overlap (where multiple pointers point to the same location in memory), which inhibits auto-vectorisation (more info here).
That option, I'd guess, means the pointers *could* overlap, not that they necessarily do, because that info probably isn't available through static (compile-time) analysis. The compiler comment seems to indicate that, too: "can't determine dependence between 348->val and *_355".
That's the most verbose of the flags; they have a bunch of others for specific optimizations as well: https://www.codingame.com/playgrounds/283/sse-avx-vectorization/autovectorization
@andrjohns I got your Since you already have benchmark, I would ask you to also benchmark this implementation. It might be a bit faster, since in most cases no index artihmetic is required. |
Nice! I hadn't thought of getting it to evaluate that way. I added it to the benchmark and there is a slight performance edge, but it's pretty small. However, the `NullaryExpr` version would likely generalise to expressions better than my version.
Nice! This seems ready for a PR. Will you do it, or should I?
Actually, looking at the results again, the differences between your and my implementations seem almost negligible (and a bit random). I still prefer the `NullaryExpr` one.
Yeah, I'm in favour of the `NullaryExpr` approach too. For matrices of `var`, I'm assuming we'd read the values and adjoints the same way. Or is there a better way to handle this?
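A sketch of what a reverse-mode analogue might look like (the name `read_var` and the free-function form are assumptions):

```cpp
#include <stan/math/rev/core.hpp>
#include <Eigen/Dense>

// Hypothetical sketch: one pass over a matrix of var, reading values and
// adjoints through each element's pointed-to vari.
template <typename T_val, typename T_adj>
void read_var(const Eigen::Matrix<stan::math::var, -1, -1>& m,
              T_val& val, T_adj& adj) {
  for (Eigen::Index i = 0; i < m.size(); ++i) {
    val(i) = m(i).val();
    adj(i) = m(i).adj();
  }
}
```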
I don't know of any case where we need that for matrices of `var`.
It gets used a bit in the existing code, which makes me think that the naming should be general enough to cover both forward and reverse mode.
Description
As first raised in this issue, it would be useful to have an Eigen function for reading a matrix of `fvar<T>` into two matrices of type `T`. By implementing this through Eigen, it can be achieved while traversing the matrix only once, possibly taking advantage of Eigen's SIMD vectorisation facilities. This would be used like the following:
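A sketch of the intended usage (assuming the `read_fvar` member discussed in the thread):

```cpp
#include <stan/math/fwd/core.hpp>
#include <Eigen/Dense>

int main() {
  using stan::math::fvar;
  // each fvar element carries a value and a tangent
  Eigen::Matrix<fvar<double>, -1, -1> m(2, 2);
  m << fvar<double>(1, 4), fvar<double>(2, 3),
       fvar<double>(3, 2), fvar<double>(4, 1);

  Eigen::MatrixXd vals(2, 2);
  Eigen::MatrixXd derivs(2, 2);
  m.read_fvar(vals, derivs);  // single pass fills both outputs
}
```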
Current Version: v3.1.0