add prim and rev gp_exp_cov #859
Conversation
Hi - can someone please explain to me why this is failing? Or even better, how do I make my local compiler throw the same errors so I can debug this myself? This passes tests/compiles locally. I've parsed through the library and found usages of both.
You should use
Hi - can someone please review this pull request? Thanks!
@drezap, sorry for the delay. I'll review it today.
@drezap, can you add the copyright holder for this pull request?
@syclik updated
@drezap, sorry -- it doesn't look like it's updated. Where it says "Copyright<2018>" can you replace that with your name? Or whoever owns the copyright on this code?
I'll start with these comments. I need to dig a little deeper into the rev
part of this.
namespace stan {
namespace math {
/** Returns a Matern exponential covariance matrix with one input vector
Style: start the comment on the next line. This is for consistency with other doc.
/**
* Returns a Matern ...
 *
 * @param x std::vector of elements that can be used in stan::math::distance
 * @param length_scale length scale
 * @param sigma standard deviation that can be used in stan::math::square
Please reorder these to be consistent with code.
 *
 * where \f$ d(x, x') \f$ is the squared distance, or the dot product.
 *
 * See Rasmussen & Williams (2006), Chapter 4.
I'm looking at equation (4.18) in RW and not seeing the sigma
term in there. Did you include this for convenience?
@avehtari, would you mind commenting on two things:
- is this the right name for this function?
- is this an appropriate parameterization for inclusion in Stan?
@syclik see Betancourt's post on discourse: http://discourse.mc-stan.org/t/adding-gaussian-process-covariance-functions/237/42?u=drezap
I am following the conventions there.
Thanks for pointing out the discussion. One thing that it's missing is the jitter term (also referred to as nugget).
I'm going to kick this question over to @avehtari. Specifically, Aki, what's the most common form of this kernel?
@drezap, if we stick to this, I'd just like to see the note indicate that we're not using R&W's parameterization as-is.
check_not_nan("gp_exponential_cov", "x", x[n]);
check_positive("gp_exponential_cov", "marginal variance", sigma);
check_not_nan("gp_exponential_cov", "marginal variance", sigma);
Use check_positive_finite() instead of check_positive() and check_not_nan().
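Illustrative sketch only: the combined check behaves roughly like this simplified stand-in (this is not Stan math's actual implementation, just a demonstration of why one check_positive_finite() call can replace the check_positive()/check_not_nan() pair):

```cpp
#include <cmath>
#include <stdexcept>
#include <string>

// Simplified stand-in for stan::math::check_positive_finite(): a single
// call rejects non-positive, NaN, and infinite values, replacing the
// separate check_positive()/check_not_nan() pair used in this PR.
inline void check_positive_finite(const std::string& function,
                                  const std::string& name, double value) {
  if (!(value > 0.0) || !std::isfinite(value))
    throw std::domain_error(function + ": " + name
                            + " must be positive and finite, but is "
                            + std::to_string(value));
}
```

A NaN fails the `value > 0.0` comparison, so non-positive, NaN, and infinite inputs are all rejected by the single predicate.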
check_not_nan("gp_exponential_cov", "marginal variance", sigma);
check_positive("gp_exponential_cov", "length-scale", length_scale);
check_not_nan("gp_exponential_cov", "length-scale", length_scale);
use check_positive_finite()
boost::is_same<typename scalar_type<T_x>::type, double>::value,
Eigen::Matrix<var, -1, -1>>::type
gp_exponential_cov(const std::vector<T_x> &x, const var &sigma, const var &l) {
  check_positive("gp_exponential_cov", "sigma", sigma);
check_positive_finite()?
 * @tparam T_sigma type of sigma
 * @tparam T_l type of length scale
 */
template <typename T_x, typename T_s, typename T_l>
Questions:
- any reason this is templated? It looks like the code only has one possible rev instantiation: <double, var, var>.
- what's with the other two specializations? Just based on inspecting the code, it looks like there's going to be trouble with template argument deduction. It's got to be ambiguous the way it's written: <T_x, double, T_l> and <T_x, T_s, double>. I guess I'm saying I don't believe that it'll compile if you actually tried to have both in there.
Why are there two specializations in stan/math/rev/mat/fun/cov_exp_quad.hpp? OK, let me find the argument deductions where it fails, and I'll most likely remove the other two specializations.
There are two different partial template specializations in the rev version of cov_exp_quad.
L215 has: cov_exp_quad<double, var, var>
L257 has: cov_exp_quad<double, double, var>
That's why there is a base template function and then a partial specialization of class cov_exp_quad_vari. Now that I look at it, there's probably a different way to implement it, but that's a different question.
EXPECT_EQ(3, cov.rows());
EXPECT_EQ(4, cov.cols());
for (int i = 0; i < 3; i++) {
  for (int j = 0; j < 3; j++) {
The upper limit for j should be 4 instead of 3.
= stan::math::gp_exponential_cov(x2_vec, x1_rvec, sigma, l));
EXPECT_EQ(4, cov7.rows());
EXPECT_EQ(3, cov7.cols());
for (int i = 0; i < 3; i++) {
Upper limit for i should be 4.
}
}

TEST(MathPrimMat, vec_eigen_mixed_gp_exponential_cov2) {
I don't think we really need this to do mixed row vectors vs vectors. Do you?
@drezap, thanks! I started the review. I need more time to look into the rev part of it. I had a question for when we have… This will affect the language, so perhaps @bob-carpenter can confirm. Bob, I think we only need to support this (Stan code):
or
(and maybe with both of them), but I don't think we need to support:
(and with the types switched). Mind confirming?
Whoops. You’re absolutely right about the math.
I think we may have an inverse function that applies to std vector. I’ll have to double check. If we do, then I think it’s still preferable to do sum(inverse(l)). The reason is that if a specialized version exists in the library, it’ll be faster.
On May 12, 2018, @drezap commented on this pull request, in stan/math/prim/mat/fun/gp_exponential_cov.hpp:
+ Eigen::Matrix<typename stan::return_type<T_x, T_s, T_l>::type, Eigen::Dynamic,
+ Eigen::Dynamic>
+ cov(x_size, x_size);
+
+ if (x_size == 0)
+ return cov;
+
+ T_s sigma_sq = square(sigma);
+ T_l temp;
+
+ for (size_t i = 0; i < (x_size - 1); ++i) {
+ cov(i, i) = sigma_sq;
+ for (size_t j = i + 1; j < x_size; ++j) {
+ temp = 0;
+ for (size_t k = 0; k < l_size; ++k)
+ temp += squared_distance(x[i], x[j]) / length_scale[k];
So I don't think this is equivalent, unfortunately: i.e. 1/sum(length_scale) != sum_{k=1}^K 1/length_scale[k]. Consider this counterexample: let length_scale = 2, 3. Then 1/sum(length_scale) = 1/5, but really we need 1/2 + 1/3 = 5/6. In general it is not true that 1/(a + b) == 1/a + 1/b. Equivalently, run this code in R: 1/sum(c(.1, .2)) == 1/.1 + 1/.2, or something similar.
In a lot of docs for ARD they are careful to index by k for the d(x, x')_k, so I'm pretty sure the latter is correct.
I can do temp += 1.0 / length_scale[k] and then, after the for loop, do squared_distance(x[i], x[j]) * temp, so I'm not calling squared_distance over and over. This response was a bit pedantic, sorry.
I just checked. No inverse. What you described is great.
I have time on Monday to check the questions you've asked me, and then @drezap will be at Aalto, so we can look at this together.
It looks like we're still a bit off with the implementation, at least the stuff that's specialized for reverse mode.
 * @tparam T_x type of std::vector of elements
 * @tparam T_sigma type of sigma
 */
template <typename T_x, typename T_s>
This isn't used, so remove.
 * @param sigma standard deviation
 * @param length_scale length scale
 */
gp_exponential_cov_vari(const std::vector<T_x> &x, double sigma,
This isn't used.
l_d_(value_of(length_scale)),
sigma_d_(value_of(sigma)),
sigma_sq_d_(sigma_d_ * sigma_d_),
dist_(ChainableStack::instance().memalloc_.alloc_array<double>(
dist_ doesn't really need to go in the arena memory. We should initialize a std::vector<double> with that size instead. (I realize the other kernel does that; I think we shouldn't, here or there. Let's start with this one.)
double adjsigma = 0;
for (size_t i = 0; i < size_ltri_; ++i) {
  vari *el_low = cov_lower_[i];
  double prod_add = el_low->adj_ * el_low->val_;
For readability, can you replace el_low with cov_lower_[i]? It's only used in the next line. The compiler should be able to figure out how to optimize that.
You're saying instead of a pointer to a vari, just use the variable cov_lower_[i] itself? Cool.
for (size_t i = 0; i < size_; ++i)
  cov_diag_[i] = new vari(sigma_sq_d_, false);
}
virtual void chain() {
Mind writing out the derivative here in the doc?
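As a hedged sketch of what that doc comment could contain, assuming the kernel form \f$ k_{ij} = \sigma^2 \exp(-d(x_i, x_j)/\ell) \f$ discussed above (the exact distance scaling may differ once the kernel is changed to match GPML):

```latex
% Partial derivatives of k_{ij} = \sigma^2 \exp(-d(x_i, x_j)/\ell):
\[
  \frac{\partial k_{ij}}{\partial \sigma}
    = 2\sigma\, e^{-d(x_i, x_j)/\ell}
    = \frac{2\, k_{ij}}{\sigma},
  \qquad
  \frac{\partial k_{ij}}{\partial \ell}
    = \sigma^2 e^{-d(x_i, x_j)/\ell}\,\frac{d(x_i, x_j)}{\ell^2}
    = \frac{k_{ij}\, d(x_i, x_j)}{\ell^2},
\]
% so chain() would accumulate the adjoints
\[
  \bar{\sigma} \mathrel{+}= \sum_{i > j} \bar{k}_{ij}\,\frac{2\, k_{ij}}{\sigma},
  \qquad
  \bar{\ell} \mathrel{+}= \sum_{i > j} \bar{k}_{ij}\,\frac{k_{ij}\, d(x_i, x_j)}{\ell^2}.
\]
```

This matches the structure of the loop above, where prod_add = adj * val corresponds to \f$ \bar{k}_{ij} k_{ij} \f$ and dist_[i] supplies the distance factor.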
for (size_t i = 0; i < size_ltri_; ++i) {
  vari *el_low = cov_lower_[i];
  double prod_add = el_low->adj_ * el_low->val_;
  adjl += prod_add * dist_[i];
I'd swap lines 99 and 100 for readability. Here's what I wrote before I understood that you were reusing the term:
I think it makes sense to have lines 98 and 99 together. If I understand the math, cov_lower_[i].val_ * dist_[i] is the derivative of gp_exponential_cov() and cov_lower_[i].adj_ is the adjoint. It just makes more sense to keep the val_ * dist_ term grouped.
@syclik thank you for the helpful comments, I'll address them in my next push to the math library. I'm changing the calculation of the kernel (to match GPML), so the gradients will change.
Thanks. Mind pinging me after you make changes? (I don't know if I'll get notified when you do.)
@syclik yep
Best,
Andre Zapico
closing until ready
Submission Checklist
./runTests.py test/unit
make cpplint
Summary:
Add exponential covariance function. ARD support in prim. rev supported for one vector input. After tests, if I want two vector input faster, I'll go ahead and add it.
How to Verify:
Tests:
Prim: vec, rvec, all combinations of std::vector, as well as message throws, etc.
Rev: gradients, vector and double and all combinations of, as well as error throws and checking that the vari's are on the stack.
Copyright and Licensing
Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Copyright<2018>
By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses: