
add prim and rev gp_exp_cov #859

Closed
wants to merge 16 commits

Conversation

@drezap (Collaborator) commented May 9, 2018

Submission Checklist

  • Run unit tests: ./runTests.py test/unit
  • Run cpplint: make cpplint
  • Declare copyright holder and open-source license: see below

Summary:

Add the exponential covariance function, with ARD support in prim.

rev is supported for one vector input. After tests, if the two-vector-input case needs to be faster, I'll go ahead and add a rev specialization for it.

How to Verify:

Tests:
Prim: vec, rvec, std::vector, and all combinations thereof, as well as error-message throws, etc.
Rev: gradients, vector and double inputs in all combinations, as well as error throws and checks that the varis are on the stack.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Copyright<2018>

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@drezap (Collaborator, Author) commented May 9, 2018

Hi - can someone please explain to me why this is failing? Or even better, how do I make my local compiler throw the same errors so I can debug this myself? This compiles and passes tests locally.

I've searched through the library and found usages of both ChainableStack::context().memalloc_
and ChainableStack::memalloc_, and neither seems to pass the tests here.

@wds15 (Contributor) commented May 10, 2018

You should use ChainableStack::context().memalloc_ to access the AD stack.

@drezap (Collaborator, Author) commented May 11, 2018

Hi - can someone please review this pull request? Thanks!

@syclik (Member) commented May 11, 2018 via email

@syclik (Member) commented May 11, 2018

@drezap, can you add the copyright holder for this pull request?

@drezap (Collaborator, Author) commented May 11, 2018

@syclik updated

@syclik (Member) commented May 11, 2018

@drezap, sorry -- it doesn't look like it's updated. Where it says "Copyright<2018>" can you replace that with your name? Or whoever owns the copyright on this code?

@syclik (Member) left a comment

I'll start with these comments. I need to dig a little deeper into the rev part of this.

namespace stan {
namespace math {

/** Returns a Matern exponential covariance matrix with one input vector
Member:

Style: start the comment on the next line. This is for consistency with other doc.

/**
 * Returns a Matern ...

*
* @param x std::vector of elements that can be used in stan::math::distance
* @param length_scale length scale
* @param sigma standard deviation that can be used in stan::math::square
Member:

Please reorder these to be consistent with code.

*
* where \f$ d(x, x') \f$ is the squared distance, or the dot product.
*
* See Rasmussen & Williams (2006), Chapter 4.
Member:

I'm looking at equation (4.18) in RW and not seeing the sigma term in there. Did you include this for convenience?

@avehtari, would you mind commenting on two things:

  1. is this the right name for this function?
  2. is this an appropriate parameterization for inclusion in Stan?

Collaborator Author:

@syclik see Betancourt's post on discourse: http://discourse.mc-stan.org/t/adding-gaussian-process-covariance-functions/237/42?u=drezap

I am following the conventions there.

Member:

Thanks for pointing out the discussion. One thing that it's missing is the jitter term (also referred to as nugget).

I'm going to kick this question over to @avehtari. Specifically, Aki, what's the most common form of this kernel?

@drezap, if we stick to this, I'd just like to see the note indicate that we're not using R&W's parameterization as-is.

check_not_nan("gp_exponential_cov", "x", x[n]);

check_positive("gp_exponential_cov", "marginal variance", sigma);
check_not_nan("gp_exponential_cov", "marginal variance", sigma);
Member:

use check_positive_finite() instead of check_positive() and check_not_nan()

check_not_nan("gp_exponential_cov", "marginal variance", sigma);

check_positive("gp_exponential_cov", "length-scale", length_scale);
check_not_nan("gp_exponential_cov", "length-scale", length_scale);
Member:

use check_positive_finite()

boost::is_same<typename scalar_type<T_x>::type, double>::value,
Eigen::Matrix<var, -1, -1>>::type
gp_exponential_cov(const std::vector<T_x> &x, const var &sigma, const var &l) {
check_positive("gp_exponential_cov", "sigma", sigma);
Member:

check_positive_finite()?

* @tparam T_sigma type of sigma
* @tparam T_l type of length scale
*/
template <typename T_x, typename T_s, typename T_l>
Member:

Questions:

  1. any reason this is templated? It looks like the code only has one possible rev instantiation: <double, var, var>.
  2. what's with the other two specializations? Just based on inspecting the code, it looks like there's going to be trouble with template argument deduction. It's got to be ambiguous the way it's written: <T_x, double, T_l> and <T_x, T_s, double>. I guess I'm saying I don't believe that it'll compile if you actually tried to have both in there.

Collaborator Author:

Why are there two specializations in stan/math/rev/mat/fun/cov_exp_quad.hpp? OK, let me find where argument deduction fails, and I'll most likely remove the other two specializations.

Member:

There are two different partial template specializations in the rev version of cov_exp_quad.

L215 has: cov_exp_quad<double, var, var>
L257 has: cov_exp_quad<double, double, var>

That's why there is a base template function and then a partial specialization of class cov_exp_quad_vari. Now that I look at it, there's probably a different way to implement it, but that's a different question.

EXPECT_EQ(3, cov.rows());
EXPECT_EQ(4, cov.cols());
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
Member:

The upper limit for j should be 4 instead of 3.

= stan::math::gp_exponential_cov(x2_vec, x1_rvec, sigma, l));
EXPECT_EQ(4, cov7.rows());
EXPECT_EQ(3, cov7.cols());
for (int i = 0; i < 3; i++) {
Member:

upper limit for i should be 4.

}
}

TEST(MathPrimMat, vec_eigen_mixed_gp_exponential_cov2) {
Member:

I don't think we really need this to do mixed row vectors vs vectors. Do you?

@syclik (Member) commented May 11, 2018

@drezap, thanks! I started the review. I need more time to look into the rev part of it.

I had a question for when we have x1 and x2. Do we really need to allow mixed Eigen vectors and Eigen row vectors? I don't think we do, and dropping that would simplify testing a lot.

This will affect the language, so perhaps @bob-carpenter can confirm. Bob, I think we only need to support this (Stan code):

vector[N1] x1;
vector[N2] x2;
matrix[N1, N2] cov = gp_exponential_cov(x1, x2, sigma, l);

or

row_vector[N1] x1;
row_vector[N2] x2;
matrix[N1, N2] cov = gp_exponential_cov(x1, x2, sigma, l);

(and maybe with both of them real arrays)

but... I don't think we need to support:

vector[N1] x1;
row_vector[N2] x2;
matrix[N1, N2] cov = gp_exponential_cov(x1, x2, sigma, l);

(and with the types switched)

Mind confirming?

@syclik (Member) commented May 12, 2018 via email

@syclik (Member) commented May 12, 2018

I just checked. No inverse. What you described is great.

@avehtari:

I have time on Monday to check the questions you've asked me, and then @drezap will be at Aalto, so we can look at this together.

@syclik (Member) left a comment

It looks like we're still a bit off with the implementation, at least the stuff that's specialized for reverse mode.

* @tparam T_x type of std::vector of elements
* @tparam T_sigma type of sigma
*/
template <typename T_x, typename T_s>
Member:

This isn't used, so remove.

* @param sigma standard deviation
* @param length_scale length scale
*/
gp_exponential_cov_vari(const std::vector<T_x> &x, double sigma,
Member:

This isn't used.

l_d_(value_of(length_scale)),
sigma_d_(value_of(sigma)),
sigma_sq_d_(sigma_d_ * sigma_d_),
dist_(ChainableStack::instance().memalloc_.alloc_array<double>(
Member:

dist_ really doesn't need to go in the arena memory. We should initialize a std::vector<double> with that size instead. (I realize the other kernel uses the arena allocation; I think we shouldn't, here or there. Let's start with this one.)

double adjsigma = 0;
for (size_t i = 0; i < size_ltri_; ++i) {
vari *el_low = cov_lower_[i];
double prod_add = el_low->adj_ * el_low->val_;
Member:

for readability, can you replace el_low with cov_lower_[i]? It's only used in the next line. The compiler should be able to figure out how to optimize that.

Collaborator Author:

you're saying instead of a pointer to a vari just use the variable cov_lower_[i] itself? cool

for (size_t i = 0; i < size_; ++i)
cov_diag_[i] = new vari(sigma_sq_d_, false);
}
virtual void chain() {
Member:

Mind writing out the derivative here in the doc?

for (size_t i = 0; i < size_ltri_; ++i) {
vari *el_low = cov_lower_[i];
double prod_add = el_low->adj_ * el_low->val_;
adjl += prod_add * dist_[i];
Member:

I'd swap lines 99 and 100 for readability. Here's what I wrote before I understood that you were reusing the term:

I think it makes sense to have lines 98 and 99 together. If I understand the math, cov_lower_[i].val_ * dist_[i] is the derivative of gp_exponential_cov() and cov_lower_[i].adj_ is the adjoint. It just makes more sense to keep the val_ * dist_ term grouped.

@drezap (Collaborator, Author) commented May 21, 2018

@syclik thank you for the helpful comments; I'll address them in my next push to the math library. I'm changing the calculation of the kernel (to match GPML), so the gradients will change.

@syclik (Member) commented May 24, 2018

Thanks. Mind pinging me after you make changes? (I don't know if I'll get notified when you do.)

@drezap (Collaborator, Author) commented May 24, 2018 via email

@drezap (Collaborator, Author) commented May 31, 2018

closing until ready

@drezap drezap closed this May 31, 2018
6 participants