Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lbeta for large arguments #1612

Merged

Conversation

martinmodrak
Copy link
Contributor

@martinmodrak martinmodrak commented Jan 14, 2020

Summary

Reimplementing lbeta to be stable when one argument is large and other small.

It is a neat trick I found in the R codebase - you first compute an approximation to lbeta via Stirling approximation to lgamma which allows you to analytically manipulate the expression and avoid catastrophic cancellation for the "bulk" of the value. Then you compute the difference between lgamma and the Stirling approximation (the function lgamma_stirling_diff) and add this to the result to restore precision.

The code in R for the difference between lgamma and Stirling was opaque and used some tricks I didn't really get, but it turns out just using 3 additional terms of the Stirling series as found on Wikipedia is enough for all the tests I could come up with :-)

Possible issues:

  • Not sure if lgamma_stirling_diff is a good name for the function.
  • The R code that inspired this implementation (https://github.com/wch/r-source/blob/5a156a0865362bb8381dcd69ac335f5174a4f60c/src/nmath/lbeta.c) is GPLv2. While I reimplemented the code from scratch, some portions of lbeta.hpp are quite similar to the R code as there are only so many ways you can rearrange the Stirling approximation to get stable results. I believe this should not be a copyright issue, but I am not a lawyer.

Tests

  • A test against values and derivatives precomputed in Mathematica
  • A test against several identities
  • A test for extreme values, infinity
  • A test for stability of value around the cutoff for using the Stirling approximation

Side Effects

Hopefully none. I have not done any performance comparison of the old vs. new implementation. I can imagine the new is faster (can avoid calls to lgamma) or slower (needs quite a bunch of arithmetic operations).

I did not touch the way gradients are computed (as those look mostly OK, although for very small arguments I needed to increase the test tolerance to 1e-7).

Checklist

By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@martinmodrak
Copy link
Contributor Author

In #1579 @bob-carpenter inquired about Mathematica code. I will address it here, as this PR needs to be merged before #1579.

Also, have you checked that the IP is OK for us to include Mathematica code in comments and use the output of their approximations? I can't see why it wouldn't be, but it'd be good to check before we go too far down the road of using a proprietary software package.

I am not a lawyer, but the code is IMHO easy: I've written the code, I (or my employer) have the IP - this has nothing to do with what you do with the code after it's written.

For ouptut of software, the case is less clear cut, but AFAIK the important part is who did a "creative act" in creating the outputs. Referring to for example this source (but others agree): If the user provides non-trivial inputs to a software, they can usually claim copyright to its outputs (the same way Microsoft cannot claim copyright over stuff written in Word / Excel or Adobe cannot claim copyright to images created in Illustrator). In some cases, just running a software does not give you copyright over the results - for example for software with no inputs designed to produce the outputs. While there is an interesting little explored field over stuff like AI-created artwork, I think we are safely on the good side of the law - my input to Mathematica was creative and non-trivial.

Copy link
Contributor

@mcol mcol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for finding the problem and coming up with a proper solution. I've read through this and left you a bunch of nitpicks and a few style improvements. I don't feel confident enough to properly review this, so request a review from somebody more experienced. Hopefully, if you address my comments, they'll have an easier time reviewing the actual, important parts.

stan/math/prim/fun/lbeta.hpp Outdated Show resolved Hide resolved
stan/math/prim/fun/lbeta.hpp Outdated Show resolved Hide resolved
stan/math/prim/fun/lbeta.hpp Outdated Show resolved Hide resolved
stan/math/prim/fun/lbeta.hpp Show resolved Hide resolved
stan/math/prim/fun/lbeta.hpp Show resolved Hide resolved
test/unit/math/prim/fun/lbeta_test.cpp Outdated Show resolved Hide resolved
test/unit/math/rev/fun/lbeta_test.cpp Show resolved Hide resolved
stan/math/prim/fun/lgamma_stirling_diff.hpp Outdated Show resolved Hide resolved
constexpr double NaN = std::numeric_limits<double>::quiet_NaN();
// Test values generated in Mathematice, reproducible notebook at
// https://www.wolframcloud.com/obj/martin.modrak/Published/lbeta.nb
// Mathematica Code reproduced below for convenience:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion, but perhaps this should be recorded in the issue rather than committed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize putting it into an issue is an option, that's a good point! Still I slightly prefer having it in code - it can evolve with the code and so it might make sense to version it along the code. And the code might be around longer than GitHub...

@martinmodrak
Copy link
Contributor Author

Thanks @mcol for the suggestions, I've implemented all of them and learned a bit more about Stan's style.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Jan 14, 2020 via email

Copy link
Contributor

@mcol mcol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more tiny changes, then you should tag somebody for review.

*/
template <typename T>
return_type_t<T> lgamma_stirling(const T x) {
return 0.5 * LOG_TWO_PI + (x - 0.5) * log(x) - x;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I meant that HALF_LOG_TWO_PI is in constants.hpp and can be used already.

stan/math/prim/fun/lgamma_stirling_diff.hpp Outdated Show resolved Hide resolved
stan/math/prim/fun/lgamma_stirling_diff.hpp Outdated Show resolved Hide resolved
stan/math/prim/fun/lgamma_stirling_diff.hpp Outdated Show resolved Hide resolved
EXPECT_NEAR(diff_before, diff_after, tol) << "diff around useful cutoff";
}

namespace lgamma_stirling_diff_test_internal {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to namespace this in a test file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've heard there is a plan to run all unit tests in a single translation unit to save on compilation time (https://discourse.mc-stan.org/t/requiring-global-unique-names-for-unit-tests-in-stan-math/11270). Since I do have TestValue and testValues in other tests, this seemed like a reasonable precaution.

stan/math/prim/fun/lgamma_stirling.hpp Outdated Show resolved Hide resolved
@martinmodrak
Copy link
Contributor Author

Thanks @mcol for the suggestions, I've implemented them, except for the test namespace (see the comment).

I would tag @bbbales2 for review (but this does not need to make it into the upcoming release, so no rush).

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.91 4.95 0.99 -0.72% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.61% slower
eight_schools/eight_schools.stan 0.09 0.09 1.04 4.24% faster
gp_regr/gp_regr.stan 0.22 0.24 0.95 -5.78% slower
irt_2pl/irt_2pl.stan 6.08 6.12 0.99 -0.67% slower
performance.compilation 87.82 87.27 1.01 0.63% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.39 7.32 1.01 1.07% faster
pkpd/one_comp_mm_elim_abs.stan 22.22 20.29 1.1 8.69% faster
sir/sir.stan 104.41 104.95 0.99 -0.52% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.97 -2.93% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.0 2.96 1.01 1.28% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.31 1.0 -0.22% slower
arK/arK.stan 1.74 1.74 1.0 0.04% faster
arma/arma.stan 0.78 0.8 0.97 -3.3% slower
garch/garch.stan 0.59 0.59 1.0 -0.08% slower
Mean result: 1.00113690293

Jenkins Console Log
Blue Ocean
Commit hash: 17f3680


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.89 4.99 0.98 -2.09% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.9% slower
eight_schools/eight_schools.stan 0.09 0.09 0.98 -2.23% slower
gp_regr/gp_regr.stan 0.22 0.22 0.99 -0.53% slower
irt_2pl/irt_2pl.stan 6.08 6.09 1.0 -0.1% slower
performance.compilation 88.02 86.93 1.01 1.24% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.31 7.31 1.0 -0.02% slower
pkpd/one_comp_mm_elim_abs.stan 21.07 20.66 1.02 1.93% faster
sir/sir.stan 96.44 93.63 1.03 2.92% faster
gp_regr/gen_gp_data.stan 0.05 0.05 0.98 -2.24% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 2.96 1.0 0.4% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.31 1.0 0.04% faster
arK/arK.stan 1.75 1.75 1.0 -0.11% slower
arma/arma.stan 0.81 0.81 1.0 -0.38% slower
garch/garch.stan 0.63 0.63 1.01 0.53% faster
Mean result: 0.998517831787

Jenkins Console Log
Blue Ocean
Commit hash: 084ddfb


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@martinmodrak
Copy link
Contributor Author

Pinging @bbbales2 - do you have time for review or should I ask someone else? Thx!

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.88 4.84 1.01 0.91% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -0.62% slower
eight_schools/eight_schools.stan 0.09 0.09 0.97 -3.11% slower
gp_regr/gp_regr.stan 0.23 0.22 1.0 0.09% faster
irt_2pl/irt_2pl.stan 6.17 6.08 1.01 1.45% faster
performance.compilation 88.83 86.86 1.02 2.22% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.31 7.33 1.0 -0.24% slower
pkpd/one_comp_mm_elim_abs.stan 21.74 20.37 1.07 6.3% faster
sir/sir.stan 98.47 98.31 1.0 0.17% faster
gp_regr/gen_gp_data.stan 0.04 0.04 0.99 -0.67% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 2.97 1.0 0.16% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.32 0.97 -2.62% slower
arK/arK.stan 1.78 1.76 1.01 0.89% faster
arma/arma.stan 0.8 0.8 1.0 -0.08% slower
garch/garch.stan 0.58 0.58 0.99 -0.67% slower
Mean result: 1.00324490478

Jenkins Console Log
Blue Ocean
Commit hash: 71b983e


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Copy link
Member

@bbbales2 bbbales2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@bbbales2 bbbales2 merged commit a46427f into stan-dev:develop Jan 28, 2020
@bbbales2
Copy link
Member

Also, I asked Bob about:

While I reimplemented the code from scratch, some portions of lbeta.hpp are quite similar to the R code as there are only so many ways you can rearrange the Stirling approximation to get stable results.

He said algorithms can't be copyrighted, so we're good.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Jan 28, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants