Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using local_nested_autodiff for all instances of nested autodiff #1706

Merged

Conversation

martinmodrak
Copy link
Contributor

@martinmodrak martinmodrak commented Feb 12, 2020

Summary

Implemented a RAII-style class local_nested_autodiff (are we happy with the name?) that makes it easier to ensure that nested stack is correctly recovered on leaving of the scope. The class also has a set_zero_all_adjoints member, so that one can handle everything around nested autodiff via this class and without using global function calls.

Removed all uses of start_nested, recover_memory_nested and set_zero_all_adjoints_nested and use local_nested_autodiff instead. The only exception are some of the tests of the autodiff where those functions are tested separately.

The class itself is just a thin wrapper around the original calls.

Tests

A new test was written that replicates the tests for nested autodiff using global functions with using local_nested_autodiff.

I also moved the gradable struct used in rev/core/var_test.cpp to its own header so that I can reuse it.

Side Effects

Hopefully no. However I am worried that some of the functions I touched appear to have no tests.

Checklist

  • Math issue Use RAII for nested autodiff #1703

  • Copyright holder: Institute of Microbiology of the Czech Academy of Sciences

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@wds15
Copy link
Contributor

wds15 commented Feb 12, 2020

I am not sure if I would put the nested thing in the most inner loops. This creates many additional start_nested calls which we did avoid before.

What is the rationale for that?

@martinmodrak
Copy link
Contributor Author

martinmodrak commented Feb 12, 2020

I am not sure if I would put the nested thing in the most inner loops. This creates many additional start_nested calls which we did avoid before.

I definitely didn't want to change the location of any nested autodiff and aimed to put the constructor at exactly the line the start_nested was before. So if anything changed it is definitely a mistake. I briefly rechecked and it seems to me that all instances match the previous code. Could you point me the specific case where you see a problem?

@wds15
Copy link
Contributor

wds15 commented Feb 12, 2020

Argh... I misread. Matching the old order of calls is the right thing to do. Sorry for the noise. My bad.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Feb 12, 2020

@wds15 --- can you put in the formal review? It sounds like you did the audit for location, which is the most important part. I'm going to drop some comments here rather than in a review.

I'm OK with the name as is. If it's not too much trouble, I'd suggest somethihg a bit more descriptive like begin_nested_rev_autodff to indicate it's starting the scope.

Can the usage just be this

begin_nested_rev_autodiff();

instead of

begin_nested_rev_autodiff nested;

I'm not sure how unassigned classes are handled for scoping---they might be destroyed right away since it's not going to be used again. Doesn't the latter give you unused variable warnings?

The RAII class itself is very elegantly coded and documented. It's making me so happy that devs are figuring out these kinds of patterns that make the code both easier to use and more robust.

The only thing I would've required in a review is doc for the constructor and destructor, because that's where the action is.

@wds15
Copy link
Contributor

wds15 commented Feb 13, 2020

Ok, I can dive into this.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Feb 13, 2020 via email

@wds15
Copy link
Contributor

wds15 commented Feb 13, 2020

So there is one thing which does concern me and I think we can solve it in a nice way.

The problem is with threads. The issue is that the instance of the local nested thing can get created in thread A, but destructed in thread B... and then we are in no-mans land!

However, there is a clean way to handle this. We can define that instantiating the local nested thing in thread A will always tear down the nested context in thread A - no matter which thread is running the destructor.

Does that make sense to everyone? I am happy to code this up (either directly on this branch or on an extra branch based on this one).

I think we should include this in this PR to make it thread safe.

@martinmodrak
Copy link
Contributor Author

martinmodrak commented Feb 14, 2020

@wds15 I lack deep knowledge of threads, so I'd defer to you on this. But I am honestly intrigued - how do you imagine this being created and destroyed on different threads - can a scope begin on a thread and end on another thread? Note that I purposefully deleted the copy constructor, assignment operator and the new operator to avoid weirdness, but I certainly won't be completely shocked if that is not enough.

@bob-carpenter

I'm not sure how unassigned classes are handled for scoping---they might be destroyed right away since it's not going to be used again. Doesn't the latter give you unused variable warnings?

In all examples of using RAII, I've seen people declare the vars, so I would stick to this even if it is not strictly necessary (e.g. https://en.cppreference.com/w/cpp/language/raii). Though I think temporaries go out of scope sooner (https://rules.sonarsource.com/cpp/RSPEC-5184). Also I like having nested.set_zero_all_adjoints(); but that's a minor point.

Doesn't the latter give you unused variable warnings?

It does not - the compiler detects the constructor has side effects and realizes you might want to have the variable unused to handle RAII.

Regarding the name - it is easy to change it. I slightly don't like having begin in the name - once again most RAII code I've seen doesn't do this and the classes are just called lock, not begin_lock - also there is no end_nested_autodiff to match which might be confusing. I like having the explicit rev, so I would currently vote for nested_rev_autodiff or local_nested_rev_autodiff (to emphasize it is scope dependent). No strong opinions on this and will happily comply with your view on this.

@wds15
Copy link
Contributor

wds15 commented Feb 14, 2020

Example:

// in global scope:

local_nested* some_nested_ptr = 0;

// somewhere in the program start thread A
// thread A does this in its own running context:
some_nested_ptr = new local_nested();

// another thread B then does
delete some_nested_ptr;

And there you have the issue.

I am happy to code up something up which will avoid this. Essentially you have to store upon construction of the object a pointer to the actual AD tape which we want to refer to - and store it in the instance. That would be my suggestion.

@martinmodrak
Copy link
Contributor Author

martinmodrak commented Feb 14, 2020

@wds15 The code you proposed will not compile, because I marked the new operator as deleted - precisely to avoid such issues (I just verified this is actually the case).

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.97 4.9 1.01 1.43% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -2.1% slower
eight_schools/eight_schools.stan 0.09 0.09 0.99 -1.41% slower
gp_regr/gp_regr.stan 0.22 0.22 1.0 -0.32% slower
irt_2pl/irt_2pl.stan 6.09 6.06 1.01 0.51% faster
performance.compilation 88.15 87.36 1.01 0.9% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.48 7.31 1.02 2.29% faster
pkpd/one_comp_mm_elim_abs.stan 21.43 20.86 1.03 2.66% faster
sir/sir.stan 91.99 92.76 0.99 -0.84% slower
gp_regr/gen_gp_data.stan 0.05 0.05 1.0 -0.3% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 3.01 1.0 -0.41% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.33 0.32 1.05 4.8% faster
arK/arK.stan 1.76 2.42 0.73 -37.55% slower
arma/arma.stan 0.8 0.79 1.01 1.42% faster
garch/garch.stan 0.58 0.6 0.97 -2.85% slower
Mean result: 0.98603625392

Jenkins Console Log
Blue Ocean
Commit hash: 7563948


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Feb 14, 2020 via email

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Feb 14, 2020 via email

@martinmodrak
Copy link
Contributor Author

Name changed to nested_rev_autodiff.

Will the following compile?

I guess that it will, but I agree that we shouldn't worry about this - we want to prevent inadvertent errors. Anyone wanting to do more complex nesting logic would use the global functions directly anyway.

@bob-carpenter
Copy link
Contributor

bob-carpenter commented Feb 14, 2020 via email

@mcol mcol linked an issue Feb 17, 2020 that may be closed by this pull request
@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.98 4.95 1.01 0.68% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -3.49% slower
eight_schools/eight_schools.stan 0.09 0.09 1.0 0.01% faster
gp_regr/gp_regr.stan 0.22 0.22 1.0 0.38% faster
irt_2pl/irt_2pl.stan 6.12 6.19 0.99 -1.27% slower
performance.compilation 88.33 87.27 1.01 1.2% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.5 7.45 1.01 0.67% faster
pkpd/one_comp_mm_elim_abs.stan 20.83 21.11 0.99 -1.35% slower
sir/sir.stan 90.91 92.72 0.98 -2.0% slower
gp_regr/gen_gp_data.stan 0.05 0.05 1.03 3.13% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.0 3.03 0.99 -1.13% slower
pkpd/sim_one_comp_mm_elim_abs.stan 0.33 0.31 1.06 5.85% faster
arK/arK.stan 1.74 2.35 0.74 -34.57% slower
arma/arma.stan 0.79 0.79 1.01 0.65% faster
garch/garch.stan 0.58 0.6 0.97 -2.79% slower
Mean result: 0.983757018583

Jenkins Console Log
Blue Ocean
Commit hash: 54e8567


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.83 4.87 0.99 -0.69% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.98 -1.86% slower
eight_schools/eight_schools.stan 0.09 0.09 0.97 -2.64% slower
gp_regr/gp_regr.stan 0.22 0.22 1.0 -0.21% slower
irt_2pl/irt_2pl.stan 6.08 6.08 1.0 0.12% faster
performance.compilation 89.38 87.53 1.02 2.07% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.3 7.46 0.98 -2.08% slower
pkpd/one_comp_mm_elim_abs.stan 20.7 20.33 1.02 1.79% faster
sir/sir.stan 89.09 90.83 0.98 -1.95% slower
gp_regr/gen_gp_data.stan 0.05 0.05 0.98 -1.85% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 2.96 1.01 0.95% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.3 1.03 2.95% faster
arK/arK.stan 1.73 1.75 0.99 -1.08% slower
arma/arma.stan 0.77 0.79 0.97 -3.0% slower
garch/garch.stan 0.53 0.53 1.01 0.55% faster
Mean result: 0.995714302431

Jenkins Console Log
Blue Ocean
Commit hash: f3f01a3


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@martinmodrak
Copy link
Contributor Author

@wds15 just pinging this, I'd be glad to get this merged, it is starting to see conflicts with develop. But if there is anything you have still doubts about, I'll be happy to work through it.

Copy link
Contributor

@bob-carpenter bob-carpenter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks great. Only question about why one of the function calls changed. I couldn't follow the motivation from looking at the code.

stan/math/rev/functor/idas_forward_system.hpp Show resolved Hide resolved
@wds15
Copy link
Contributor

wds15 commented Feb 21, 2020

@bob-carpenter can u please finish the review? Thanks. If you want me to look at something specific, I can do that.

@bob-carpenter
Copy link
Contributor

I reviewed all the scopes and the class and submitted the review. The "change request" was just for an answer to my question of why one of the function's arguments changed when this was just about scoping memory.

Was there an answer somewhere?

@rok-cesnovar
Copy link
Member

The answer is here #1706 (comment)

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.78 4.87 0.98 -1.8% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.99 -0.62% slower
eight_schools/eight_schools.stan 0.09 0.09 1.02 1.81% faster
gp_regr/gp_regr.stan 0.22 0.22 1.0 0.09% faster
irt_2pl/irt_2pl.stan 6.05 6.11 0.99 -0.98% slower
performance.compilation 88.82 86.58 1.03 2.52% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.3 7.29 1.0 0.24% faster
pkpd/one_comp_mm_elim_abs.stan 19.85 21.59 0.92 -8.77% slower
sir/sir.stan 90.48 91.52 0.99 -1.14% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.99 -1.15% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.97 2.94 1.01 0.93% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.33 0.32 1.04 4.24% faster
arK/arK.stan 1.75 2.31 0.76 -32.05% slower
arma/arma.stan 0.75 0.79 0.94 -6.36% slower
garch/garch.stan 0.59 0.6 0.99 -1.14% slower
Mean result: 0.976714417368

Jenkins Console Log
Blue Ocean
Commit hash: f3f01a3


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.92 4.88 1.01 0.86% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.03 2.99% faster
eight_schools/eight_schools.stan 0.09 0.09 0.98 -2.1% slower
gp_regr/gp_regr.stan 0.22 0.22 1.0 0.41% faster
irt_2pl/irt_2pl.stan 6.09 6.07 1.0 0.3% faster
performance.compilation 87.68 86.87 1.01 0.92% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.67 7.29 1.05 4.96% faster
pkpd/one_comp_mm_elim_abs.stan 20.76 20.09 1.03 3.23% faster
sir/sir.stan 89.09 90.71 0.98 -1.81% slower
gp_regr/gen_gp_data.stan 0.04 0.04 1.0 0.23% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.99 2.94 1.02 1.65% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.32 0.98 -1.6% slower
arK/arK.stan 1.75 2.31 0.76 -31.88% slower
arma/arma.stan 0.75 0.79 0.94 -5.93% slower
garch/garch.stan 0.59 0.59 1.0 0.22% faster
Mean result: 0.987402800746

Jenkins Console Log
Blue Ocean
Commit hash: f3f01a3


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 4.83 4.77 1.01 1.26% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.97 -2.85% slower
eight_schools/eight_schools.stan 0.09 0.09 0.98 -2.28% slower
gp_regr/gp_regr.stan 0.21 0.21 1.0 0.03% faster
irt_2pl/irt_2pl.stan 6.05 6.06 1.0 -0.2% slower
performance.compilation 87.47 86.73 1.01 0.84% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.33 7.48 0.98 -1.94% slower
pkpd/one_comp_mm_elim_abs.stan 20.7 19.98 1.04 3.49% faster
sir/sir.stan 88.88 93.38 0.95 -5.06% slower
gp_regr/gen_gp_data.stan 0.04 0.05 0.99 -0.9% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 2.95 2.94 1.0 0.11% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.32 0.32 1.02 1.59% faster
arK/arK.stan 1.73 2.31 0.75 -33.91% slower
arma/arma.stan 0.76 0.78 0.98 -2.39% slower
garch/garch.stan 0.63 0.52 1.22 17.76% faster
Mean result: 0.992415781791

Jenkins Console Log
Blue Ocean
Commit hash: f3f01a3


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@bob-carpenter bob-carpenter merged commit a6e36b2 into stan-dev:develop Feb 25, 2020
@wds15
Copy link
Contributor

wds15 commented Feb 25, 2020

Thanks @bob-carpenter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use RAII for nested autodiff
5 participants