Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hmm_marginal_lpdf for discrete latent states #1778

Merged
merged 84 commits into from
May 4, 2020

Conversation

charlesm93
Copy link
Contributor

Summary

For a Hidden Markov Model with observation y, hidden state x, and parameters theta, return the log marginal density, log pi(y | theta). In this setting, the hidden states are discrete and take values over the finite space {1, ..., K}. The marginal lpdf is obtained via a forward pass, and the gradients using an adjoint method. For a discussion on the latter, and several ways of deriving the adjoint method, see https://arxiv.org/abs/2002.00326.

While the user can code in their own marginal likelihood for a hidden Markov model, the process is coding intensive, requires careful bookkeeping, and the differentiation is less efficient. For the latter, the proposed function removes computational and memory overhead, and a high constant.

The user specifies a matrix of log likelihood density, computed on observations y. The number of rows corresponds to the number of states (so K), and the number of columns the number of transitions + 1 (to account for the zeroth observation, before any transition). The user also passes the transition matrix Gamma, with K rows and K columns, and the initial rho vector of length K. We can turn the Gamma matrix into an array of matrices, in a later version. The rows of Gamma and rho are simplex.

Tests

A standard test for a system with two states and 10 transitions, and two edge case tests for 0 transitions and 1 state. An additional test for the error messages. See test/unit/prim/prob/hmm_marginal_test.cpp.

Side Effects

Nope.

Checklist

  • Math issue lpdf for Hidden Markov models with discrete latent states #1648

  • Copyright holder: Charles Margossian, Columbia University and Michael Betancourt, Symplectomorphic, LLC

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit/prim/prob/hmm_marginal_test.cpp)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

bob-carpenter
bob-carpenter previously approved these changes Mar 13, 2020
Copy link
Contributor

@bob-carpenter bob-carpenter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks fantastic. I think it's super important, so I'm asking for a lot of changes here. I might ask for a few more performance things once I see it all again. It'd be much easier to understand if the test for n_transitions > 0 didn't have to be made everywhere, either by cleaning up boundary of algorithms or pulling out the size 0 and size 1 inputs and handling them separately at the beginning.

Also, I'd like to see this work for forward mode. I don't see why it can't given that it's using operands and partials. What happens with the tests if they're not set to infinity?

stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
stan/math/prim/prob/hmm_marginal_lpdf.hpp Outdated Show resolved Hide resolved
test/unit/math/prim/prob/hmm_marginal_test.cpp Outdated Show resolved Hide resolved
test/unit/math/prim/prob/hmm_marginal_test.cpp Outdated Show resolved Hide resolved
test/unit/math/prim/prob/hmm_marginal_test.cpp Outdated Show resolved Hide resolved
test/unit/math/prim/prob/hmm_marginal_test.cpp Outdated Show resolved Hide resolved
@bob-carpenter
Copy link
Contributor

Ack. That should not have been an approval. I don't know how to fix it now without just dismissing that last review. Anyway, I want to review again after changes.

@bob-carpenter bob-carpenter self-requested a review March 13, 2020 20:38
@bob-carpenter
Copy link
Contributor

I added a re-request to make sure this doesn't get merged without fixes. Sorry about that.

@syclik syclik changed the title hmm_marginal_lpdf for discrete latent sates hmm_marginal_lpdf for discrete latent states Apr 28, 2020
@bob-carpenter
Copy link
Contributor

I don't see any outstanding questions. I'm OK with the answers you've provided so far, if that's what you were waiting for.

Is this just ready for a final review now?

@charlesm93
Copy link
Contributor Author

@bob-carpenter the outstanding questions are in the unresolved conversations and responses to your review comments, and concern:

  • doc for @throw std:errror
  • whether I correctly use check_matching_size
  • why is there a benefit in doing transpose outside the loop if we're using lazy eval
  • where to define norm_norm
  • In the unit test, is it ok to use M_PI for the value of pi

@bob-carpenter
Copy link
Contributor

Is there a longer form of these? I'm not sure what the questions are and can't seem to find any matching text in the discussion. The search for std: on this page yielded only the last question. Let me try to guess to make some progress:

I'd combine all the errors of the same throw type into a single doc string. Otherwise, you just want to doc briefly what it does in the actual @throw line and put the details in the actual function doc.

  • whether I correctly use check_matching_size

Yes, check matching sizes is OK, but now that I see what you're doing, what you probably want there is check_multiplicable on Gamma and log_omega.

  • why is there a benefit in doing transpose outside the loop if we're using lazy eval

A single transpose on the outside does the transposition efficiently once, then everything's memory local after that. Using the expression template makes you evaluate it each time. I believe that in some cases, Eigen itself will transpose A proactively before multiplying A * B.

  • where to define norm_norm

I'm not sure what the question is. In general, you want to define local variables as late as possible, meaning as close to where they are used as possible. Having said that, you want them at a high enough scope that duplicate work isn't done. So no expensive repetetive operations in loops.

  • In the unit test, is it ok to use M_PI for the value of pi

Absolutely. You can use whatever is in the library for testing. Because it's going to be tested independently. There's also the function stan::math::pi().

@charlesm93
Copy link
Contributor Author

Is there a longer form of these?

Yes. In your review, you comment the code and I then respond to your comments. Maybe it's not appearing when your search the page, because some of the conversations are hidden. You can however click on show hidden conversations.

In any case, thanks for the initial responses, this already helps.

If you can't find the original questions, I'll clarify what I'm asking.

@bob-carpenter
Copy link
Contributor

This is the worst interface. I scrolled through the entire code file and opened everything closed and can't find your comments. If you could link one, it might help me find the others.

@charlesm93
Copy link
Contributor Author

Here's one of the comments: #1778 (comment)

@bob-carpenter
Copy link
Contributor

I think I went through all the hidden stuff and answered outstanding questions. If not, please let me know. I'm really struggling with this "hidden / resolved / outdated" distinction and also what it does and doesn't report in the code view.

@charlesm93
Copy link
Contributor Author

I think I went through all the hidden stuff and answered outstanding questions

Indeed you did. The code is ready for its final review.

Copy link
Contributor

@bob-carpenter bob-carpenter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just one really trivial change I didn't catch before, then this is good to go. Thanks much for bearing with the long review process.


check_consistent_size("hmm_marginal_lpdf", "rho", rho, n_states);
check_simplex("hmm_marginal_lpdf", "rho", rho);
if (n_transitions != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this conditional in the first review. Semantically, you don't want special conditions to bypass the tests.

It's also likely to slow the code down (not noticeably given everything else going on!) because the condition's almost never going to be satisfied in any realistic data set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for having the conditional is to allow the user to not construct Gamma if there are 0 transitions (as in it suffices to declare Gamma and make it 0 x 0). So I'd prefer keeping the conditional. But, no strong preference, it's your call.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably we don't need size 0 inputs at all. For efficiency, those can just be removed from the data.

How often do size 1 inputs come up? I suspect that it'd be where the size 1 input is just one of many inputs, so Gamma would already exist. This would save the check for simplex condition, which involves some real arithmetic.

In our other distributions, we've put the check for size 0 after all the checks. For example, in normal(y | mu, sigma), there's a check that if any of the arguments is a container (vector, row vector or array), then all of the container arguments have to be the same size. If any of the inputs is size 0, we can return 0, but we do the size equality check and the check that sigma > 0 first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the conditional.

Now, a short comment.

Presumably we don't need size 0 inputs at all. For efficiency, those can just be removed from the data.

The function always requires you to pass Gamma but if you have 0 transitions, you wouldn't need Gamma. I get that it's an unlikely occurrence but it's an edge case that we wanted to test and it's been its own Pandora box. Not sure about size 1, but even then, you want the element in Gamma to be 1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I understand that it's not free to check that Gamma's a proper stochastic matrix and that it's not going to be used in the likelihood if there aren't any transitions. Similarly, the outputs and initial distribution won't get used if the input is size 0. Nevertheless, I'd like to keep the arguments legal everywhere just because the interface is simpler to doc and explain that way. If the legal argument checks didn't apply for size 0 and/or 1 inputs, all the doc would need to reflect that, so every @throwand description of matrix would have to change. So it'd be something like @Gamma a stochastic matrix that's multiplicable by ... with @param Gamma a stochastic matrix that's multiplicable by ..., or if the size of the argument y is 0 or 1, it can be any size or shape matrix and so on.

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 5.02 5.1 0.98 -1.63% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.0 -0.42% slower
eight_schools/eight_schools.stan 0.09 0.09 0.99 -0.56% slower
gp_regr/gp_regr.stan 0.23 0.23 0.97 -3.42% slower
irt_2pl/irt_2pl.stan 6.45 6.43 1.0 0.2% faster
performance.compilation 85.97 84.47 1.02 1.75% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.59 8.01 0.95 -5.61% slower
pkpd/one_comp_mm_elim_abs.stan 20.0 20.98 0.95 -4.92% slower
sir/sir.stan 95.23 91.09 1.05 4.35% faster
gp_regr/gen_gp_data.stan 0.05 0.05 1.0 -0.11% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.02 2.98 1.01 1.4% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.32 0.98 -2.01% slower
arK/arK.stan 1.75 1.76 1.0 -0.46% slower
arma/arma.stan 0.77 0.76 1.0 0.34% faster
garch/garch.stan 0.57 0.57 1.0 -0.25% slower
Mean result: 0.993088097098

Jenkins Console Log
Blue Ocean
Commit hash: 3ee0a62


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 5.07 5.13 0.99 -1.24% slower
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 0.96 -4.55% slower
eight_schools/eight_schools.stan 0.09 0.09 1.0 0.24% faster
gp_regr/gp_regr.stan 0.23 0.22 1.0 0.35% faster
irt_2pl/irt_2pl.stan 6.49 6.45 1.01 0.59% faster
performance.compilation 85.59 84.69 1.01 1.05% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.58 7.54 1.0 0.46% faster
pkpd/one_comp_mm_elim_abs.stan 20.27 20.24 1.0 0.17% faster
sir/sir.stan 91.49 92.5 0.99 -1.11% slower
gp_regr/gen_gp_data.stan 0.05 0.05 1.02 1.54% faster
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.05 3.03 1.01 0.85% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.31 0.99 -0.92% slower
arK/arK.stan 1.76 1.75 1.0 0.13% faster
arma/arma.stan 0.75 0.77 0.97 -3.36% slower
garch/garch.stan 0.57 0.57 1.01 0.58% faster
Mean result: 0.996802999408

Jenkins Console Log
Blue Ocean
Commit hash: 7861bd7


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@SteveBronder
Copy link
Collaborator

@serban-nicusor-toptal I think something goofy may be going on with the upstream tests. I keep seeing stuffl like the below in PRs

https://jenkins.mc-stan.org/job/Stan/job/downstream_tests/1553/console

[LLVM/Clang] [-ERROR-] Can't determine head commit using 'git rev-parse'. Skipping blame.
[LLVM/Clang] [-ERROR-] hudson.plugins.git.GitException: Command "/usr/bin/git rev-parse HEAD^{commit}" returned status code 128:
stdout: 
stderr: fatal: Not a git repository (or any of the parent directories): .git
[LLVM/Clang] [-ERROR-] at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2372)
[LLVM/Clang] [-ERROR-] at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2302)
[LLVM/Clang] [-ERROR-] at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:2298)

Is this from trying not to execute the tests when there are no changes? It might be better to pull that out and run the tests always until we can get rid of false positive cases like the above

@serban-nicusor-toptal
Copy link
Contributor

Hey, I see it failing here

src/test/interface/mpi_test.cpp:3:39: fatal error: test/test-models/proper.hpp: No such file or directory

compilation terminated.

make/tests:25: recipe for target 'test/interface/mpi_test.o' failed

make: *** [test/interface/mpi_test.o] Error 1

make: *** Waiting for unfinished jobs....

@SteveBronder
Copy link
Collaborator

Oh! Hmm maybe this is a goof in the stan repo? I'll check this out

@serban-nicusor-toptal
Copy link
Contributor

Just saw a cmdstan develop build that failed with the same error as here and the next commit is building fine here. downstream_tests must have caught that broken state and used it, it should have fixed it self now. Let me know if the next build fails and I'll look further into it.

@stan-buildbot
Copy link
Contributor


Name Old Result New Result Ratio Performance change( 1 - new / old )
gp_pois_regr/gp_pois_regr.stan 5.12 5.1 1.0 0.47% faster
low_dim_corr_gauss/low_dim_corr_gauss.stan 0.02 0.02 1.01 1.47% faster
eight_schools/eight_schools.stan 0.09 0.09 1.02 2.41% faster
gp_regr/gp_regr.stan 0.22 0.23 0.99 -0.92% slower
irt_2pl/irt_2pl.stan 6.46 6.42 1.01 0.65% faster
performance.compilation 85.69 84.5 1.01 1.39% faster
low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan 7.56 7.55 1.0 0.12% faster
pkpd/one_comp_mm_elim_abs.stan 20.07 20.07 1.0 0.01% faster
sir/sir.stan 93.46 92.87 1.01 0.63% faster
gp_regr/gen_gp_data.stan 0.05 0.05 0.99 -1.19% slower
low_dim_gauss_mix/low_dim_gauss_mix.stan 3.0 2.98 1.01 0.71% faster
pkpd/sim_one_comp_mm_elim_abs.stan 0.31 0.31 0.99 -0.67% slower
arK/arK.stan 1.74 1.76 0.99 -1.35% slower
arma/arma.stan 0.76 0.78 0.97 -3.42% slower
garch/garch.stan 0.57 0.56 1.01 1.32% faster
Mean result: 1.00126622662

Jenkins Console Log
Blue Ocean
Commit hash: 7861bd7


Machine information ProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@charlesm93
Copy link
Contributor Author

@bob-carpenter Ready for merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants