-
-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hmm_marginal_lpdf for discrete latent states #1778
Conversation
…gs/RELEASE_500/final)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks fantastic. I think it's super important, so I'm asking for a lot of changes here. I might ask for a few more performance things once I see it all again. It'd be much easier to understand if the test for n_transitions > 0
didn't have to be made everywhere, either by cleaning up boundary of algorithms or pulling out the size 0 and size 1 inputs and handling them separately at the beginning.
Also, I'd like to see this work for forward mode. I don't see why it can't given that it's using operands and partials. What happens with the tests if they're not set to infinity?
Ack. That should not have been an approval. I don't know how to fix it now without just dismissing that last review. Anyway, I want to review again after changes. |
I added a re-request to make sure this doesn't get merged without fixes. Sorry about that. |
…4.1 (tags/RELEASE_600/final)
I don't see any outstanding questions. I'm OK with the answers you've provided so far, if that's what you were waiting for. Is this just ready for a final review now? |
@bob-carpenter the outstanding questions are in the unresolved conversations and responses to your review comments, and concern:
|
Is there a longer form of these? I'm not sure what the questions are and can't seem to find any matching text in the discussion. The search for
I'd combine all the errors of the same throw type into a single doc string. Otherwise, you just want to doc briefly what it does in the actual
Yes, check matching sizes is OK, but now that I see what you're doing, what you probably want there is
A single transpose on the outside does the transposition efficiently once, then everything's memory local after that. Using the expression template makes you evaluate it each time. I believe that in some cases, Eigen itself will transpose
I'm not sure what the question is. In general, you want to define local variables as late as possible, meaning as close to where they are used as possible. Having said that, you want them at a high enough scope that duplicate work isn't done. So no expensive repetetive operations in loops.
Absolutely. You can use whatever is in the library for testing. Because it's going to be tested independently. There's also the function |
Yes. In your review, you comment the code and I then respond to your comments. Maybe it's not appearing when your search the page, because some of the conversations are hidden. You can however click on show hidden conversations. In any case, thanks for the initial responses, this already helps. If you can't find the original questions, I'll clarify what I'm asking. |
This is the worst interface. I scrolled through the entire code file and opened everything closed and can't find your comments. If you could link one, it might help me find the others. |
Here's one of the comments: #1778 (comment) |
I think I went through all the hidden stuff and answered outstanding questions. If not, please let me know. I'm really struggling with this "hidden / resolved / outdated" distinction and also what it does and doesn't report in the code view. |
…dev/math into try-efficient_hmm_gradient
Indeed you did. The code is ready for its final review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Just one really trivial change I didn't catch before, then this is good to go. Thanks much for bearing with the long review process.
|
||
check_consistent_size("hmm_marginal_lpdf", "rho", rho, n_states); | ||
check_simplex("hmm_marginal_lpdf", "rho", rho); | ||
if (n_transitions != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this conditional in the first review. Semantically, you don't want special conditions to bypass the tests.
It's also likely to slow the code down (not noticeably given everything else going on!) because the condition's almost never going to be satisfied in any realistic data set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for having the conditional is to allow the user to not construct Gamma if there are 0 transitions (as in it suffices to declare Gamma and make it 0 x 0). So I'd prefer keeping the conditional. But, no strong preference, it's your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably we don't need size 0 inputs at all. For efficiency, those can just be removed from the data.
How often do size 1 inputs come up? I suspect that it'd be where the size 1 input is just one of many inputs, so Gamma
would already exist. This would save the check for simplex condition, which involves some real arithmetic.
In our other distributions, we've put the check for size 0 after all the checks. For example, in normal(y | mu, sigma)
, there's a check that if any of the arguments is a container (vector, row vector or array), then all of the container arguments have to be the same size. If any of the inputs is size 0, we can return 0, but we do the size equality check and the check that sigma > 0
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the conditional.
Now, a short comment.
Presumably we don't need size 0 inputs at all. For efficiency, those can just be removed from the data.
The function always requires you to pass Gamma but if you have 0 transitions, you wouldn't need Gamma. I get that it's an unlikely occurrence but it's an edge case that we wanted to test and it's been its own Pandora box. Not sure about size 1, but even then, you want the element in Gamma to be 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I understand that it's not free to check that Gamma's a proper stochastic matrix and that it's not going to be used in the likelihood if there aren't any transitions. Similarly, the outputs and initial distribution won't get used if the input is size 0. Nevertheless, I'd like to keep the arguments legal everywhere just because the interface is simpler to doc and explain that way. If the legal argument checks didn't apply for size 0 and/or 1 inputs, all the doc would need to reflect that, so every @throw
and description of matrix would have to change. So it'd be something like @Gamma a stochastic matrix that's multiplicable by ...
with @param Gamma a stochastic matrix that's multiplicable by ..., or if the size of the argument y is 0 or 1, it can be any size or shape matrix
and so on.
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
…dev/math into try-efficient_hmm_gradient
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@serban-nicusor-toptal I think something goofy may be going on with the upstream tests. I keep seeing stuffl like the below in PRs https://jenkins.mc-stan.org/job/Stan/job/downstream_tests/1553/console
Is this from trying not to execute the tests when there are no changes? It might be better to pull that out and run the tests always until we can get rid of false positive cases like the above |
Hey, I see it failing here
|
Oh! Hmm maybe this is a goof in the stan repo? I'll check this out |
Jenkins Console Log Machine informationProductName: Mac OS X ProductVersion: 10.11.6 BuildVersion: 15G22010CPU: G++: Clang: |
@bob-carpenter Ready for merge? |
Summary
For a Hidden Markov Model with observation y, hidden state x, and parameters theta, return the log marginal density, log pi(y | theta). In this setting, the hidden states are discrete and take values over the finite space {1, ..., K}. The marginal lpdf is obtained via a forward pass, and the gradients using an adjoint method. For a discussion on the latter, and several ways of deriving the adjoint method, see https://arxiv.org/abs/2002.00326.
While the user can code in their own marginal likelihood for a hidden Markov model, the process is coding intensive, requires careful bookkeeping, and the differentiation is less efficient. For the latter, the proposed function removes computational and memory overhead, and a high constant.
The user specifies a matrix of log likelihood density, computed on observations y. The number of rows corresponds to the number of states (so K), and the number of columns the number of transitions + 1 (to account for the zeroth observation, before any transition). The user also passes the transition matrix Gamma, with K rows and K columns, and the initial rho vector of length K. We can turn the Gamma matrix into an array of matrices, in a later version. The rows of Gamma and rho are simplex.
Tests
A standard test for a system with two states and 10 transitions, and two edge case tests for 0 transitions and 1 state. An additional test for the error messages. See
test/unit/prim/prob/hmm_marginal_test.cpp
.Side Effects
Nope.
Checklist
Math issue lpdf for Hidden Markov models with discrete latent states #1648
Copyright holder: Charles Margossian, Columbia University and Michael Betancourt, Symplectomorphic, LLC
The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to the license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
./runTests.py test/unit/prim/prob/hmm_marginal_test.cpp
)make test-headers
)make test-math-dependencies
)make doxygen
)make cpplint
)the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested