Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hmm_marginal_lpdf for discrete latent states #1778
hmm_marginal_lpdf for discrete latent states #1778
Changes from 3 commits
829c641
cc777f4
c08c103
f157f1d
eeaa18c
519eb32
6e5abcd
f9f5e82
e895f9a
5d9767b
a85bfa0
e3e7ae8
b9a12e7
66bf591
9e2722b
b055197
04a02f8
2a54f2b
2ddcb6f
06527a5
14d490c
89b1339
abcb7df
54be7f1
a6e0f02
6c90efb
9c236aa
b02a0f5
bb46c07
ec7412e
61de1e4
aad9377
feea473
01c2ab1
c141eaa
6775bef
2465f86
a2775cc
581c2ee
fe0b3bc
7cdf8bb
0d7d01b
4d7d9b2
d30f856
aeb3e12
5e04b48
d257eca
cc76a31
215638f
d69fe1b
d490911
8767d1b
64b46a7
d6b2b36
543b68b
56c7c49
950458c
863f0ef
45f1bf9
fce60f3
b1a0e60
d94988e
f03ba32
82c7ee7
1f0aeb7
37da7af
be8b507
ede1e0b
59cf78e
1b7dfa4
f457562
4b965dc
79ff5e7
4147b89
be9e90b
5a0786b
bfd1d7a
d12113b
e8bc649
4532594
0adeadf
3ee0a62
cea2cae
7861bd7
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arrays in Eigen perform operations element by element so the matrix has to be "cast" to an array before doing elementwise exp()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally make things here const that are not changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm... overkilled, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's good to let the compiler know things are const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also let's readers know "The value of this object will not change through the rest of it's life"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmmm.... I'm going to need a high charisma roll to be convinced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My dude you can chase your bliss and if that doesn't involve const then sure that's fine. But if you don't mutate something later why not let the reader know it won't change by declaring it const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly because it clutters the code. The compiler can't act just on
const
because it's possible to cast away theconst
modifier. I try to follow the lead of mature packages like Boost, which don't declare all their local variables (or arithmetic function args)const
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's settled then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not have "marginal" in its name because we think of the latent states as parameters and thus they are naturally marginalized out. In the traditional literature, they distinguished between the "data density" (for y) and the "complete data distribution" (for y and the latent states z), because strict frequentists have to pretend the mixture responsibility parameters are data so that they don't have a philosophical meltdown. You see that usage all over the EM literature, where the latent states z get marginalized out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ask for a second opinion on this (@vianeylb, @betanalpha). As someone familiar with a Bayesian framework and a candid approach to the problem, this is clearly a marginal distribution. Naturally, I'm willing to accomodate the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Cappé et al book, the distribution of the observations is referred to as the "marginal distribution of the observations only", obtained through the process of marginalization of the state variables.
In the Zucchini et al book, they have a section on marginal distributions of a single observation, Y_t, and discuss higher order marginal distributions (e.g. [Y_t, Y_t+k]). Extending this, the distribution of the data only also falls under what they refer to as marginal distributions.
The likelihood in both contexts comes from what they refer to as the marginal distribution of the data only, and they emphasize that this quantity is obtained through marginalization over the state variables. Given that inference for HMMs via the likelihood (marginal distribution of the data) vs the complete-data likelihood seems to be a topic that people are highly opinionated about, emphasizing that inference for the parameters of an HMM in Stan is done through marginalization over the state variables and use of the marginal distribution of the data directly seems preferable to me. And as the Zucchini book is clearly more on the frequentist side, I don't think the "marginal_lpdf" will be divisive to use. Just my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Vianey!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We all agree that what's at play is a joint Bayesian model
p(y, z, theta)
wherey
is observed data,z
is unobserved, andtheta
are parameters.The terminology I'm familiar with calls
p(y | theta)
the "likelihood" andp(y, z | theta)
the "full data likelihood", because frequentist philosophy melts down ifz
is considered a parameter.I looked up Zucchini et al. and in section 2.3, they use likelihood in exactly this way for
p(y | theta)
with the statesz
marginalized out ofp(y, z | theta)
. They then talk about marginalizingp(y | theta)
to subsequences ofy
. In general, if we have a joint distributionp(y)
, we don't talk aboutp(y)
as a marginal distribution even though it is the degenerate case with nothing marginalized out. That's why I find callingp(y | theta)
a marginal distribution confusing in this context.I'd also strongly prefer not to have verbose names for things if we're not going to be distinguishing among many alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idiom here is the following:
Mainly because it'll work if the type is an unsized type, but also becaue it reduces the number of operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. The given code doesn't work (and nor do small tweaks around it). Can you be more specific about the wanted change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional]
I'd prefer something at the very top handling the
n_transitions == 0
case so it's not in the way of understanding all the code. (Overall the code's really understandable.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'll work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, also Charles note that the if statements that use type traits like
is_constant_all
are resolved at compile time (i.e. it just becomesif (true/false)
) so the alt code path will just be deleted since it's dead so you don't need to worry about those ifs in terms of performanceThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I found a fairly good solution. The
n_transitions == 0
case only matters when handling the boundary terms, so it's now handled at the very top, when we start treating the boundary terms.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
n_transitions == 0
isn't just split out at the top, push branching as far down as possible so that the parallel assignment structure is clear.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling the
n_transitions == 0
at the top now.