Unable to define Dirichlet prior #178

helmutsimon · 2018-05-22T06:25:39Z

Hello

I am trying to build a simple model to estimate parameters of a multinomial distribution using a Dirichlet prior. Using the inbuilt Dirichlet distribution as in the following code seems to work OK as follows (JUpyter notebook also attached).

import numpy as np
import pymc
import sys
from pymc import *


def simple_model(n, data, numseg, prior):
    probs = Dirichlet('probs', prior)
    process = Multinomial('process', numseg, probs, value=data, observed=True)
    return locals()


print('numpy version ', np.__version__)
print('pymc version ', pymc.__version__)
print(sys.version)
n = 4
data = np.array([5, 7, 1, 0])
numseg = np.sum(data)
prior = np.ones(n) / n
print('MLE = ', data / np.sum(data))
model = MCMC(simple_model(n, data, numseg, prior))
model.sample(iter=100000, burn=1000, thin=10)
means = model.stats()['probs']['mean']
print('\nmeans = ', means)
print('sds = ', model.stats()['probs']['standard deviation'])

However, I am trying to replace the inbuilt Dirichlet distribution by a user-defined one (which I plan to modify later) as per below, and the samples never leave thee starting value. Am I doing something wrong? Environment is:
numpy version 1.11.3
pymc version 2.3.6
3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 12:43:10)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]

import numpy as np
from pymc import *


def simple_model(n, data, numseg, prior):
    
    @stochastic(dtype='float64')
    def probs(value=prior, pms=prior):
        return stats.dirichlet.logpdf(value, pms)
    
    process = Multinomial('process', numseg, probs, value=data, observed=True)
    return locals()


n = 4
data = np.array( [5, 7, 1, 0])
numseg = np.sum(data)
prior = np.ones(n) / n
print('MLE = ', data / np.sum(data))
print('numpy version ', np.__version__)
model = MCMC(simple_model(n, data, numseg, prior))
model.sample(iter=1000, burn=100, thin=10)
means = model.stats()['probs']['mean']
print('\nmeans = ', means)
print('sds = ', model.stats()['probs']['standard deviation'])
[Pymc2_model_issues.txt](https://github.com/pymc-devs/pymc/files/2025215/Pymc2_model_issues.txt)

fonnesbeck · 2018-05-22T18:24:47Z

What sort of output are you seeing? Any warnings or error messages?

Also, is there a reason for using PyMC2? We recommend new projects use PyMC3 for building statistical models. You don’t get gradient-based samplers in PyMC2, which is a big deal.

helmutsimon · 2018-05-23T01:14:03Z

Hi Chris

Thanks for responding. I don't get any warnings other than the depreciation warning shown at the end of this comment. PyMC2 produces a trace, but all elements of the trace are the same as the starting value, i.e. [0.25, 0.25, 0.25, 0.25]. Since stats.dirichlet.logpdf(value, pms) will return an error unless sum(value)==1, I am guessing that proposals are being generated that do not satisfy this condition, hence are always rejected.

As for PyMC3, I have been working on this also, but come up against the same problem. That is, if I enforce the condition sum(value)==1 in my function to return logpdf, eg:

    def dirich_logpdf(value):
        if tt.gt(np.abs(tt.sum(value) - 1.), 1e-10) \
                or ~ tt.all(value > 0) or ~ tt.all(value < 1):
            return -np.inf
        else:
            v0 = value[0]
            v1 = value[1]
            v2 = value[2]
            v3 = value[3]
            return  -5.15209009879231 - 0.75 * (np.log(v0) + np.log(v1) + np.log(v2) + np.log(v3))

then all proposals appear to be rejected. Presumably the PyMC code has some way of generating compliant proposals for the Dirichlet distribution, but I don't know how this is done. If I did, I might be able to emulate it.

Following is the PyMC2 warning that I received:

/Users/helmutsimon/miniconda3/lib/python3.5/site-packages/numpy/core/fromnumeric.py:224: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  return reshape(newshape, order=order)

fonnesbeck · 2018-05-23T17:18:22Z

As far as the PyMC2 implementation is concerned, you will probably have more success passing in k-1 parameters and let PyMC do the simplex for you, rather than all k probabilities. Sampling tends to be more successful this way.

On the PyMC3 end, there are transformations that will automatically do the same thing (enforce a simplex). The Dirichlet distribution in PyMC3 uses a stick-breaking transformation for this, to ensure that the probabilities sum to one, without doing all the manual manipulation that you are coding. The best place to go for support in developing custom distributions is our Discourse site. We have several developers, along with hundreds of users, that read the site and will often chime in with feedback. They are particularly interested in helping folks do new things with PyMC3, so I expect you would get a fair bit of support.

helmutsimon · 2018-05-24T09:49:08Z

Thanks. The stick-breaking transormation did the job. If anyone else comes across this, the corrected code is as follows:

with Model() as model1:
    prior = np.ones(n) / n
    
    def dirich_logpdf(value=prior):
        return -n * gammaln(1/n) + (-1 + 1/n) * tt.log(value).sum()
    
    stick = distributions.transforms.StickBreaking()
    probs = DensityDist('probs', dirich_logpdf, shape=n, testval=np.array(prior), transform=stick)
    data = np.array([5, 7, 1, 0])
    sfs_obs = Multinomial('sfs_obs', n=np.sum(data), p=probs, observed=data)

fonnesbeck · 2018-05-24T13:43:06Z

🎉

helmutsimon closed this as completed May 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unable to define Dirichlet prior #178

Unable to define Dirichlet prior #178

helmutsimon commented May 22, 2018 •

edited by fonnesbeck

Loading

fonnesbeck commented May 22, 2018

helmutsimon commented May 23, 2018 •

edited

Loading

fonnesbeck commented May 23, 2018

helmutsimon commented May 24, 2018

fonnesbeck commented May 24, 2018

Unable to define Dirichlet prior #178

Unable to define Dirichlet prior #178

Comments

helmutsimon commented May 22, 2018 • edited by fonnesbeck Loading

fonnesbeck commented May 22, 2018

helmutsimon commented May 23, 2018 • edited Loading

fonnesbeck commented May 23, 2018

helmutsimon commented May 24, 2018

fonnesbeck commented May 24, 2018

helmutsimon commented May 22, 2018 •

edited by fonnesbeck

Loading

helmutsimon commented May 23, 2018 •

edited

Loading