Ten Thousand Characters of Grateful Transcription a Day

I get stronger by transcribing ten thousand characters with gratitude every day. Eventually I plan to leave the books behind and start building something of my own.

Pythonで体験するベイズ推論 PyMCによるMCMC入門: Chapter 2 (through 2.2.4) Transcription

Introduction

 This post records my transcription of Chapter 2, through Section 2.2.4, of Pythonで体験するベイズ推論 PyMCによるMCMC入門. The previous installment is here: Pythonで体験するベイズ推論 PyMCによるMCMC入門 第1章 写経. All code was run in Jupyter Lab.

 I organized the transcription as follows.

  • Although this is a transcription, in places I wrapped the code in functions so that it is easier to re-run.

  • Where I thought of better code, I rewrote it, added a comment, and tried to make the changes explicit.

  • I left notes on points that caught my attention; I hope they help readers who have the same questions.

  • Code that closely resembles what I have written before (e.g., plotting) is sometimes skipped for efficiency.

 I hope this can also serve as a small errata list for readers working through Pythonで体験するベイズ推論 PyMCによるMCMC入門.

2 A Little More on PyMC

2.1 Introduction

2.1.1 Parent and child relationships

import pymc as pm
lambda_ = pm.Exponential("poisson_param", 1)

data_generator = pm.Poisson("data_generator", lambda_)
data_plus_one = data_generator + 1
print("Children of lambda_: ")
print(lambda_.children, "\n")

print("Parents of data_generator: ")
print(data_generator.parents, "\n")

print("Children of data_generator: ")
print(data_generator.children)

Output:

    Children of lambda_: 
    {<pymc.distributions.new_dist_class.<locals>.new_class 'data_generator' at 0x000002720195BB38>} 
    
    Parents of data_generator: 
    {'mu': <pymc.distributions.new_dist_class.<locals>.new_class 'poisson_param' at 0x000002720195BB00>} 
    
    Children of data_generator: 
    {<pymc.PyMCObjects.Deterministic '(data_generator_add_1)' at 0x000002720195BBA8>}

2.1.2 PyMC variables

print(f"lambda_.value = {lambda_.value}")
print(f"data_generator.value = {data_generator.value}")
print(f"data_plus_one.value = {data_plus_one.value}")

Output:

    lambda_.value = 0.10977163451249013
    data_generator.value = 0
    data_plus_one.value = 1

discrete_uni_var = pm.DiscreteUniform("discrete_uni_var", 0, 4)

Let's check the description of discrete_uni_var.

discrete_uni_var??

Output:

Signature:       discrete_uni_var(*args, **kwargs)
Type:            DiscreteUniform
String form:     discrete_uni_var
File:            c:\users\hoge\anaconda3\lib\site-packages\pymc\distributions.py
Docstring:       <no docstring>
Class docstring:
D = DiscreteUniform(name, lower, upper, value=None, observed=False, size=1, trace=True, rseed=True, doc=None, verbose=-1, debug=False)

Stochastic variable with DiscreteUniform distribution.
Parents are: lower, upper.

Docstring of log-probability function:

    Discrete uniform log-likelihood.

    .. math::
        f(x \mid lower, upper) = \frac{1}{upper-lower}

    :Parameters:
      - `x` : [int] :math:`lower \leq x \leq upper`
      - `lower` : Lower limit.
      - `upper` : Upper limit (upper > lower).

betas = pm.Uniform("betas", 0, 1, size=10)
betas.value

Output:

    array([ 0.34388063,  0.71044819,  0.8913627 ,  0.19832807,  0.57029206,
            0.5054775 ,  0.55222677,  0.12595211,  0.55920651,  0.34046255])

lambda_1 = pm.Exponential("lambda_1", 1)
lambda_2 = pm.Exponential("lambda_2", 1)
tau = pm.DiscreteUniform("tau", lower=0, upper=10)

print("Initialized values...")
print(f"lambda_1.value = {lambda_1.value}")
print(f"lambda_2.value = {lambda_2.value}")
print(f"tau.value = {tau.value}\n")

lambda_1.random()
lambda_2.random()
tau.random()

print("After calling random() on the variables...")
print(f"lambda_1.value = {lambda_1.value}")
print(f"lambda_2.value = {lambda_2.value}")
print(f"tau.value = {tau.value}")

Output:

    Initialized values...
    lambda_1.value = 0.08344212711504026
    lambda_2.value = 0.009108273929809445
    tau.value = 10
    
    After calling random() on the variables...
    lambda_1.value = 0.33157006248423776
    lambda_2.value = 0.46469007162531084
    tau.value = 6

type(lambda_1 + lambda_2)

Output:

    pymc.PyMCObjects.Deterministic

@pm.deterministic
def some_deterministic(stoch=discrete_uni_var):
    return stoch.value**2 # using .value here raises an error

Error:

    ---------------------------------------------------------------------------

    AttributeError                            Traceback (most recent call last)

    <ipython-input-10-a9bb7333fdfc> in <module>
          1 @pm.deterministic
    ----> 2 def some_deterministic(stoch=discrete_uni_var):
          3     return stoch.value**2 # using .value here raises an error
    

    ~\Anaconda3\lib\site-packages\pymc\InstantiationDecorators.py in deterministic(__func__, **kwds)
        248 
        249     if __func__:
    --> 250         return instantiate_n(__func__)
        251 
        252     return instantiate_n
    

    ~\Anaconda3\lib\site-packages\pymc\InstantiationDecorators.py in instantiate_n(__func__)
        241         junk, parents = _extract(
        242             __func__, kwds, keys, 'Deterministic', probe=False)
    --> 243         return Deterministic(parents=parents, **kwds)
        244 
        245     keys = ['eval']
    

    ~\Anaconda3\lib\site-packages\pymc\PyMCObjects.py in __init__(self, eval, doc, name, parents, dtype, trace, cache_depth, plot, verbose, jacobians, jacobian_formats)
        441                           trace=trace,
        442                           plot=plot,
    --> 443                           verbose=verbose)
        444 
        445         # self._value.force_compute()
    

    ~\Anaconda3\lib\site-packages\pymc\Node.py in __init__(self, doc, name, parents, cache_depth, trace, dtype, plot, verbose)
        212         self.extended_children = set()
        213 
    --> 214         Node.__init__(self, doc, name, parents, cache_depth, verbose=verbose)
        215 
        216         if self.dtype is None:
    

    ~\Anaconda3\lib\site-packages\pymc\Node.py in __init__(self, doc, name, parents, cache_depth, verbose)
        127 
        128         # Initialize
    --> 129         self.parents = parents
        130 
        131     def _get_parents(self):
    

    ~\Anaconda3\lib\site-packages\pymc\Node.py in _set_parents(self, new_parents)
        145 
        146         # Get new lazy function
    --> 147         self.gen_lazy_function()
        148 
        149     parents = property(
    

    ~\Anaconda3\lib\site-packages\pymc\PyMCObjects.py in gen_lazy_function(self)
        452                                    cache_depth=self._cache_depth)
        453 
    --> 454         self._value.force_compute()
        455 
        456         self._jacobians = {}
    

    LazyFunction.pyx in pymc.LazyFunction.LazyFunction.force_compute (pymc\LazyFunction.c:2409)()
    

    <ipython-input-10-a9bb7333fdfc> in some_deterministic(stoch)
          1 @pm.deterministic
          2 def some_deterministic(stoch=discrete_uni_var):
    ----> 3     return stoch.value**2 # using .value here raises an error
    

    AttributeError: 'numpy.ndarray' object has no attribute 'value'
@pm.deterministic
def some_deterministic(stoch=discrete_uni_var):
    return stoch**2

@pm.deterministic
def some_deterministic(stoch): # error: the argument is not given as a keyword with a default
    return stoch**2

Error:

    ---------------------------------------------------------------------------

    ValueError                                Traceback (most recent call last)

    <ipython-input-12-e91cec33d01b> in <module>
    ----> 1 @pm.deterministic
          2 def some_deterministic(stoch): # error: the argument is not given as a keyword with a default
          3     return stoch**2
    

    ~\Anaconda3\lib\site-packages\pymc\InstantiationDecorators.py in deterministic(__func__, **kwds)
        248 
        249     if __func__:
    --> 250         return instantiate_n(__func__)
        251 
        252     return instantiate_n
    

    ~\Anaconda3\lib\site-packages\pymc\InstantiationDecorators.py in instantiate_n(__func__)
        240     def instantiate_n(__func__):
        241         junk, parents = _extract(
    --> 242             __func__, kwds, keys, 'Deterministic', probe=False)
        243         return Deterministic(parents=parents, **kwds)
        244 
    

    ~\Anaconda3\lib\site-packages\pymc\InstantiationDecorators.py in _extract(__func__, kwds, keys, classname, probe)
        113             if i < arg_deficit - 1:
        114                 err_str += ','
    --> 115         raise ValueError(err_str)
        116 
        117     # Fill in parent dictionary
    

    ValueError: Deterministic some_deterministic: no parent provided for the following labels: stoch
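Both errors above come from how pm.deterministic discovers parents: it reads the wrapped function's keyword defaults as the parent nodes, and at evaluation time it passes in their current values (plain numbers or arrays, not PyMC objects). That is why .value fails inside the body, and why a parameter without a default raises a ValueError. The following toy decorator is purely my own sketch of that mechanism, not PyMC code:

```python
import inspect

def toy_deterministic(func):
    # Sketch of pm.deterministic's parent discovery: each keyword
    # default becomes a parent; a parameter without a default is an error.
    params = inspect.signature(func).parameters
    missing = [name for name, p in params.items()
               if p.default is inspect.Parameter.empty]
    if missing:
        raise ValueError("no parent provided for the following labels: "
                         + ", ".join(missing))
    parents = {name: p.default for name, p in params.items()}
    # The body is evaluated on the parents' *values*; the argument is
    # already a value, so calling .value on it would fail, as above.
    return {"parents": parents, "value": func(**parents)}

node = toy_deterministic(lambda stoch=3: stoch ** 2)
print(node["parents"], node["value"])  # {'stoch': 3} 9
```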

2.1.3 Including observations in the model

from IPython.core.pylabtools import figsize
from matplotlib import pyplot as plt
%matplotlib inline
import numpy as np

def plot_prior_dist_for_lambda_1(sample_size=20000):
    figsize(25, 8)
    plt.rcParams['font.size'] = 17 # set the font size
    samples = [lambda_1.random() for i in range(sample_size)]
    plt.hist(samples, bins=70, density=True, histtype="stepfilled")
    plt.title("Prior distribution for $\lambda_1$")
    plt.xlabel("Value")
    plt.ylabel("Density")
    plt.xlim(0, 8)

plot_prior_dist_for_lambda_1()

f:id:koheitsutsumi223:20190321201558p:plain

Note

 When I first read this passage I had no idea what it was saying. I think the point is: if you call pm.Poisson with value=data together with observed=True, then no matter how often random() is called, the variable returns the observed values passed to value rather than fresh draws from the Poisson distribution. In that sense the distribution does not actually need to be Poisson.

data = np.array([10, 5])
fixed_variable = pm.Poisson("fxd", 1, value=data, observed=True)
print(f"value: {fixed_variable.value}") # show value
print("calling .random()") # call .random()
fixed_variable.random()
print(f"value: {fixed_variable.value}")

Output:

    value: [10  5]
    calling .random()
    value: [10  5]

 I ran the same code with an exponential distribution. Apart from value being returned as floats, the result appears to be the same as with the Poisson distribution.

data = np.array([10, 5])
fixed_variable = pm.Exponential("fxd", 1, value=data, observed=True)
print(f"value: {fixed_variable.value}") # show value
print("calling .random()") # call .random()
fixed_variable.random()
print(f"value: {fixed_variable.value}")

Output:

    value: [ 10.   5.]
    calling .random()
    value: [ 10.   5.]

 If so, I wonder why there is not a dedicated class for representing observed values in the first place. Perhaps what matters is expressing that the observations came from a particular distribution; here, for example, that the values 10 and 5 were observed from an exponential distribution with mean 1. I will have to wait for later chapters to find out.
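The idea in the note above can be made concrete with a toy class; this is only my own illustration of the observed-variable behavior, not anything in PyMC:

```python
class ToyObserved:
    # Toy illustration (not PyMC): an observed variable's value is
    # pinned to the data, and random() never changes it.
    def __init__(self, name, value):
        self.name = name
        self.observed = True
        self.value = value

    def random(self):
        # An unobserved variable would draw a fresh value here;
        # an observed one just keeps returning the data.
        return self.value

fixed = ToyObserved("fxd", [10, 5])
fixed.random()
print(fixed.value)  # [10, 5], no matter how often random() is called
```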

data = np.array([10, 25, 15, 20, 35])
obs = pm.Poisson("obs", lambda_, value=data, observed=True)
print(obs.value)

Output:

    [10 25 15 20 35]

2.1.4 Finally

model = pm.Model([obs, lambda_, lambda_1, lambda_2, tau])

2.2 Modeling approaches

2.2.1 Same story, different endings

tau = pm.rdiscrete_uniform(0, 80)
print(tau)

Output:

    50

alpha = 1 / 20
lambda_1, lambda_2 = pm.rexponential(alpha, 2) # the second argument is the size
print(lambda_1, lambda_2)

Output:

    3.31182568691 11.2489589986

lambda_ = np.r_[lambda_1 * np.ones(tau),
                lambda_2 * np.ones(80 - tau)]
print(lambda_)

Output:

    [  3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
       3.31182569   3.31182569   3.31182569   3.31182569   3.31182569
      11.248959    11.248959    11.248959    11.248959    11.248959    11.248959
      11.248959    11.248959    11.248959    11.248959    11.248959    11.248959
      11.248959    11.248959    11.248959    11.248959    11.248959    11.248959
      11.248959    11.248959    11.248959    11.248959    11.248959    11.248959
      11.248959    11.248959    11.248959    11.248959    11.248959    11.248959  ]

Note

 Here np.r_ is used simply to concatenate two arrays. I tried to learn its other uses from the help text, but I am not familiar enough with NumPy to follow the explanation...
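To make the help text below concrete, here is a short runnable sketch of np.r_'s two modes: comma-separated arrays are concatenated along the first axis, while slice notation builds ranges, with an imaginary step acting like np.linspace.

```python
import numpy as np

# Concatenation along the first axis, as used for lambda_ above:
a = np.r_[2.0 * np.ones(3), 5.0 * np.ones(2)]
print(a)  # [2. 2. 2. 5. 5.]

# Slice notation works like np.arange(start, stop, step):
print(np.r_[0:5])  # [0 1 2 3 4]

# An imaginary step means "number of points", endpoints inclusive,
# i.e. np.linspace(start, stop, n):
print(np.r_[0:1:5j])  # [0.   0.25 0.5  0.75 1.  ]
```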

help(np.lib.index_tricks.r_)

Output:

    Help on RClass in module numpy.lib.index_tricks object:
    
    class RClass(AxisConcatenator)
     |  Translates slice objects to concatenation along the first axis.
     |  
     |  This is a simple way to build up arrays quickly. There are two use cases.
     |  
     |  1. If the index expression contains comma separated arrays, then stack
     |     them along their first axis.
     |  2. If the index expression contains slice notation or scalars then create
     |     a 1-D array with a range indicated by the slice notation.
     |  
     |  If slice notation is used, the syntax ``start:stop:step`` is equivalent
     |  to ``np.arange(start, stop, step)`` inside of the brackets. However, if
     |  ``step`` is an imaginary number (i.e. 100j) then its integer portion is
     |  interpreted as a number-of-points desired and the start and stop are
     |  inclusive. In other words ``start:stop:stepj`` is interpreted as
     |  ``np.linspace(start, stop, step, endpoint=1)`` inside of the brackets.
     |  After expansion of slice notation, all comma separated sequences are
     |  concatenated together.
     |  
     |  Optional character strings placed as the first element of the index
     |  expression can be used to change the output. The strings 'r' or 'c' result
     |  in matrix output. If the result is 1-D and 'r' is specified a 1 x N (row)
     |  matrix is produced. If the result is 1-D and 'c' is specified, then a N x 1
     |  (column) matrix is produced. If the result is 2-D then both provide the
     |  same matrix result.
     |  
     |  A string integer specifies which axis to stack multiple comma separated
     |  arrays along. A string of two comma-separated integers allows indication
     |  of the minimum number of dimensions to force each entry into as the
     |  second integer (the axis to concatenate along is still the first integer).
     |  
     |  A string with three comma-separated integers allows specification of the
     |  axis to concatenate along, the minimum number of dimensions to force the
     |  entries to, and which axis should contain the start of the arrays which
     |  are less than the specified number of dimensions. In other words the third
     |  integer allows you to specify where the 1's should be placed in the shape
     |  of the arrays that have their shapes upgraded. By default, they are placed
     |  in the front of the shape tuple. The third argument allows you to specify
     |  where the start of the array should be instead. Thus, a third argument of
     |  '0' would place the 1's at the end of the array shape. Negative integers
     |  specify where in the new shape tuple the last dimension of upgraded arrays
     |  should be placed, so the default is '-1'.
     |  
     |  Parameters
     |  ----------
     |  Not a function, so takes no parameters
     |  
     |  
     |  Returns
     |  -------
     |  A concatenated ndarray or matrix.
     |  
     |  See Also
     |  --------
     |  concatenate : Join a sequence of arrays along an existing axis.
     |  c_ : Translates slice objects to concatenation along the second axis.
     |  
     |  Examples
     |  --------
     |  >>> np.r_[np.array([1,2,3]), 0, 0, np.array([4,5,6])]
     |  array([1, 2, 3, 0, 0, 4, 5, 6])
     |  >>> np.r_[-1:1:6j, [0]*3, 5, 6]
     |  array([-1. , -0.6, -0.2,  0.2,  0.6,  1. ,  0. ,  0. ,  0. ,  5. ,  6. ])
     |  
     |  String integers specify the axis to concatenate along or the minimum
     |  number of dimensions to force entries into.
     |  
     |  >>> a = np.array([[0, 1, 2], [3, 4, 5]])
     |  >>> np.r_['-1', a, a] # concatenate along last axis
     |  array([[0, 1, 2, 0, 1, 2],
     |         [3, 4, 5, 3, 4, 5]])
     |  >>> np.r_['0,2', [1,2,3], [4,5,6]] # concatenate along first axis, dim>=2
     |  array([[1, 2, 3],
     |         [4, 5, 6]])
     |  
     |  >>> np.r_['0,2,0', [1,2,3], [4,5,6]]
     |  array([[1],
     |         [2],
     |         [3],
     |         [4],
     |         [5],
     |         [6]])
     |  >>> np.r_['1,2,0', [1,2,3], [4,5,6]]
     |  array([[1, 4],
     |         [2, 5],
     |         [3, 6]])
     |  
     |  Using 'r' or 'c' as a first string argument creates a matrix.
     |  
     |  >>> np.r_['r',[1,2,3], [4,5,6]]
     |  matrix([[1, 2, 3, 4, 5, 6]])
     |  
     |  Method resolution order:
     |      RClass
     |      AxisConcatenator
     |      builtins.object
     |  
     |  Methods defined here:
     |  
     |  __init__(self)
     |      Initialize self.  See help(type(self)) for accurate signature.
     |  
     |  ----------------------------------------------------------------------
     |  Methods inherited from AxisConcatenator:
     |  
     |  __getitem__(self, key)
     |  
     |  __getslice__(self, i, j)
     |  
     |  __len__(self)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors inherited from AxisConcatenator:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)

data = pm.rpoisson(lambda_)
print(data)

Output:

    [ 3  6  5  1  3  3  2  2  0  4  3  4  2  2  4  1  1  3  2  1  4  4  6  3  9
      4  4  1  5  2  1  6  2  7  5  2  1  2  3  3  3  7  3  5  2  2  8  3  4  7
     10  6 12 13  8 13  9 19 10 16  9 11 11 12  9  9  8 18 11  9  7 15  5 11 12
      8 16 12  9 10]

plt.bar(np.arange(80), data, color="#348ABD")
plt.bar(tau - 1, data[tau - 1], color="r", label="user behavior changed")

plt.xlabel("Time (days)")
plt.ylabel("Text messages received")
plt.title("Artificial dataset from simulating the model")
plt.xlim(0, 80)
plt.legend()

Output:

    <matplotlib.legend.Legend at 0x27207f35908>

f:id:koheitsutsumi223:20190321201603p:plain

def plot_artificial_sms_dataset():
    tau = pm.rdiscrete_uniform(0, 80)
    alpha = 1. / 20.
    lambda_1, lambda_2 = pm.rexponential(alpha, 2)
    data = np.r_[pm.rpoisson(lambda_1, tau),
                pm.rpoisson(lambda_2, 80 - tau)]
    plt.bar(np.arange(80), data, color="#348ABD")
    plt.bar(tau - 1, data[tau - 1], color="r",
           label="user behavior changed")
    plt.xlim(0, 80)

figsize(25, 10)
plt.suptitle("More examples of artificial datasets "
            "from simulating our model")

for i in range(4):
    plt.subplot(4, 1, 1 + i)
    plt.xlabel("Time (days)")
    plot_artificial_sms_dataset()

f:id:koheitsutsumi223:20190321201607p:plain

2.2.3 A simple case

p = pm.Uniform('p', lower=0, upper=1)
p_true = 0.05
N = 1500

occurrences = pm.rbernoulli(p_true, N)

print(occurrences)
print(occurrences.sum())

Output:

    [False False False ..., False False False]
    72

print(f"What is the observed frequency in Group A? {occurrences.mean()}")
print(f"Does the observed frequency equal the true frequency? {occurrences.mean() == p_true}")

Output:

    What is the observed frequency in Group A? 0.048
    Does the observed frequency equal the true frequency? False
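As a side note, pm.rbernoulli is PyMC2-specific and PyMC2 is hard to install nowadays; the same artificial data can be drawn with plain NumPy (my own sketch, not from the book):

```python
import numpy as np

rng = np.random.default_rng(42)
p_true, N = 0.05, 1500

# N independent Bernoulli(p_true) draws, like pm.rbernoulli(p_true, N):
occurrences = rng.random(N) < p_true
print(occurrences.sum(), occurrences.mean())
```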

obs = pm.Bernoulli("obs", p, value=occurrences, observed=True)

mcmc = pm.MCMC([p, obs])
mcmc.sample(20000, 1000)

figsize(25, 8)

plt.vlines(p_true, 0, 200, linestyles="--", # the text used 90, but I raised it to 200; the histogram peaks higher than in the book's figure
          label="true $p_A$ (unknown)")
plt.hist(mcmc.trace("p")[:],
        bins=35, histtype="stepfilled", density=True)

plt.title("Posterior distribution of $p_A$, "
         "the true effectiveness of site A")
plt.xlabel("Value of $p_A$")
plt.ylabel("Density")
plt.legend()

Output:

     [-----------------100%-----------------] 20000 of 20000 complete in 9.3 sec

    <matplotlib.legend.Legend at 0x27209ac7b70>

f:id:koheitsutsumi223:20190321201612p:plain

2.2.4 A and B together

true_p_A = 0.05
true_p_B = 0.04

N_A = 1500
N_B = 750

observations_A = pm.rbernoulli(true_p_A, N_A)
observations_B = pm.rbernoulli(true_p_B, N_B)

print(f"Obs from Site A: {observations_A[:30].astype(int)}...")
print(f"Obs from Site B: {observations_B[:30].astype(int)}...")

Output:

    Obs from Site A: [0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]...
    Obs from Site B: [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]...

print(observations_A.mean())
print(observations_B.mean())

Output:

    0.0526666666667
    0.0386666666667

p_A = pm.Uniform("p_A", 0, 1)
p_B = pm.Uniform("p_B", 0, 1)

@pm.deterministic
def delta(p_A=p_A, p_B=p_B):
    return p_A - p_B

obs_A = pm.Bernoulli("obs_A", p_A, 
                     value=observations_A, observed=True)
obs_B = pm.Bernoulli("obs_B", p_B, 
                     value=observations_B, observed=True)

mcmc = pm.MCMC([p_A, p_B, delta, obs_A, obs_B])
mcmc.sample(25000, 5000)

Output:

     [-----------------100%-----------------] 25000 of 25000 complete in 4.3 sec

p_A_samples = mcmc.trace("p_A")[:]
p_B_samples = mcmc.trace("p_B")[:]
delta_samples = mcmc.trace("delta")[:]

figsize(25, 20)

ax = plt.subplot(311)
plt.hist(p_A_samples, histtype='stepfilled', bins=30, alpha=0.85, 
        label="posterior of $p_A$",
        color="#A60628", density=True)
plt.vlines(true_p_A, 0, 80, linestyles="--", 
          label="true $p_A$ (unknown)")
plt.legend(loc="upper right")
plt.xlim(0, .1)
plt.ylim(0, 80)
plt.suptitle("Posterior distributions of "
            "$p_A$, $p_B$, and delta unknown")


ax = plt.subplot(312)
plt.hist(p_B_samples, histtype='stepfilled', bins=30, alpha=0.85, 
        label="posterior of $p_B$",
        color="#467821", density=True)
plt.vlines(true_p_B, 0, 80, linestyles="--", 
          label="true $p_B$ (unknown)")
plt.legend(loc="upper right")
plt.xlim(0, .1)
plt.ylim(0, 80)
plt.ylabel("Density")

ax = plt.subplot(313)
plt.hist(delta_samples, histtype='stepfilled', bins=30, alpha=0.85, 
        label="posterior of delta", 
        color="#7A68A5", density=True)
plt.vlines(true_p_A - true_p_B, 0, 60, linestyles="--", 
          label="true delta (unknown)")
plt.vlines(0, 0, 60, color="black", alpha=0.2)
plt.xlabel("Value")
plt.legend(loc="upper right")

Output:

    <matplotlib.legend.Legend at 0x2720a2c90f0>

f:id:koheitsutsumi223:20190321201616p:plain

figsize(25, 6)
plt.hist(p_A_samples, histtype='stepfilled', bins=30, alpha=0.80, 
        label="posterior of $p_A$", 
        color="#A60628", density=True)
plt.hist(p_B_samples, histtype='stepfilled', bins=30, alpha=0.80, 
        label="posterior of $p_B$", 
        color="#467821", density=True)

plt.legend(loc="upper right")
plt.xlabel("Value")
plt.ylabel("Density")
plt.title("Posterior distributions of $p_A$ and $p_B$")
plt.ylim(0, 80)
plt.xlim(0, .1)

Output:

    (0, 0.1)

f:id:koheitsutsumi223:20190321201621p:plain

print(f"Probability site A is WORSE than B: {(delta_samples < 0).mean()}")
print(f"Probability site A is BETTER than B: {(delta_samples > 0).mean()}")

Output:

    Probability site A is WORSE than B: 0.07845
    Probability site A is BETTER than B: 0.92155

Note

 I wrote a function that lets you vary true_p_A, true_p_B, N_A, and N_B. Fun to play with. One thing I wondered about: how can the ymax argument of plt.vlines be adjusted dynamically?

def plot_post_dist_p_A_p_B_delta(true_p_A=0.05, true_p_B=0.04, N_A=1500, N_B=750):
    observations_A = pm.rbernoulli(true_p_A, N_A)
    observations_B = pm.rbernoulli(true_p_B, N_B)

    p_A = pm.Uniform("p_A", 0, 1)
    p_B = pm.Uniform("p_B", 0, 1)

    @pm.deterministic
    def delta(p_A=p_A, p_B=p_B):
        return p_A - p_B

    obs_A = pm.Bernoulli("obs_A", p_A, 
                         value=observations_A, observed=True)
    obs_B = pm.Bernoulli("obs_B", p_B, 
                         value=observations_B, observed=True)

    mcmc = pm.MCMC([p_A, p_B, delta, obs_A, obs_B])
    mcmc.sample(25000, 5000)

    p_A_samples = mcmc.trace("p_A")[:]
    p_B_samples = mcmc.trace("p_B")[:]
    delta_samples = mcmc.trace("delta")[:]

    figsize(25, 20)

    ax = plt.subplot(311)
    plt.hist(p_A_samples, histtype='stepfilled', bins=30, alpha=0.85, 
            label="posterior of $p_A$",
            color="#A60628", density=True)
    plt.vlines(true_p_A, ymin=0, ymax=80, linestyles="--", 
              label="true $p_A$ (unknown)")
    plt.legend(loc="upper right")
    plt.xlim(min(p_A_samples), max(p_A_samples))
    # plt.ylim(0, 80)
    plt.suptitle("Posterior distributions of "
                "$p_A$, $p_B$, and delta unknown")


    ax = plt.subplot(312)
    plt.hist(p_B_samples, histtype='stepfilled', bins=30, alpha=0.85, 
            label="posterior of $p_B$",
            color="#467821", density=True)
    plt.vlines(true_p_B, ymin=0, ymax=80, linestyles="--", 
              label="true $p_B$ (unknown)")
    plt.legend(loc="upper right")
    plt.xlim(0, .1)
    # plt.ylim(0, 80)
    plt.ylabel("Density")

    ax = plt.subplot(313)
    plt.hist(delta_samples, histtype='stepfilled', bins=30, alpha=0.85, 
            label="posterior of delta", 
            color="#7A68A5", density=True)
    plt.vlines(true_p_A - true_p_B, 0, 60, linestyles="--", 
              label="true delta (unknown)")
    plt.vlines(0, 0, 60, color="black", alpha=0.2)
    plt.xlabel("Value")
    plt.legend(loc="upper right")

plot_post_dist_p_A_p_B_delta(true_p_A=0.1, N_A=10000, N_B=10000)

Output:

     [-----------------100%-----------------] 25000 of 25000 complete in 12.5 sec

f:id:koheitsutsumi223:20190321201626p:plain
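On the question in the note above about setting plt.vlines' ymax dynamically: one option, my own sketch rather than anything from the book, is to compute the density-histogram heights first (with the same bins, but without plotting) and give the line a little headroom above the tallest bar:

```python
import numpy as np

# Stand-in for p_A_samples; any 1-D sample array works the same way:
samples = np.random.default_rng(0).beta(8, 150, size=20000)

# Same binning as the plots above, but without drawing anything:
heights, _ = np.histogram(samples, bins=30, density=True)
ymax = heights.max() * 1.1  # 10% headroom for the dashed line

# Inside the plotting function one could then write, e.g.:
# plt.vlines(true_p_A, ymin=0, ymax=ymax, linestyles="--")
print(ymax > heights.max())  # True
```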

 That's all for this transcription. Thank you for reading this far.