The predictive distribution (3.4) can be shown to be
$$
p(X_i \mid X_1, \ldots, X_{i-1}) = \frac{\alpha}{\alpha + i - 1}\, F_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{X_j}, \qquad \text{for } i = 2, \ldots, n. \quad (3.7)
$$
Multiplying $p(X_1)$ and $p(X_i \mid X_1, \ldots, X_{i-1})$, $i = 2, \ldots, n$, defines the joint marginal distribution $p(X_1, \ldots, X_n)$. When $F_0$ has finite support and $\alpha = 1$, the two expressions above imply that sampling $(X_1, \ldots, X_n)$ can be interpreted as drawing from an urn whose initial proportion of balls of color $x$ is $F_0(\{x\})$: we draw the $i$-th ball from the urn, record its color $X_i$, return the ball to the urn, and add a new ball of the same color as the one just extracted.
Largely due to its computational advantages and the easy interpretation of its parameters, the DP is the most popular nonparametric Bayesian prior. Nevertheless, an almost surely discrete prior is undesirable in many applications. A simple solution is to consider a convolution of $G \sim \mathrm{DP}$ with a continuous kernel. The resulting (hierarchical) model is known as the Dirichlet process mixture (DPM) model (MacEachern, 1994; Escobar and West, 1995):
$$
\begin{aligned}
x_i \mid \theta_i &\sim F(x_i \mid \theta_i) \quad \text{independently for } i = 1, \ldots, n, \\
\theta_1, \ldots, \theta_n \mid G &\overset{\text{iid}}{\sim} G, \qquad (3.8) \\
G &\sim \mathrm{DP}(\alpha, G_0),
\end{aligned}
$$
i.e., $x_i \sim \int F(x_i \mid \theta)\, dG(\theta)$ with $G \sim \mathrm{DP}(\alpha, G_0)$.
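As a sketch of how (3.8) can be simulated, the fragment below marginalizes $G$ using the predictive (3.7), so ties among the $\theta_i$ arise exactly through the urn's "old ball" branch; the Gaussian kernel, the base measure $G_0 = \mathrm{N}(0, \texttt{g0\_scale}^2)$, and all names are assumptions made for illustration.

```python
import numpy as np

def sample_dpm(n, alpha, g0_scale=5.0, kernel_sd=0.5, rng=None):
    """Simulate (x_1, ..., x_n) from the DPM (3.8) with an assumed
    Gaussian kernel F(x | theta) = N(theta, kernel_sd^2) and an
    assumed continuous base measure G0 = N(0, g0_scale^2)."""
    rng = np.random.default_rng() if rng is None else rng
    thetas = []
    for i in range(n):
        # Polya urn for theta_i: G0 is continuous, so repeated values
        # can only come from reusing an earlier theta_j.
        if rng.random() < alpha / (alpha + i):
            thetas.append(rng.normal(0.0, g0_scale))   # fresh draw from G0
        else:
            thetas.append(thetas[rng.integers(i)])     # tie with a past draw
    thetas = np.asarray(thetas)
    x = rng.normal(thetas, kernel_sd)  # x_i | theta_i ~ N(theta_i, kernel_sd^2)
    return x, thetas

x, thetas = sample_dpm(200, alpha=2.0)
print(len(np.unique(thetas)), "distinct components among 200 draws")
```

Because $G$ is almost surely discrete, the number of distinct $\theta_i$ is typically much smaller than $n$, while the convolution with the continuous kernel makes the marginal distribution of each $x_i$ continuous.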
The model above is easily extended to non-identically distributed samples, such as the regression model $x_i \mid \theta_i \sim \mathrm{N}(z_i' \theta_i, \sigma)$, where $z_i$ is a vector of covariates of the same dimension as $\theta_i$ and $\sigma$ is the common precision. In general, suppose that the distribution of each $x_i$ is a (possibly different) known distribution $F_i$, indexed by $\theta_i$ and possibly by additional parameters $\sigma$ that are common across all $i$, i.e.,
$$
x_i \mid \theta_i, \sigma \sim F_i(x_i \mid \theta_i, \sigma).
$$
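As an illustration of this more general setting, here is a hedged sketch of simulating from the DP mixture of regressions mentioned above, reading $\mathrm{N}(z_i'\theta_i, \sigma)$ as a normal with mean $z_i'\theta_i$ and precision $\sigma$ (variance $1/\sigma$); the base measure $G_0 = \mathrm{N}(0, \texttt{g0\_sd}^2 I)$ and all names are assumptions for the example.

```python
import numpy as np

def sample_dp_regression(Z, alpha, precision, g0_sd=2.0, rng=None):
    """Simulate x_i | theta_i, sigma ~ N(z_i' theta_i, 1/sigma), with
    theta_i drawn via the Polya urn for DP(alpha, G0) and an assumed
    base measure G0 = N(0, g0_sd^2 I). Names are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = Z.shape
    thetas = []
    for i in range(n):
        if rng.random() < alpha / (alpha + i):
            thetas.append(rng.normal(0.0, g0_sd, size=d))  # fresh draw from G0
        else:
            thetas.append(thetas[rng.integers(i)])         # tie with a past draw
    thetas = np.asarray(thetas)
    # F_i differs across i only through the covariate vector z_i; the
    # precision sigma is shared, so the kernel variance is 1 / sigma.
    means = np.sum(Z * thetas, axis=1)                     # z_i' theta_i
    x = rng.normal(means, 1.0 / np.sqrt(precision))
    return x, thetas

Z = np.random.default_rng(0).normal(size=(100, 3))  # hypothetical covariates
x, thetas = sample_dp_regression(Z, alpha=1.0, precision=4.0)
```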
Moreover, we can include a hyperprior distribution for the parameter $\alpha$ and, in addition, allow the base measure $G_0$ to depend on parameters $\gamma$ with prior distribution