a misspecification $KI_j$ worse than the one actually obtained. That is:

$$p_j(\widehat{KI}) = 1 - P(\widehat{KI}_j \le ki). \qquad (11)$$
Since it is well known that $KI(g, f_j(\theta)) \ge 0$, with equality if and only if $g = f_j$, it follows that $p_j(KI) = 1$
if and only if $ki = 0$. This follows trivially from the fact that $P(KI_j \le 0) = 0$. Consequently, $p_j(KI)$ will
be less than one for any positive realization of $KI_j$. Accordingly, if $ki$ is very small, then the probability
$P(KI_j \le ki)$ of obtaining a realization of the misspecification even smaller than such a low value will be
very small; it then follows that the probability $p_j(KI)$ of having a good model will be very high.
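As a purely illustrative numerical reading of (11) (the value 0.03 is hypothetical, not taken from the paper): if the realized $ki$ is so small that $P(\widehat{KI}_j \le ki) = 0.03$, then $p_j(\widehat{KI}) = 1 - 0.03 = 0.97$, and model $j$ receives a weight close to one.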
It is clear that, to determine the weight, it suffices to compute the c.d.f. of a Normal with mean
$m_n$ and variance $2\sigma^2$ at the realized value $ki$. Nevertheless, in implementing this methodology,
attention must be paid to the mean $m_n$, which, being affected by the approximation error, varies
with the candidate model. In the next section and in the appendix, the device used to fix this problem and the
measurement of $m_n$ are described in more detail.
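To make the computation concrete, the following is a minimal sketch, not the paper's implementation: it evaluates the weight $p_j = 1 - \Phi_{(m_n,\, 2\sigma^2)}(ki)$ via the normal c.d.f., where `ki`, `m_n`, and `sigma2` are hypothetical placeholder names for the quantities defined in the text (in particular, $m_n$ would have to be recomputed for each candidate model, as discussed below).

```python
# Minimal sketch (assumed names, not the paper's code): weight p_j computed as
# one minus the CDF of a Normal(m_n, 2*sigma^2) evaluated at the realized ki.
import numpy as np
from scipy.stats import norm

def weight_pj(ki, m_n, sigma2):
    """p_j(KI) = 1 - P(KI_j <= ki), with KI_j approximately N(m_n, 2*sigma2)."""
    sd = np.sqrt(2.0 * sigma2)             # standard deviation of the limit law
    return 1.0 - norm.cdf(ki, loc=m_n, scale=sd)

# Hypothetical numbers: a realized ki well below m_n yields a weight near one.
print(weight_pj(ki=0.01, m_n=0.05, sigma2=0.001))   # ~0.81
```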
4 Asymptotic results
Before proceeding with the theorems, let me first state all the assumptions$^{12}$:
A1: $\{X_i\}$ are i.i.d. with compact support $S$; their marginal density $g$ exists, is bounded away from zero, and
is twice differentiable. Its first-order derivative is also bounded and, moreover, $|g''(x_1) - g''(x_2)| \le C\,|x_1 - x_2|$
for any $x_1, x_2 \in S$ and for some $C \in (0, \infty)$.
A2: The kernel $K$ is a bounded symmetric probability density function around zero, s.t.: (i) $\int K(u)\,du = 1$;
(ii) $\int u^2 K(u)\,du < \infty$; (iii) $h = h_n \to 0$ as $n \to \infty$; (iv) $nh_n \to \infty$ as $n \to \infty$ (a numerical check of (i)-(ii) is sketched after this list).
A3: Depending on the application, it is possible to select a kernel $K$ that satisfies A2 and such that the
tail-effect terms involved in the use of the $KI$ are negligible.
A4: $\Theta$ is a compact and convex subset of $\mathbb{R}^k$; the family of distributions $F(\theta_{M_j})$ has density $f_j(\theta, x)$,
which is measurable in $x$ for every $\theta_{M_j} \in \Theta$ and continuous in $\theta_{M_j}$ for every $x \in \Omega$; $E_g[\log g(x) - \log f_j(\theta, x)]$
exists and has a unique minimum at an interior point $\theta^*_{M_j}$ of $\Theta$; $\log f_j(\theta, x)$ is bounded by a function $b(x)$ for
all $\theta_{M_j} \in \Theta$, where $b(x)$ is integrable w.r.t. the true distribution $G$.
A5: The first and second derivatives of $\log f_j(\theta, x)$ w.r.t. $\theta_{M_j}$ and $\left|\frac{\partial \log f_j(\theta,x)}{\partial \theta} \times \frac{\partial \log f_j(\theta,x)}{\partial \theta'}\right|$ are also
dominated by $b(x)$; $B(\theta^*_{M_j})$ is nonsingular and $A(\theta^*_{M_j})$ has a constant rank in some open neighborhood of
$\theta^*_{M_j}$, where
$$B(\theta^*_{M_j}) = E\left[\frac{\partial \log f_j(\theta^*, x)}{\partial \theta} \times \frac{\partial \log f_j(\theta^*, x)}{\partial \theta'}\, g^2(x)\right] \quad \text{and} \quad A(\theta^*_{M_j}) = E\left[\frac{\partial^2 \log f_j(\theta^*, x)}{\partial \theta\, \partial \theta'}\, g(x)\right].$$
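As anticipated in A2, here is a minimal numerical sketch (illustrative only, not part of the paper) that checks conditions (i) and (ii) for two candidate kernels: a Gaussian kernel, chosen arbitrarily for comparison, and the heavy-tailed kernel suggested by Hall (1987) and quoted in the discussion below; the check also confirms its normalizing constant 0.1438.

```python
# Illustrative check of A2 (i)-(ii) for two candidate kernels; the Gaussian
# kernel is an arbitrary example, Hall's kernel is the one quoted below.
import numpy as np
from scipy.integrate import quad

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def hall_kernel(u):
    # K(u) = 0.1438 * exp(-(1/2) * {log(1 + |u|)}^2); tails decrease more
    # slowly than those of standard kernels, yet all moments stay finite.
    return 0.1438 * np.exp(-0.5 * np.log1p(np.abs(u)) ** 2)

for K in (gaussian_kernel, hall_kernel):
    mass, _ = quad(K, -np.inf, np.inf)                     # (i): should be ~1
    m2, _ = quad(lambda u: u**2 * K(u), -np.inf, np.inf)   # (ii): should be finite
    print(K.__name__, round(mass, 4), round(m2, 4))
```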
Assumption A1 requires that the $X_i$ are continuously distributed and imposes regularity conditions on the
unknown density $g$. A2 collects the standard assumptions on the kernel function and the smoothing
parameter used in the nonparametric literature. Assumption A3 is a practical assumption that we need in
order to simplify the proofs and ignore the tail effects due to the use of the Kullback-Leibler distance. As
indicated by Hall (1987), it is important that $K$ be chosen such that its tails are sufficiently thick relative
to the tails of the underlying function $f_j(\theta, x)$. Since we know the candidate parametric models, it is always
possible to choose an adequate kernel. Furthermore, Hall suggested a practical alternative, given
by the kernel $K(u) = 0.1438\,\exp\left[-\frac{1}{2}\{\log(1+|u|)\}^2\right]$, whose tails decrease more slowly than the tails of
$^{12}$It is important to note that, from now on, for simplicity, I drop the index indicating the regime $s$.