For these reasons, it seems natural to determine w_{qj} by the opposite of the distance between the nonparametric density \hat{f}_{nq}(x/s) and the model f_j(x/s; θ):

w_{qj} = -KI(\hat{f}_{nq}(x), f_j(x/s; θ)),   (1)
where KI(\hat{f}_{nq}(x), f_j(x/s; θ)) is the Kullback-Leibler distance^5, whose empirical version in this study is defined as follows:

KI_{qj} = \sum_{i=1}^{N_q} \hat{f}_{nq}(X_i) \log \left[ \hat{f}_{nq}(X_i) / f_j(X_i; θ) \right],   (2)

where i indexes all the observations contained in sample q. For simplicity, I drop the index relative to the regime s.
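To make the contrast in (2) concrete, the following sketch pairs a kernel density estimate of a sample with two fixed-parameter candidate densities. The Gaussian KDE, the simulated sample, and both candidate models are illustrative assumptions, not choices made in the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical sample for one case q (the data-generating law is illustrative).
x = rng.normal(loc=1.0, scale=2.0, size=500)

# Nonparametric estimate \hat f_{nq}: a Gaussian kernel density estimate.
f_hat = stats.gaussian_kde(x)

def empirical_kl(sample, parametric_pdf):
    """Empirical contrast: sum_i f_hat(X_i) * log(f_hat(X_i) / f_j(X_i; theta)), as in (2)."""
    p = f_hat(sample)
    q = parametric_pdf(sample)
    return float(np.sum(p * np.log(p / q)))

# Two illustrative candidate models f_1 and f_2 with fixed parameters.
kl_1 = empirical_kl(x, lambda v: stats.norm.pdf(v, loc=1.0, scale=2.0))  # close to the data
kl_2 = empirical_kl(x, lambda v: stats.norm.pdf(v, loc=3.0, scale=2.0))  # shifted mean

# The weight w_qj = -KI in (1) is larger (less negative) for the better-matching model.
w_1, w_2 = -kl_1, -kl_2
```

The better-specified candidate yields the smaller contrast, hence the larger weight, which is exactly the ordering the prediction rule below exploits.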
If the values of the optimal parameters were known, the prediction rule - ranking the plausibility of each model through the sum of their weights (over the past cases) - would lead us to choose f_1 rather than f_2 as the predictive density if and only if:

\sum_{q \in C_s} w_{q1} > \sum_{q \in C_s} w_{q2},   (3)
(where the sets C_s partition C, and C_s represents the set of past cases relative to regime s) or, equivalently:

\sum_{q \in C_s} KI(\hat{f}_q(x), f_1(x/s; θ)) < \sum_{q \in C_s} KI(\hat{f}_q(x), f_2(x/s; θ)).   (4)
The sum of the weights relative to model f_1 can be interpreted, as in Gilboa and Schmeidler (2001), as
the “aggregate similarity or plausibility” of model f1 . However, as the values of the optimal parameters are
unknown, it is necessary to estimate them. Since the model with the largest aggregate similarity to past cases
is the most appropriate to achieve a good prediction, the candidate model's parameters θ_{Mjs} are obtained in the following way:

\max_{θ_{Mjs}} \sum_{q \in C_s} w_{qj} = \min_{θ_{Mjs}} \sum_{q \in C_s} KI(\hat{f}_{nq}(x), f_j(x/s; θ)).   (5)
The minimization of the sum of these pseudo-distances allows us to obtain the optimal minimum contrast
(MC) estimates^6 of the parameters that characterize the a priori distributions. This method gives us the
opportunity to extract the information contained in a nonparametric estimate, while preserving the simplicity
of a parametric model. This goal can be achieved by density-matching: the optimal model is derived to be
consistent with the observed distribution of the data^7.
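The minimum contrast step in (5) can be sketched numerically: fix a parametric family, sum the empirical contrasts over the past cases of a regime, and minimize over the parameters. This is a minimal illustration assuming a Gaussian family and simulated past cases; neither choice comes from the text.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)

# Hypothetical set of past cases C_s for one regime (four simulated samples).
cases = [rng.normal(loc=2.0, scale=1.5, size=300) for _ in range(4)]
kdes = [stats.gaussian_kde(c) for c in cases]  # nonparametric estimates f_hat_nq

def aggregate_contrast(theta):
    """Sum over q in C_s of the empirical contrast between f_hat_nq and a Gaussian f_j."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)  # reparameterize so the scale stays positive
    total = 0.0
    for sample, kde in zip(cases, kdes):
        p = kde(sample)
        q = stats.norm.pdf(sample, loc=mu, scale=sigma)
        total += np.sum(p * np.log(p / q))
    return total

# Minimum contrast estimate: the theta that minimizes the aggregate pseudo-distance.
res = optimize.minimize(aggregate_contrast, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
mu_hat, sigma_hat = float(res.x[0]), float(np.exp(res.x[1]))
```

Minimizing the summed pseudo-distances plays the role of density-matching here: the fitted Gaussian is pulled toward the shape of the nonparametric estimates across all past cases of the regime.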
It follows that the ranking of the competing models is obtained as follows:

f_1 \succ f_2 \iff \min_{θ_{M1} \in Θ} \sum_{q \in C_s} KI(\hat{f}_{nq}(x), f_1(x/s; θ)) < \min_{θ_{M2} \in Θ} \sum_{q \in C_s} KI(\hat{f}_{nq}(x), f_2(x/s; θ)),   (6)
which in turn implies that the best model can be represented by the following prediction rule:
\inf_{j \in \{1, \dots, J\}} \min_{θ_{Mj} \in Θ} \sum_{q \in C_s} KI(\hat{f}_q(x), f_j(x/s; θ)).   (7)
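The full prediction rule in (7) nests the parameter minimization inside a search over candidate families. The sketch below runs that two-level selection on simulated skewed data, comparing a gamma and a normal family; both families, the data-generating process, and the optimizer settings are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)

# Hypothetical past cases C_s: skewed data, so the two families below differ in fit.
cases = [rng.gamma(shape=2.0, scale=1.0, size=400) for _ in range(3)]
kdes = [stats.gaussian_kde(c) for c in cases]

def min_contrast(pdf_family, x0):
    """Inner step of (7): min over theta of the contrast summed over the cases in C_s."""
    def objective(theta):
        total = 0.0
        for sample, kde in zip(cases, kdes):
            p = kde(sample)
            q = pdf_family(sample, theta)
            total += np.sum(p * np.log(p / q))
        return total
    return float(optimize.minimize(objective, x0, method="Nelder-Mead").fun)

# Candidate families f_1 (gamma) and f_2 (normal); positive parameters via exp.
contrasts = {
    "gamma": min_contrast(
        lambda v, t: stats.gamma.pdf(v, a=np.exp(t[0]), scale=np.exp(t[1])),
        x0=np.array([np.log(2.0), 0.0])),
    "normal": min_contrast(
        lambda v, t: stats.norm.pdf(v, loc=t[0], scale=np.exp(t[1])),
        x0=np.array([2.0, 0.5])),
}

# Outer step of (7): choose the family with the smallest minimized aggregate contrast.
best_model = min(contrasts, key=contrasts.get)
```

The chosen family is the one whose best-fitting member stays closest, in aggregate, to the nonparametric estimates over the past cases, which is exactly the "largest aggregate similarity" criterion described above.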
^5 Many other distances could be chosen; on this point see Ullah A. (1996).
^6 See Dhrymes P. J. (1994), p. 282.
^7 See Aït-Sahalia Y. (1996).