3 Description of the combination method
Selecting a single model as described in the previous section, even if it implicitly recognizes the presence of
misspecification, does not explicitly account for model ambiguity. More importantly, it does not allow for
the possibility that the true structure does not belong to the initial set of candidate models, so that using
only the best minimizer is not necessarily the ultimate solution. This implies that, in order to incorporate
the information contained in the KI, all plausible models should be combined into a similarity-weighted
predictive distribution, where the weights are functions of $\widehat{KI}_j\big(\hat{f}_n(x), f_j(x/s; \hat{\theta})\big)$.
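For concreteness, the sample criterion behind these weights is presumably the Kullback-Leibler divergence between a nonparametric estimate $\hat{f}_n$ of the true density and the fitted candidate; a minimal rendering would be

$$\widehat{KI}_j\big(\hat{f}_n(x),\, f_j(x/s;\hat{\theta})\big) \;=\; \int \hat{f}_n(x)\,\log\frac{\hat{f}_n(x)}{f_j(x/s;\hat{\theta})}\,dx,$$

though the exact empirical version used in the paper may differ (for instance, by integrating against the empirical measure rather than $\hat{f}_n$).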
The intuition is the following: $KI_j$ can be interpreted as a measure of uncertainty or ignorance about
the true structure. When computed at the optimal value of the parameter $\theta_{M_j}$, it can be considered a
measure of the goodness of the model, since it represents the margin of error of this model in a particular
sample. If it is different from zero for each candidate distribution and/or there are many models that exhibit
a similar loss, then the econometrician fearing misspecification will explicitly account for it by combining the
models in the predictive distribution $M(\theta_{M_j}) = \sum_j p_j(KI)\, f_j(x/s, \theta)$. The similarity weight $p_j(KI)$ can be
loosely interpreted as the probability that model $M_j$ is correct. In contrast, if the predictor selected a single
distribution $M_j$, he would overestimate the precision of this model, since he would implicitly assign to that
model a probability $p_j(KI)$ of being correct equal to one.
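As a purely illustrative sketch of the mechanics of this combination (not the weighting scheme derived later in the paper), the following Python snippet evaluates a similarity-weighted predictive density from fitted candidates; the function names are hypothetical and the exponential weights are a placeholder for the $p_j(KI)$ constructed in Section IV.

```python
import numpy as np

def combine_densities(ki_hat, candidate_pdfs, x):
    """Evaluate a similarity-weighted predictive density at points x.

    ki_hat         : array of estimated KI values, one per candidate model
    candidate_pdfs : list of callables f_j(x) returning fitted density values
    x              : evaluation points
    """
    ki_hat = np.asarray(ki_hat, dtype=float)
    # Placeholder weights: a smaller estimated KI yields a larger weight.
    # The paper derives p_j(KI) from the asymptotic distribution of KI_j;
    # exponential decay is used here only for illustration.
    w = np.exp(-(ki_hat - ki_hat.min()))
    p = w / w.sum()  # normalize so the weights behave like probabilities
    # Similarity-weighted predictive density: M = sum_j p_j * f_j(x)
    return sum(p_j * f(x) for p_j, f in zip(p, candidate_pdfs))
```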
In order to better appreciate the importance of the information contained in the model's misspecification,
and subsequently in $M(\theta_{M_j})$, it is necessary to give a brief description of the spaces in which we operate
when the statistical structural assumptions are not necessarily true. Define $G$ as the space of functions to
which the true unknown model $g(x/s)$ belongs: by assumption, $g(x/s)$ minimizes the KI over $G$. $F_{\Theta_{M_j}} \subseteq G$
is the finite-dimensional space to which the parametric candidate models belong; we can call it the
approximation space, and it is also the space where the estimation is carried out. The best approximation
$f_j(x/s, \theta^*)$ in $F_{\Theta_{M_j}}$ to the function $g(x/s)$ is the p.d.f. that minimizes the KI over $F_{\Theta_{M_j}}$, while $f(x/s, \hat{\theta}) \in
F_{\Theta_{M_j}}$ minimizes the sample version of the KI. The distance between $f(x/s, \hat{\theta})$ and $f(x/s, \theta^*)$ represents the
estimation error, which vanishes as $n \to \infty$. Instead, the approximation error⁹, given by the distance between
$f(x/s, \theta^*)$ and $g(x/s)$, can be reduced only if the dimension of $F_{\Theta_{M_j}}$ grows with the sample size¹⁰. Model
combination can therefore be considered a method to increase the dimension of the parameter space,
accounting for the approximation error.
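One way to make the two errors in this paragraph explicit is the following decomposition of the distance between the fitted model and the truth, written here in KI terms (the paper's exact metric may differ):

$$\underbrace{KI\big(g,\, f_j(\cdot;\hat{\theta})\big)}_{\text{total error}} \;=\; \underbrace{KI\big(g,\, f_j(\cdot;\theta^*)\big)}_{\text{approximation error}} \;+\; \underbrace{KI\big(g,\, f_j(\cdot;\hat{\theta})\big) - KI\big(g,\, f_j(\cdot;\theta^*)\big)}_{\text{estimation error},\; \to\, 0 \text{ as } n \to \infty}.$$

The first term is fixed for a given $F_{\Theta_{M_j}}$ and shrinks only if the approximation space grows; the second is the sampling component that consistency removes.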
Only if $F_{\Theta_{M_j}} \equiv G$ do we have $g(x/s) = f(x/s, \theta_0) = f(x/s, \theta^*)$, with $\hat{\theta}$ a consistent estimator of the true parameter
$\theta_0$. Typically, because of the advantages¹¹ offered by parsimonious models, $F_{\Theta_{M_j}}$ is a small subset of $G$, and
hence model misspecification can be a serious problem, also affecting the asymptotic results. Furthermore,
in finite samples the $KI_j$ embodies information about both the estimation and approximation errors relative
to $M_j$, and as such it cannot be ignored.
Once it is decided to use the combination of p.d.f.s $M(\theta_{M_j})$ as the predictive density, the main task consists
in determining the probabilities $p_j(KI)$. For this purpose I show (see Section IV and the Appendix for
more details) that $\widehat{KI}_j$ minus a correction term $m_n$, mainly due to the approximation error, is asymptotically
distributed $N(0, 2\sigma^2)$, where a consistent estimate of $\sigma^2$ is determined only by the nonparametric
density. Then, the probability of being the correct model can be determined by the probability of obtaining,
under this limiting distribution, a value of the statistic at least as large as the one observed.
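Under the stated asymptotic result, one natural construction, illustrated below, sets each model's probability proportional to the upper-tail probability of its centered statistic under the $N(0, 2\sigma^2)$ limit. This is only a sketch of that idea, assuming the estimates ki_hat, the correction terms m_n, and the variance estimate sigma2 are already available; the paper's exact construction is in Section IV.

```python
import numpy as np
from scipy.stats import norm

def similarity_weights(ki_hat, m_n, sigma2):
    """Illustrative p_j(KI): tail probabilities of the N(0, 2*sigma^2) limit,
    normalized to sum to one. ki_hat and m_n are arrays (one entry per model);
    sigma2 is the consistent variance estimate from the nonparametric density.
    """
    # Standardize each model's corrected statistic under the normal limit.
    z = (np.asarray(ki_hat, dtype=float) - np.asarray(m_n, dtype=float)) / np.sqrt(2.0 * sigma2)
    tail = norm.sf(z)          # P(Z > z): a larger KI gives a smaller probability
    return tail / tail.sum()   # normalize so the weights sum to one
```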
⁹ See Chen, X. and J. Z. Huang (2002).
¹⁰ For example, a countable mixture of Normals (Ferguson (1983)) or the kernel density estimator (Silverman (1986)) can
approximate arbitrarily closely any well-behaved density function. We can view these models as infinite-dimensional parameter
alternatives.
¹¹ Closed-form solutions, ease of manipulation, and low computational costs.