and combination: Diebold and Lopez (1996), Hendry and Clements (2001) and Giacomini (2003) among
others. Finally, the third strand consists of the vast literature on dynamic portfolio choice under model
misspecification where investors try to learn from historical data, see for example Uppal and Wang (2002)
and Knox (2003).
The paper is organized as follows: Section II describes the estimation and selection method; Section III
illustrates the model combination technique; Section IV analyzes the asymptotic properties of the parameter
estimator and the asymptotic distribution of the uncertainty measure; Section V discusses the finite-sample
performance of the parameter estimator; Section VI contains the empirical application to stock returns;
Section VII investigates the model’s implications for optimal asset allocation; and Section VIII concludes.
Analytical proofs and technical issues are discussed in the Appendix.
2 Description of the estimation and selection method
I consider a prediction problem for which a finite set of candidate models M ≡ {Mj, j = 1, ..., J} is given.
In particular, these models Mj are defined as probability density functions fj(x; θ) {f : R → [0, ∞]} of
a random variable of interest X {X : Ω → R} defined on the probability space (Ω, A, P) and taking values in
(R, B(R), Px). The goal of the predictor is to estimate and rank these models according to their similarity
to past observations, and finally to combine them in a similarity-weighted probability distribution. Given
the set M, I define the set of elements to be ranked as Θ = {θMj : fj(x; θ) ≡ Mj ∈ M}, with
Θ ⊂ R^k.
Since in the empirical analysis I want to allow the random variable of interest to follow a different
distribution in different regimes, I define an additional finite set S, the set of states of nature.
Define the state s {s : S → Z+, where Z+ is the set of positive integers} as a random variable on the
probability space (S, σ(S), p), taking only discrete values. Further, in order to focus attention only
on the uncertainty about the model, I assume that s can be observed. Thus, each model is defined as
fj(x/s; θ), and Θ equals {θMjs : fj(x/s; θ) ≡ Mjs ∈ M}.
The information set Ω is a finite set of Q samples of Nq independent realizations of the random variable
X. Given the set Ω, its information content is processed by estimating a nonparametric density fn(x/s) for each
sample q = 1, ..., Q. Subsequently, from the set Ω, I derive the set of past cases C = {fnq(x/s) : x ∈ Ω and s ∈ S},
which is the final information that the predictor possesses to judge the different models. I assume that, given
a regime, all the subsamples derive from the same fixed distribution. The problem is then to describe how
to process and recall this information to assess the similarity of past observations to the set of candidate
models.
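The construction of the set of past cases C can be sketched in code. The sketch below is only an illustration under stated assumptions: the text does not fix a nonparametric estimator, so a Gaussian kernel with Silverman's rule-of-thumb bandwidth is assumed, and the names gaussian_kde, build_past_cases and samples are hypothetical. Each sample is indexed by the pair (q, s), and the data are simulated purely for demonstration.

```python
import math
import random
from statistics import stdev


def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> estimated density, using a Gaussian kernel."""
    n = len(sample)
    if bandwidth is None:
        # Assumption: Silverman's rule of thumb; the text does not fix a bandwidth.
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density


def build_past_cases(samples):
    """Map each (q, s) sample to its estimated density, giving the set C."""
    return {key: gaussian_kde(obs) for key, obs in samples.items()}


# Simulated data for illustration only: two samples in regime 1, one in regime 2.
rng = random.Random(0)
samples = {
    (1, 1): [rng.gauss(0.0, 1.0) for _ in range(200)],
    (2, 1): [rng.gauss(0.0, 1.0) for _ in range(200)],
    (1, 2): [rng.gauss(0.5, 2.0) for _ in range(200)],
}
cases = build_past_cases(samples)
```

Each entry of cases is a callable density, so cases[(1, 1)](0.0) evaluates the estimated density of the first sample in regime 1 at x = 0.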
Let us define the weight as a map w : Θ × C → R that assigns a numerical value wqj to each pair of past case
fnq(x/s) and parameter θMjs, representing the support that this case lends to the model fj(x/s; θ) in M.
The sum of the weights wqj is the tool through which the predictor judges the similarity of a particular
model to the estimated distributions with which his knowledge is equipped. More precisely, these weights
represent the degree of support that past distributions lend to the specific model at hand. However, they
also embody the misspecification contained in each model, which, being only an approximation of reality,
still lies at some distance from the actual data. It seems reasonable that the model with the lowest distance
from the nonparametric densities is also the model with the highest similarity to past observations. As such,
it must be the model characterized by the highest sum of weights.
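The ranking logic described above can be illustrated with a small sketch. It rests on assumptions that this section does not make: the weight wqj is taken to be the negative of an integrated squared distance between a past case and a candidate density, within a single regime, and the past cases are simple stand-in densities rather than actual nonparametric estimates. All function and variable names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu, b):
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def squared_distance(f, g, lo=-10.0, hi=10.0, steps=2000):
    """Midpoint-rule approximation of the integral of (f - g)^2 over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(
        (f(lo + (i + 0.5) * h) - g(lo + (i + 0.5) * h)) ** 2 for i in range(steps)
    )


def total_support(model, cases):
    """Sum over past cases q of the assumed weight w_qj = -distance(case_q, model)."""
    return sum(-squared_distance(case, model) for case in cases)


# Stand-ins for the estimated densities of three past samples in one regime.
past_cases = [
    lambda x: normal_pdf(x, 0.05, 1.1),
    lambda x: normal_pdf(x, -0.10, 0.9),
    lambda x: normal_pdf(x, 0.00, 1.0),
]

# Two hypothetical candidate models with fixed parameters.
candidates = {
    "normal": lambda x: normal_pdf(x, 0.0, 1.0),
    "laplace": lambda x: laplace_pdf(x, 0.0, 1.0),
}

support = {name: total_support(model, past_cases) for name, model in candidates.items()}
ranking = sorted(support, key=support.get, reverse=True)
```

With this choice of weight, every sum of weights is non-positive, and the candidate closest to the past cases has the sum nearest to zero, so it is ranked first.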