As a whole, the conditions
Ph ≠ 0, ∀h ∈ c?
P irreducible
are sufficient, although not necessary, for consistency and asymptotic normality
of the Maximum Likelihood estimator with either T → ∞ or N → ∞, where in
the latter case T must be at least 2. Of course, since these conditions
are only sufficient, the less stringent assumptions stated in Propositions 5 and
6 lead to the same result.
Extensions to non-Markov chains such as the VAP(£) model are
straightforward, since it is always possible to rewrite a bivariate VAP(£) model as a
finite state-space Markov chain with V states, and hence the conditions stated
in Propositions 5 and 6 apply to the transition probabilities matrix and initial
probabilities vector corresponding to the new Markov chain.
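
The text does not spell out this rewriting, so the following is only a generic sketch of the usual device, under the assumption that the next value of the process depends on its last `order` values: each tuple of `order` consecutive observations is treated as one composite state of a first-order chain. The function names, the binary coding of the two variables and the lag length are hypothetical choices made for the illustration.

```python
from itertools import product

def expand_states(values, order):
    """Enumerate the composite states: all tuples of `order` consecutive values."""
    return list(product(values, repeat=order))

def embed_sequence(seq, order):
    """Map an observed sequence to the sequence of composite states it visits."""
    return [tuple(seq[t - order + 1 : t + 1]) for t in range(order - 1, len(seq))]

# Hypothetical example: two binary variables observed jointly, with 2 lags.
values = list(product([0, 1], repeat=2))    # the 4 joint outcomes of the pair
states = expand_states(values, order=2)     # 4**2 = 16 composite states
seq = [(0, 1), (1, 1), (1, 0), (0, 0)]      # one observed bivariate path
embedded = embed_sequence(seq, order=2)     # first-order chain on composite states
```

Many transitions between composite states are structurally impossible, because two consecutive tuples must overlap in all but one position; the corresponding entries of the new transition matrix are therefore known zeros rather than unknown parameters.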
When covariates are introduced, it is an obvious requirement that they are
weakly exogenous with respect to the transition probabilities (see Engle, Hendry
and Richard (1983)) in order to achieve asymptotic efficiency of the ML
estimates. Under this additional assumption, the same conditions stated in
Propositions 5 and 6 ensure consistency and asymptotic normality.
Let us briefly address some problems possibly arising in finite samples. It
may be easily shown that, in the homogeneous population model, the transition
probabilities are estimated essentially as the ratio of the number of cases where
the transition occurred (say, Nhk) over the number of cases where it could have
occurred (say, Nh). In any finite sample, the distribution of Nhk given Nh will be
Binomial, and will hence converge in distribution to the Normal as Nh → ∞.
On the other hand, the distribution of Nh depends on N, T, the transition
probabilities matrix P and the initial probabilities vector p: in the light of this,
Propositions 5 and 6 state the conditions on P which ensure divergence of this
distribution with either N (Proposition 5) or T (Proposition 6). This divergence
leads to asymptotic normality of the distribution of the ratio Nhk/Nh. Notice however that,
in finite samples, this distribution will not be normal. In particular, for states
which have been visited few times (Nh small), the distribution of Nhk given Nh
may be very far from normality, especially when the true transition probability
Phk is close to zero or one, which will give highly skewed distributions. Clearly,
the distribution of Wald-type tests for hypotheses involving parameters whose
estimates are affected by this problem will be very far from the asymptotic χ²
distribution, while the asymptotic approximation might work quite well when
the parameters involved in the restriction are estimated based on large Nh's.
By analogy with similar results in other models, LR-type tests might perform better
in the first case, and worse in the second, but a careful analysis of finite sample
properties is needed to make precise statements. In any case, in order to get
an idea of how reliable the asymptotic distribution can be, it is convenient to
check the number of data points in each state with at least one unknown exiting
transition probability. In fact, even if N × T is large, the information about
some of the transitions might be quite scarce.
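
As a rough illustration of the check suggested above, the sketch below counts Nhk and Nh from a panel of state sequences, forms the ratio estimates Nhk/Nh, and flags states with few observed exits. The toy panel, the function names and the threshold `MIN_COUNT` are assumptions made for the example, not values taken from the text. The skewness of a Binomial(n, p) count, (1 − 2p)/√(np(1 − p)), is included to show how slowly symmetry is approached when p is near zero or one.

```python
import math
from collections import Counter

def transition_counts(sequences):
    """Return (N_hk, N_h): transition counts and number of exits observed from each state."""
    n_hk, n_h = Counter(), Counter()
    for seq in sequences:
        for h, k in zip(seq, seq[1:]):   # consecutive pairs (state at t, state at t+1)
            n_hk[(h, k)] += 1
            n_h[h] += 1
    return n_hk, n_h

def binomial_skewness(n, p):
    """Skewness of a Binomial(n, p) count; requires 0 < p < 1."""
    return (1 - 2 * p) / math.sqrt(n * p * (1 - p))

# Hypothetical panel: N = 3 individuals, T = 5 periods, states 'a' and 'b'.
panel = [list("aabaa"), list("aaaab"), list("abaaa")]
n_hk, n_h = transition_counts(panel)

# Ratio estimates of the transition probabilities, N_hk / N_h.
p_hat = {(h, k): c / n_h[h] for (h, k), c in n_hk.items()}

MIN_COUNT = 30  # arbitrary illustrative threshold, not from the source
sparse_states = sorted(h for h, n in n_h.items() if n < MIN_COUNT)
```

In this toy panel N × T = 15, yet state 'b' is exited only twice, so the estimate of its exit probabilities rests on two observations; this is the kind of scarcity the check is meant to reveal.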