where w = exp(Θ²) and Ω = −Θμ. The equations in (3) indicate that, in this model specification, Xβ alone determines E[Y], σ² independently controls Var[U], and μ and Θ determine the skewness and kurtosis of the error term. Thus, standard heteroskedastic specifications can be introduced by making σ² a function of the variables influencing Var[U], without affecting E[Y] or the skewness and kurtosis of the error term. Evaluation of Skew[U] and Kurt[U] shows that if Θ≠0 but μ=0, the distribution of U is kurtotic (i.e. “fat-tailed”) but symmetric. The sign of Θ is irrelevant, but larger values of ∣Θ∣ increase kurtosis. If Θ≠0 and μ>0, U has a kurtotic and right-skewed distribution, while μ<0 results in a kurtotic but left-skewed distribution. Larger values of ∣μ∣ increase both the magnitude of the skewness and the kurtosis, but kurtosis can be scaled back by reducing ∣Θ∣.
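The qualitative claims above can be checked numerically. Equation (3) is not restated here, so the sketch below assumes an inverse hyperbolic sine error of the form U = σ[sinh(Θ(V+μ)) − F(Θ,μ)]/Θ, with V ~ N(0,1) and F(Θ,μ) = w^½ sinh(Θμ) the mean of sinh(Θ(V+μ)). This is an assumption chosen to be consistent with w = exp(Θ²), Ω = −Θμ and U → σV as Θ → 0, not a reproduction of the paper's equations. Kurtosis is reported as excess kurtosis (zero for a normal error), which is how K(Θ,μ) is read here. The moments are computed exactly from E[exp(aV)] = exp(a²/2) rather than by simulation; the function names are hypothetical.

```python
import math


def sinh_moment(k, theta, mu):
    """Exact E[sinh(theta * (V + mu)) ** k] for V ~ N(0, 1).

    Expands sinh**k binomially and uses E[exp(a * V)] = exp(a**2 / 2).
    """
    total = 0.0
    for j in range(k + 1):
        a = (k - 2 * j) * theta
        total += math.comb(k, j) * (-1) ** j * math.exp(a * mu + a * a / 2.0)
    return total / 2 ** k


def error_moments(theta, mu, sigma=1.0):
    """Var, skewness and excess kurtosis of the ASSUMED error term
    U = sigma * (sinh(theta*(V + mu)) - E[sinh(theta*(V + mu))]) / theta, theta != 0.
    """
    m1, m2, m3, m4 = (sinh_moment(k, theta, mu) for k in range(1, 5))
    # Central moments of S = sinh(theta*(V + mu)) from its raw moments.
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    # Rescale by sigma / theta; a negative theta flips the sign of odd moments,
    # which cancels the sign flip inside sinh, so Skew[U] does not depend on sign(theta).
    scale = sigma / theta
    var_u = scale ** 2 * c2
    skew_u = scale ** 3 * c3 / var_u ** 1.5
    exkurt_u = scale ** 4 * c4 / var_u ** 2 - 3.0
    return var_u, skew_u, exkurt_u


if __name__ == "__main__":
    cases = [(0.5, 0.0), (1.0, 0.0), (0.7, 0.4), (-0.7, 0.4), (0.7, -0.4), (0.05, 0.5)]
    for theta, mu in cases:
        var_u, skew_u, exkurt_u = error_moments(theta, mu)
        print(f"theta={theta:+.2f} mu={mu:+.2f}  Var={var_u:.4f}  "
              f"Skew={skew_u:+.4f}  ExKurt={exkurt_u:.4f}")
```

Under this assumed form, flipping the sign of Θ leaves all three moments unchanged, μ=0 gives zero skewness with positive excess kurtosis, the sign of μ sets the direction of skewness, and a small Θ gives Var[U] ≈ σ² with near-zero skewness and excess kurtosis. The variance also matches the closed form σ²(w−1)(w cosh(2Ω)+1)/(2Θ²).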
In short, a wide variety of right- and left-skewed skewness-kurtosis combinations can be obtained by altering the values of these two parameters. Also, if μ=0, then S(Θ,μ)=0 and the model reduces to a symmetric but kurtotic error term model. Further, as Θ goes to zero, U approaches σV, Var[U] approaches σ², and K(Θ,0) also goes to zero, indicating that the normal-error regression model is nested within this non-normal error model. As a result, in applied regression analysis, if the error term is normally distributed, both μ and Θ would approach zero and the proposed estimator of the slope parameter vector β would be the same as OLS. Also, the null hypothesis of error term normality (i.e. OLS) versus the alternative of non-normality can be tested as H₀: Θ=0 and μ=0 vs. Hₐ: Θ≠0 and μ≠0. The null hypothesis of symmetric non-normality versus the alternative of asymmetric non-normality is H₀: Θ≠0 and μ=0 vs. Hₐ: Θ≠0 and μ≠0.
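The text does not say which statistic is used for these tests. One common choice, assumed here, is the likelihood-ratio statistic LR = 2(ln L_unrestricted − ln L_restricted), compared with a χ² distribution whose degrees of freedom equal the number of restrictions (2 for the normality test, 1 for the symmetry test). The hypothetical helper below only turns two maximized log-likelihood values into that statistic and its χ² p-value, using the exact closed forms for 1 and 2 degrees of freedom. Because μ drops out of the model as Θ → 0 (U approaches σV), the χ² reference distribution for the normality test should be treated as an approximation.

```python
import math


def lr_test(loglik_restricted, loglik_unrestricted, df):
    """Hypothetical helper: LR statistic and chi-square p-value (df = 1 or 2 only)."""
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)
    x = max(lr, 0.0)  # guard against tiny negative values from optimizer noise
    if df == 1:
        p_value = math.erfc(math.sqrt(x / 2.0))  # P(chi2_1 > x) = erfc(sqrt(x / 2))
    elif df == 2:
        p_value = math.exp(-x / 2.0)             # P(chi2_2 > x) = exp(-x / 2)
    else:
        raise ValueError("closed forms here cover df = 1 or 2 only")
    return lr, p_value


if __name__ == "__main__":
    # Illustrative, made-up maximized log-likelihoods (not results from any estimation).
    ll_normal, ll_symmetric, ll_full = -120.4, -114.9, -113.2
    print(lr_test(ll_normal, ll_full, df=2))     # H0: theta = 0 and mu = 0
    print(lr_test(ll_symmetric, ll_full, df=1))  # H0: mu = 0 with theta free
```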
To specify a non-normally distributed and autocorrelated error term model, first consider a model with an n×1 error term vector U that is normally distributed but not i.i.d. Following Judge et al., let Φ = σ²ψ be the covariance matrix of the error term vector, P be an n×n matrix such that P′P = ψ⁻¹, Y* = PY (an n×1 vector), and X* = PX (an n×k matrix), where Y and X are the vector and matrix of original dependent and independent variables. Given this choice of P, the transformed error term U* = PU = P(Y − Xβ) = PY − PXβ = Y* − X*β is i.i.d., since P′P = ψ⁻¹ implies Var[U*] = σ²PψP′ = σ²I. Under the assumption of normality, the log-likelihood function to be maximized in order to estimate a multiple regression model with non-i.i.d. errors is then:
$$\mathrm{NLL} = -\frac{n}{2}\ln(\sigma^2) - \frac{1}{2}\ln\lvert\psi\rvert - \frac{U^{*\prime}U^{*}}{2\sigma^2} \tag{4}$$
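Equation (4) can be checked numerically against the multivariate normal density. The sketch below assumes, purely for illustration, an AR(1) structure ψᵢⱼ = ρ^∣i−j∣/(1−ρ²) (the text does not fix ψ), builds P from a Cholesky factor of ψ⁻¹ so that P′P = ψ⁻¹, and evaluates (4) without the additive constant −(n/2)ln(2π). For fixed ψ, maximizing (4) over β and σ² reduces to OLS on the transformed data (Y*, X*), with σ̂² = U*′U*/n. All function names are hypothetical.

```python
import numpy as np


def ar1_psi(n, rho):
    """Hypothetical psi for illustration: AR(1) errors, psi_ij = rho**|i-j| / (1 - rho**2)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :]) / (1.0 - rho ** 2)


def transform_matrix(psi):
    """Return an n x n matrix P with P'P = inv(psi)."""
    L = np.linalg.cholesky(np.linalg.inv(psi))  # inv(psi) = L @ L.T, L lower-triangular
    return L.T                                   # P = L.T, so P.T @ P = L @ L.T = inv(psi)


def normal_loglik(beta, sigma2, y, X, psi):
    """Equation (4), omitting the constant -(n/2) ln(2 pi)."""
    n = len(y)
    P = transform_matrix(psi)
    u_star = P @ y - (P @ X) @ beta              # U* = Y* - X* beta
    _, logdet_psi = np.linalg.slogdet(psi)       # ln|psi|, computed stably
    return -0.5 * n * np.log(sigma2) - 0.5 * logdet_psi - (u_star @ u_star) / (2.0 * sigma2)


def gls_fit(y, X, psi):
    """Maximize (4) over beta and sigma2 for fixed psi: OLS on (Y*, X*)."""
    P = transform_matrix(psi)
    y_star, X_star = P @ y, P @ X
    beta_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    resid = y_star - X_star @ beta_hat
    return beta_hat, (resid @ resid) / len(y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 50
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)  # simulated, not real data
    psi = ar1_psi(n, rho=0.6)
    beta_hat, sigma2_hat = gls_fit(y, X, psi)
    print(beta_hat, sigma2_hat, normal_loglik(beta_hat, sigma2_hat, y, X, psi))
```

Inverting ψ and then factoring is the simplest way to get a valid P for an arbitrary ψ; for a known structure such as AR(1), a closed-form transformation would avoid the explicit inverse.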