Stata Technical Bulletin
difficulties of choosing an appropriate ARMA specification, as well as a formidable computational task for each combination of
p and q to be evaluated. The methods described here assume that the short memory or ARMA components of the time series
are relatively unimportant, so that the long memory parameter d may be estimated without fully specifying the data-generating
process. These methods are thus described as semiparametric.
gphudak performs the Geweke and Porter-Hudak (GPH 1983) semiparametric log periodogram regression, often described
as the “GPH test,” for long memory (fractional integration) in a time series. The GPH method uses nonparametric methods—a
spectral regression estimator—to evaluate d without explicit specification of the ARMA parameters of the series. The series is
usually differenced so that the resulting d estimate will fall in the [-0.5, 0.5] interval.
Geweke and Porter-Hudak (1983) proposed a semiparametric procedure to obtain an estimate of the memory parameter d
of a fractionally integrated process $X_t$ in a model of the form
$$(1 - L)^d X_t = \epsilon_t, \tag{3}$$
where $\epsilon_t$ is stationary with zero mean and continuous spectral density $f_\epsilon(\lambda) > 0$. The estimate $\hat{d}$ is obtained from the application
of ordinary least squares to
$$\log(I_x(\lambda_s)) = c - d \log|1 - e^{i\lambda_s}|^2 + \text{residual} \tag{4}$$
computed over the fundamental frequencies $\{\lambda_s = 2\pi s/n,\ s = 1, \ldots, m < n\}$. We define
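$$\omega_x(\lambda_s) = \frac{1}{\sqrt{2\pi n}} \sum_{t=1}^{n} X_t e^{it\lambda_s}$$
as the discrete Fourier transform (DFT) of the time series $X_t$, $I_x(\lambda_s) = \omega_x(\lambda_s)\,\omega_x(\lambda_s)^*$ as the periodogram, and $x_s = \log|1 - e^{i\lambda_s}|$. Ordinary least squares on (4) yields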
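$$\hat{d} = -\frac{\sum_{s=1}^{m} x_s \log I_x(\lambda_s)}{2 \sum_{s=1}^{m} x_s^2}, \tag{5}$$
with $x_s$ measured in deviations from its mean over $s = 1, \ldots, m$, because (4) includes the intercept $c$.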
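The regression in (4) and the estimate in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the gphudak implementation: the function names `log_periodogram` and `gph_estimate` are hypothetical, the DFT is computed by direct summation using only the standard library, and the regressor is demeaned to absorb the intercept c.

```python
import cmath
import math
import random


def log_periodogram(x, m):
    """Return log I_x(lambda_s) for s = 1..m, where lambda_s = 2*pi*s/n."""
    n = len(x)
    out = []
    for s in range(1, m + 1):
        lam = 2.0 * math.pi * s / n
        # DFT: w = (2*pi*n)^(-1/2) * sum_{t=1}^{n} X_t * exp(i*t*lambda_s)
        w = sum(xt * cmath.exp(1j * lam * t) for t, xt in enumerate(x, start=1))
        w /= math.sqrt(2.0 * math.pi * n)
        out.append(math.log(abs(w) ** 2))  # periodogram is w times its conjugate
    return out


def gph_estimate(x, m):
    """Estimate d by OLS of log I_x on log|1 - exp(i*lambda_s)|, s = 1..m."""
    n = len(x)
    y = log_periodogram(x, m)
    a = [math.log(abs(1.0 - cmath.exp(2j * math.pi * s / n)))
         for s in range(1, m + 1)]
    a_bar = sum(a) / m
    xs = [v - a_bar for v in a]  # demeaning absorbs the intercept c in (4)
    num = sum(u * v for u, v in zip(xs, y))
    den = 2.0 * sum(u * u for u in xs)
    return -num / den  # the coefficient on log|.|^2 in (4) is -d


# Illustrative use on simulated white noise, for which d = 0.
rng = random.Random(12345)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
d_hat = gph_estimate(noise, 32)  # m = sqrt(1024) ordinates
print("estimated d:", d_hat)
```

Because only the Fourier frequencies with s >= 1 enter the regression, adding a constant to the series or rescaling it leaves the estimate unchanged up to rounding error.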
Various authors have proposed methods for the choice of m, the number of Fourier frequencies included in the regression.
The regression slope estimate is an estimate of the slope of the series’ power spectrum in the vicinity of the zero frequency; if too
few ordinates are included, the slope is calculated from a small sample. If too many are included, medium and high-frequency
components of the spectrum will contaminate the estimate. A choice of $\sqrt{T}$, that is, 0.5 for power, is often employed. To evaluate
the robustness of the GPH estimate, a range of power values (from 0.40 to 0.75) is commonly calculated as well. Two estimates
of the d coefficient’s standard error are commonly employed: the regression standard error, giving rise to a standard t test, and
an asymptotic standard error, based upon the theoretical variance of the log periodogram of $\pi^2/6$. The statistic based upon that
standard error has a standard normal distribution under the null.
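The role of the power setting and the two standard errors can be illustrated with a short Python sketch. It is a rough illustration, not the gphudak code: the function name `gph_with_se` is hypothetical, the number of ordinates is taken as the sample size raised to the chosen power and truncated to an integer (an assumption), and the asymptotic standard error simply replaces the residual variance with the theoretical value of pi^2/6.

```python
import cmath
import math
import random


def gph_with_se(x, power):
    """GPH estimate of d with regression and asymptotic standard errors.

    Uses the first m = int(n**power) Fourier frequencies; the truncation
    rule is an assumption of this sketch.
    """
    n = len(x)
    m = int(n ** power)
    y, z = [], []
    for s in range(1, m + 1):
        lam = 2.0 * math.pi * s / n
        w = sum(xt * cmath.exp(1j * lam * t) for t, xt in enumerate(x, start=1))
        y.append(math.log(abs(w) ** 2 / (2.0 * math.pi * n)))    # log periodogram
        z.append(math.log(abs(1.0 - cmath.exp(1j * lam)) ** 2))  # regressor in (4)
    z_bar, y_bar = sum(z) / m, sum(y) / m
    szz = sum((v - z_bar) ** 2 for v in z)
    slope = sum((v - z_bar) * (u - y_bar) for v, u in zip(z, y)) / szz
    intercept = y_bar - slope * z_bar
    rss = sum((u - intercept - slope * v) ** 2 for v, u in zip(z, y))
    d_hat = -slope                                 # (4) has coefficient -d on z
    se_reg = math.sqrt(rss / (m - 2) / szz)        # usual OLS standard error
    se_asy = math.sqrt(math.pi ** 2 / 6.0 / szz)   # theoretical variance pi^2/6
    return m, d_hat, se_reg, se_asy


# Robustness check over a range of powers on simulated white noise (d = 0).
rng = random.Random(99)
sample = [rng.gauss(0.0, 1.0) for _ in range(512)]
for power in (0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75):
    m, d_hat, se_reg, se_asy = gph_with_se(sample, power)
    print(power, m, round(d_hat, 3), round(se_reg, 3), round(se_asy, 3))
```

Dividing the estimate by either standard error gives the corresponding test statistic for d = 0. The asymptotic standard error can only shrink as more ordinates are added, which is one reason to inspect the estimates across several powers and not rely on a single choice.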
modlpr computes a modified form of the GPH estimate of the long memory parameter, d, of a time series, proposed by
Phillips (1999a, 1999b). Phillips (1999a) points out that the prior literature on this semiparametric approach does not address
the case of d = 1, or a unit root, in (3), despite the broad interest in determining whether a series exhibits unit-root behavior or
long memory behavior. His work shows that the d estimate of (5) is inconsistent when d > 1, with the estimate exhibiting asymptotic
bias toward unity. This weakness of the GPH estimator is solved by Phillips’ modified log periodogram regression estimator, in
which the dependent variable is modified to reflect the distribution of d under the null hypothesis that d = 1. The estimator
gives rise to a test statistic for d = 1 which is a standard normal variate under the null. Phillips suggests that deterministic
trends should be removed from the series before application of the estimator. Accordingly, the routine will automatically remove
a linear trend from the series. This may be suppressed with the notrend option. The comments above regarding power apply
equally to modlpr.
The Phillips (1999b) modification of the GPH estimator is based on an exact representation of the DFT in the unit root case.
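The modification expresses
$$\omega_x(\lambda_s) = \frac{\omega_u(\lambda_s)}{1 - e^{i\lambda_s}} - \frac{e^{i\lambda_s}}{1 - e^{i\lambda_s}} \frac{X_n}{\sqrt{2\pi n}},$$
where $\omega_u(\lambda_s)$ is the DFT of the differenced series $u_t = (1 - L)X_t$, and the modified DFT as
$$v_x(\lambda_s) = \omega_x(\lambda_s) + \frac{e^{i\lambda_s}}{1 - e^{i\lambda_s}} \frac{X_n}{\sqrt{2\pi n}}$$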
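The correction term can be checked numerically with a small Python sketch. It assumes a simulated random walk started at X_0 = 0 and uses hypothetical helper names (`dft`, `modified_dft`); it is not taken from modlpr. For such a series, adding exp(i*lambda_s) / (1 - exp(i*lambda_s)) * X_n / sqrt(2*pi*n) to the ordinary DFT should reproduce the DFT of the first differences divided by 1 - exp(i*lambda_s).

```python
import cmath
import math
import random


def dft(x, s):
    """w_x(lambda_s) = (2*pi*n)^(-1/2) * sum_{t=1}^{n} X_t * exp(i*t*lambda_s)."""
    n = len(x)
    lam = 2.0 * math.pi * s / n
    total = sum(xt * cmath.exp(1j * lam * t) for t, xt in enumerate(x, start=1))
    return total / math.sqrt(2.0 * math.pi * n)


def modified_dft(x, s):
    """v_x(lambda_s): the DFT plus the correction term involving X_n."""
    n = len(x)
    e = cmath.exp(2j * math.pi * s / n)
    return dft(x, s) + e / (1.0 - e) * x[-1] / math.sqrt(2.0 * math.pi * n)


# Simulated random walk with X_0 = 0, so that d = 1 in (3).
rng = random.Random(2024)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]
walk = []
level = 0.0
for u in shocks:
    level += u
    walk.append(level)

# Compare v_x with the DFT of the shocks divided by (1 - exp(i*lambda_s)).
for s in (1, 2, 3):
    e = cmath.exp(2j * math.pi * s / len(walk))
    gap = abs(modified_dft(walk, s) - dft(shocks, s) / (1.0 - e))
    print(s, gap)
```

The printed gaps should be at the level of floating-point rounding error, since the relation is an exact algebraic identity at the Fourier frequencies.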