14
Stata Technical Bulletin
STB-4
    Variable | Coefficient   Std. Error        t   Prob > |t|       Mean
-------------+----------------------------------------------------------
     change3 |                                                  2.003001
-------------+----------------------------------------------------------
 [illegible] |   -1.339477     .8394873   -1.596        0.111   .4102871
    Iagecont |   -.1867027     .0251348   -7.428        0.000   55.79513
     married |   -.2844878     .8375503   -0.340        0.734   .5915072
     Inonwht |   -1.863855     1.035882   -1.799        0.072   .1794258
       _cons |    13.47238     1.536624    8.768        0.000          1
-------------+----------------------------------------------------------
Repeated measures ANOVA in some sense is a summary of the changes between periods and how those changes differ with the
independent variables. You can think of repeated measures ANOVA as taking all of the possible change regressions that one could
run and weighting them somehow to produce a test statistic. Above are two of the regressions and they are similar (remember,
the first regression is a change of four years while the second is only over two years, so effects in the second should be smaller).
The similarity in the two regressions informs us that we can probably interpret any one of the regressions as reflecting overall
trends. We find that the coefficients have reversed signs when compared to the cross-sectional regression. For instance, males
start with higher mental health than females but, relative to females, their mental health declines over time.
One way to address the change over time is to compute the slope for each observation. That is, we have a mental health
measurement at baseline, 3 months, and 1, 2, and 4 years. For each observation, we could estimate a regression fitting a straight
line to the data. The regression itself does not interest us, but we could use the resulting estimate of the slope as a proxy for the
change over time and we could then use the slope as the subject of our analysis. One major advantage is that we can compute
slopes even in the presence of missing data.
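A minimal sketch of the idea, written in present-day Stata syntax rather than the syntax of the session shown in this article. The three-person data set and the identifier id are made up for illustration, and the wide variable names (mhi0, mhi3mo, mhi1yr, mhi2yr, mhi4yr) are assumed spellings of the five measurements. The slope is obtained as the usual least-squares ratio, using only the occasions actually observed for each person.

```stata
* Toy data (hypothetical): three people in wide layout, "." = missing
clear
input id mhi0 mhi3mo mhi1yr mhi2yr mhi4yr
1 70 71 74 78 86
2 60  . 58  . 52
3 80  .  .  .  .
end

* Long layout: one row per person-occasion; the suffix is kept as a string
reshape long mhi, i(id) j(wave) string

* Map each occasion label to its time in years
gen double time = .
replace time = 0   if wave == "0"
replace time = .25 if wave == "3mo"
replace time = 1   if wave == "1yr"
replace time = 2   if wave == "2yr"
replace time = 4   if wave == "4yr"

* Use only the occasions that were actually observed
drop if missing(mhi)

* Per-person slope = sum((t - tbar)*y) / sum((t - tbar)^2)
bysort id: egen double tbar = mean(time)
gen double dev = time - tbar
bysort id: egen double sxy = total(dev*mhi)
bysort id: egen double sxx = total(dev^2)
gen double slope = sxy/sxx

* One row per person
bysort id: keep if _n == 1
list id slope
```

Person 1 lies exactly on a line rising 4 points per year and person 2 on a line falling 2 points per year, so their slopes should come out as 4 and -2 up to rounding. Person 3 has a single measurement, so the ratio is 0/0 and Stata stores a missing slope.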
There are pluses and minuses to using slopes. On the negative side, we must acknowledge that the proper relationship is
not necessarily a constant slope over time. If this were intervention data, for example, we would anticipate a larger effect due
to the intervention at earlier times and smaller ones later. If there are missing data, as there are in our case, we should weight
the slopes in our subsequent analysis since they are not all computed from the same amount of data. Slopes are also a somewhat
inefficient estimate.
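To see why slopes built from fewer, or more closely spaced, measurements deserve less weight, recall a standard result for least-squares slopes. The sketch below assumes, purely for illustration, that each person's measurements scatter around that person's own straight line with independent errors of common variance $\sigma^2$. The index $i$ for the person and $j$ for the observed occasion are notation introduced here, and this is one conventional choice of weights, not necessarily the one used in the analysis that follows.

```latex
% Variance of person i's slope b_i, and the implied weight w_i
\[
\mathrm{Var}(b_i) = \frac{\sigma^2}{S_i},
\qquad
S_i = \sum_{j} (x_{ij} - \bar{x}_i)^2 ,
\qquad
w_i \propto S_i .
\]
% All five occasions observed: x = 0, .25, 1, 2, 4
\[
\bar{x} = \frac{7.25}{5} = 1.45, \qquad
S = 21.0625 - 5\,(1.45)^2 = 10.55 .
\]
% Only baseline and 3 months observed: x = 0, .25
\[
\bar{x} = 0.125, \qquad
S = 2\,(0.125)^2 = 0.03125 .
\]
```

Under these assumptions, a slope based on all five occasions has a variance roughly 340 times smaller than one based on baseline and 3 months alone, which illustrates how unequal the slopes can be in precision.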
On the positive side, a slope is intuitive and, as pointed out, can be computed for any observation that has two or more
data points. This gives us some options for dealing with missing data. I will discard observations which are not observed in at
least one of the follow-up periods since any change score we could compute from them would depend only on baseline data.
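The discard rule just described can be sketched as follows in present-day Stata syntax. The toy data and the id variable are hypothetical, and the variable names are assumed spellings of the baseline and four follow-up measurements. The rownonmiss() function of egen counts the non-missing values across a list of variables.

```stata
* Toy data (hypothetical): person 3 is observed at baseline only
clear
input id mhi0 mhi3mo mhi1yr mhi2yr mhi4yr
1 70 71 74 78 86
2 60  . 58  . 52
3 80  .  .  .  .
end

* Number of non-missing follow-up measurements per person
egen nfollow = rownonmiss(mhi3mo mhi1yr mhi2yr mhi4yr)

* Discard anyone never observed after baseline (person 3 here)
drop if nfollow == 0
list id nfollow
```

This rule only guarantees at least one follow-up measurement. A person with a missing baseline and a single follow-up would still have just one data point and therefore no computable slope.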
This said, I am going to compute the slopes. This turns out to be possible but tedious in Stata.
The formula for a slope is $\sigma_{xy}/\sigma_x^2$, where $\sigma_{xy}$ is the covariance of y and x and $\sigma_x^2$ is the variance of x. In this case,
y is the mental health measurement and x is the time at which the measurement was taken. We will calculate this ratio as
$\left(\sum_i (x_i - \bar{x})\,y_i\right) / \left(\sum_i (x_i - \bar{x})^2\right)$. Remember that the previously calculated mhimiss is the number of missing mental health
measurements:
. gen xbar = 0
. replace xbar = xbar + .25 if mhi3mo~=. /* 3 months = .25 of a year */
(3415 changes made)
. replace xbar = xbar + 1 if mhi1yr~=.
(1891 changes made)
. replace xbar = xbar + 2 if mhi2yr~=.
(1822 changes made)
. replace xbar = xbar + 4 if mhi4yr~=.
(1455 changes made)
. replace xbar = xbar / (5-mhimiss)
(3869 changes made)
. gen x2 = 0
. replace x2 = x2 + (-xbar)^2 if mhi0~=.
(1879 changes made)
. replace x2 = x2 + (.25-xbar)^2 if mhi3mo~=.
(3415 changes made)
. replace x2 = x2 + (1-xbar)^2 if mhi1yr~=.
(1891 changes made)
. replace x2 = x2 + (2-xbar)^2 if mhi2yr~=.
(1822 changes made)
. replace x2 = x2 + (4-xbar)^2 if mhi4yr~=.
(1455 changes made)
. gen xy = 0