selection problem may arise in the 1994 to 2002 time period: the respondents who voluntarily
dropped out of the survey before 1994 may share certain characteristics. For example, if people
with lower education levels are more likely to drop out of the survey than people with higher
education levels, the sample for the period 1994 to 2002 will have a higher average education
level than the general NLSY79 sample, causing a systematic upward bias in the estimated
coefficient on education level. Second, rather than sample selection, the difference between the
two periods might result from structural changes in the sample population. For example, in the
first period most respondents are passing through their twenties, so they tend to have more
children in that period. With more young children, they spend more time at home, which may
affect their wage rates, and some women quit their jobs altogether. In contrast, few respondents
have additional children in the second period, when most are approaching or past forty. Thus,
the number of children may have a significant effect on the wage rate only in the first period.
Regression 3E is designed to test the null hypothesis of no systematic difference between
people who remained in the survey and those who dropped out in the first period. We track all
respondents who remain in the second period back to the first period and run the same regression
from 1979 to 1994 for this group of respondents only. Any systematic difference would yield
coefficient estimates in regression 3E that differ from those in regression 3C. The results are
informative: the estimated coefficients in 3E are very similar to those in 3C. The coefficient
on g shows the largest difference, 0.016, but a chi-square test shows that we still cannot reject
the null hypothesis that the two coefficients are equal.²⁴ Therefore, we cannot reject the null
hypothesis of no systematic difference between the samples for the two time periods, and we find
no evidence of a significant sample selection problem in the second period.
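The paper does not say exactly how the chi-square test on the coefficient of g was computed. A common choice is a one-degree-of-freedom Wald test of equality of two coefficients, sketched below under that assumption. The function names `wald_equality_test` and `chi2_1df_pvalue` are hypothetical, and the coefficient values and standard errors from regressions 3C and 3E are not reproduced here. Because the 3E respondents are a subset of the 3C sample, the two estimates are generally correlated, so the covariance term should be estimated jointly rather than set to zero.

```python
import math


def wald_equality_test(b1, b2, var1, var2, cov12=0.0):
    """Wald statistic (chi-square, 1 df) for H0: b1 == b2.

    var1 and var2 are the sampling variances of the two estimates and cov12
    their covariance. The default cov12=0.0 treats the estimates as
    independent, which is only an illustrative assumption: overlapping
    samples (as in 3C vs. 3E) usually make the estimates correlated.
    """
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    return (b1 - b2) ** 2 / var_diff


def chi2_1df_pvalue(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 df."""
    # With 1 df, X = Z**2 for standard normal Z, so
    # P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(stat / 2.0))
```

For example, `chi2_1df_pvalue(0.03)` returns about 0.86. The p-value of 0.8561 in footnote 24 is consistent with a statistic of roughly 0.033, which rounds to the reported 0.03.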
24 Chi-square = 0.03, Prob > chi-square = 0.8561