with attendance: students who are more able, work harder or are more mo-
tivated, tend to have higher attendance levels. As a consequence, the OLS
estimator of β1 in equation (1), omitting x2, would be biased and inconsis-
tent, as it would attribute to attendance an effect that is actually due to
unobservable student characteristics. In short, we face a classic example of
omitted variable bias.
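The mechanism can be illustrated with a small simulation. The sketch below is not based on the data used in this paper: the coefficients (0.5 and 1.0), the 0.8 loading of attendance on student input, and the sample size are all hypothetical values chosen for illustration, and the helper functions `slope` and `ols2` are written here for that purpose only. Under these assumptions, the population bias of the short regression is β2·cov(x1, x2)/var(x1) = 0.8/1.64, or roughly 0.49.

```python
import random

def slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def ols2(x1, x2, y):
    """Slopes from an OLS regression of y on x1 and x2 (intercept included)."""
    n = len(y)
    def demean(v):
        m = sum(v) / n
        return [a - m for a in v]
    a, b, c = demean(x1), demean(x2), demean(y)
    s11 = sum(p * p for p in a)
    s22 = sum(q * q for q in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * r for p, r in zip(a, c))
    s2y = sum(q * r for q, r in zip(b, c))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(0)
n = 20000
beta1, beta2 = 0.5, 1.0  # hypothetical true effects, not estimates from the paper

# x2: unobserved student input; x1: attendance, positively correlated with x2
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * v + random.gauss(0, 1) for v in x2]
y = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

b1_short = slope(x1, y)             # omits x2: picks up part of its effect
b1_long, b2_long = ols2(x1, x2, y)  # infeasible in practice: x2 is unobserved

print("beta1, x2 omitted: ", round(b1_short, 3))
print("beta1, x2 included:", round(b1_long, 3))
```

In this setup the short regression overstates the effect of attendance, while the regression that includes student input recovers a value close to the assumed 0.5.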
One possible solution is to find appropriate proxy variables for student
input. This implies estimating
$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i \qquad (2)$$
where we assume that $x_{2i} = \gamma_0 + \gamma_1 x^{*}_{2i} + \nu_i$ describes the relationship between
the unobservable factors and the proxy variables. Note that, in order to
obtain a consistent estimator for $\beta_1$, $x_{1i}$ and $\nu_i$ must be uncorrelated: the
proxy variables must capture all of the correlation between the unobserved
factors (student input) and the regressor of interest (attendance). In the
following we use high school grade, grade point average, exams per annum
and calculus as proxies for ability, hours of study as a proxy for effort, and
subject and teacher evaluation as proxies for motivation.15
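The condition that attendance be uncorrelated with the part of student input not captured by the proxy can also be checked by simulation. The following is a minimal sketch under assumed values: a single hypothetical proxy `p`, a linear relationship between student input and the proxy with intercept 0.3 and slope 0.9, and the same illustrative coefficients of 0.5 and 1.0. None of these numbers come from the paper. The first case satisfies the condition; in the second, attendance also depends on the residual ν, so the proxy does not capture all of the correlation.

```python
import random

def ols2(x1, x2, y):
    """Slopes from an OLS regression of y on x1 and x2 (intercept included)."""
    n = len(y)
    def demean(v):
        m = sum(v) / n
        return [a - m for a in v]
    a, b, c = demean(x1), demean(x2), demean(y)
    s11 = sum(p * p for p in a)
    s22 = sum(q * q for q in b)
    s12 = sum(p * q for p, q in zip(a, b))
    s1y = sum(p * r for p, r in zip(a, c))
    s2y = sum(q * r for q, r in zip(b, c))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(1)
n = 20000
beta1, beta2 = 0.5, 1.0  # hypothetical true effects

p = [random.gauss(0, 1) for _ in range(n)]     # observed proxy (hypothetical)
nu = [random.gauss(0, 0.5) for _ in range(n)]  # part of x2 the proxy misses
x2 = [0.3 + 0.9 * a + b for a, b in zip(p, nu)]  # unobserved student input

# Case 1: attendance is related to x2 only through the proxy
x1_good = [0.8 * a + random.gauss(0, 1) for a in p]
# Case 2: attendance is also correlated with nu
x1_bad = [0.8 * a + 0.8 * b + random.gauss(0, 1) for a, b in zip(p, nu)]

y_good = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1_good, x2)]
y_bad = [beta1 * a + beta2 * b + random.gauss(0, 1) for a, b in zip(x1_bad, x2)]

# Regress y on attendance and the proxy, in place of the unobserved x2
b1_good, _ = ols2(x1_good, p, y_good)
b1_bad, _ = ols2(x1_bad, p, y_bad)

print("beta1, proxy captures all correlation:", round(b1_good, 3))
print("beta1, proxy misses some correlation: ", round(b1_bad, 3))
```

With these assumed values, the first estimate should lie close to 0.5, whereas the second remains biased upward even though the proxy is included.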
If there are no proxy variables available, or the ones available are not suit-
able because they do not capture all the correlation between the regressor of
interest and the omitted factors, an alternative solution is to find appropriate
instrumental variables for attendance. The instruments would allow us to
net out the correlation of student input with attendance, so that β1 would
measure its net effect on academic performance. Note, however, that the
consistency of the IV estimator relies on the assumption of instrument validity,
which is often difficult to maintain in practice. In addition, even if the
assumption of instrument validity is satisfied, the instruments can be weakly
related to the endogenous variables, resulting in imprecise estimates.
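Both points, consistency under a valid instrument and imprecision under a weak one, can be sketched in a simulation. The example below assumes a single hypothetical instrument `z` that is valid by construction (independent of student input and of the error term) and compares a strong first stage (coefficient 1.0) with a weak one (coefficient 0.02). The function `two_sls`, the sample size, and all parameter values are illustrative assumptions, not the specification estimated in this paper.

```python
import random

def slope(x, y):
    """Slope from a simple OLS regression of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_sls(pi, seed, n=2000, beta1=0.5, beta2=1.0):
    """2SLS estimate of beta1 with one instrument of first-stage strength pi."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # instrument, valid by construction
    x2 = [rng.gauss(0, 1) for _ in range(n)]  # unobserved student input
    x1 = [pi * a + 0.8 * b + rng.gauss(0, 1) for a, b in zip(z, x2)]
    y = [beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    # First stage: project attendance on the instrument
    pi_hat = slope(z, x1)
    x1_hat = [pi_hat * a for a in z]  # intercept dropped; slope() demeans anyway
    # Second stage: regress the outcome on fitted attendance
    return slope(x1_hat, y)

# Repeat over 20 simulated samples to compare the dispersion of the estimates
strong = [two_sls(1.0, s) for s in range(20)]
weak = [two_sls(0.02, s) for s in range(20)]

mean_strong = sum(strong) / len(strong)
range_strong = max(strong) - min(strong)
range_weak = max(weak) - min(weak)

print("strong instrument: mean", round(mean_strong, 3), "range", round(range_strong, 3))
print("weak instrument:   range", round(range_weak, 3))
```

With the strong instrument the estimates should cluster around the assumed value of 0.5; with the weak instrument they are far more dispersed across samples, although the instrument is equally valid in both cases.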
In the following we consider estimates of equation (2), with and without
the inclusion of the proxy variables, obtained by two-stage least squares,
using travel time, work and web as instruments for attendance.16 The choice
of the instruments is based on the assumption that longer travel time, being
15 Note that subject and teacher assessments provide information about the “match”
between academic and student inputs. They are therefore a measure of the suitability of
the student for the subject, which is what we refer to by the term motivation.
16 We report Davidson-MacKinnon (1993) endogeneity tests of the null hypothesis that
attendance is uncorrelated with the error term, so that OLS is a consistent estimator,
under the maintained assumption that the IV estimator is consistent. We also report