Implicit in such criticism is often a belief that educational researchers
make things complicated because they enjoy or value the complexity. While there may be a
small element of truth to such criticism, in most cases, researchers make things complicated
because they are complicated; they have learned that approaches that elide, or refuse to
acknowledge, these complexities have not been successful in addressing the challenges of
improving educational outcomes. Consider the issue of class-size reduction policies.
Class-size reduction programs (CSRPs) are an attractive route to improving educational
achievement, being popular with both parents and teachers. However, given the high cost of
implementation, it would seem advisable to get some clear evidence of the likely benefits
before embarking on such a program, especially given the political difficulties of reversing
such reforms.
The issue of class-size reduction would seem to be an ideal candidate for rigorous,
high-quality research, and indeed there is no shortage of well-designed studies on the effects of
CSRPs. Perhaps the best-known of these is the Tennessee Student-Teacher Achievement
Ratio (STAR) study, described by Mosteller (1995) as “one of the greatest education
experiments in United States history”. Teachers and students in kindergarten and first grade
were assigned at random either to small classes (13-17 students), large classes (22-26
students) or large classes with a teacher’s aide. By the end of third grade, student
achievement in the small classes was significantly higher, especially in reading, and the gains were most
marked for socio-economically disadvantaged students and those from minority ethnic
communities. More importantly, when the students returned to larger classes, although
some of the advantage of the smaller classes diminished (Krueger and Whitmore, 2001),
students who had experienced smaller classes had a lower rate of grade-retention (Pate-
Bain et al., 1997) and higher aspirations to continue education beyond school, evidenced by
an increased tendency to take the SAT (Krueger and Whitmore, 2001). The fact that the
improvements were maintained over such a long period of time is significant, since so
many educational interventions have yielded initially promising effects that disappear when
a program is ended (e.g., Head Start: see Brody, 1992: pp. 175-175).
The size of the impact found in the STAR study was equivalent to an extra three or four
months’ learning per year for the students in the smaller classes, with effects up to twice as
great for minority students. So far, so good. Except, of course, it is not as simple as that.
First, the STAR study appeared to have no difficulty in recruiting additional teachers
without a reduction in average teacher quality, which is unlikely to be the case when such a
program is implemented on a state-wide or national basis (what economists call equilibrium
effects). In evaluating a CSRP in California, Jepsen and Rivkin (2002) found that the
decline in teacher quality reduced, and in some cases completely negated, the effect of
smaller classes. Second, the STAR study found that the smaller classes made faster
progress in kindergarten and first grade, and thereafter, the gap between the smaller and
larger classes stayed constant. The fact that the earlier gains were maintained is important,
but so is the fact that smaller classes appeared to confer little benefit in second grade and
beyond. Indeed, the consistent finding across the research literature on CSRPs is that
effects are strongest in grades K to 3, much weaker in grades 4 to 8, and practically non-
existent in grades 9 to 12. These two points suggest that the benefits of CSRPs as system-
wide reforms are likely to be significantly smaller than found in the STAR study, and
suggest that the money might more profitably be spent in other ways. The researcher
working in Bohr’s quadrant, concerned only with fundamental understanding, might be