difficulties are exacerbated by the limitations of the methods used to address
comparability. In general, education analysts do not conduct ‘active’ studies that
involve allocating pupils to schools, teachers, or examinations for research purposes.
For a variety of practical and ethical reasons, analysts find themselves faced with the
rather more ‘passive’ analysis of datasets, over which they have no control. The
problem with this ‘post hoc dredging of sullen datasets’ (Gorard 2006a) is that the
statistical methods usually involved were designed for use only in active research
(Lunt 2004).
The design of experimental approaches to research allows us to make observations of
difference or pattern in practice that can be directly related to a prior theory or
hypothesis (Gorard 2002). A problem arises, however, when this logic of
experimentation is extended to other approaches, such as the regression analyses used
to create value-added measures (Gorard 2006b). Without a controlled trial, the direct
link between a hypothesis and its testing in practice disappears, and is replaced by a
much weaker form of ‘test’, such as one based on probability and significance.
The results of these can be very misleading (Lunt 2004). For, in most research
situations, it is not sampling variation that is the key to understanding and unlocking
the process (Ziliak and McCloskey 2004). However, sampling variation is all that
traditional statistical analysis addresses, and often not very well at that (Gigerenzer
2004). Researchers should be more concerned with developing and using indicators of
the scientific importance of their results than with how well the results fit a rather
arbitrary statistical model. For example, they could ask whether what they have found
fits observations elsewhere, whether it can be uncovered using a variety of different
methods, whether it looks right in practice, and what the dangers might be in assuming
that it is true.
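As a rough illustration of this argument (the code and figures below are invented for the purpose and form no part of the analyses reported in this paper), the following short simulation shows how an identical, trivially small difference between two groups moves from ‘non-significant’ to highly ‘significant’ simply because the number of cases grows.

```python
# An invented illustration of why 'significance' tracks sample size rather
# than scientific importance: the same trivially small difference between two
# groups becomes 'significant' once enough cases are included.
import math
import random

random.seed(1)

def two_sample_p(xs, ys):
    """Observed difference in means and a two-sided p-value (normal approximation)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return mx - my, 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

for n in (100, 1_000, 100_000):
    group_a = [random.gauss(100.0, 15.0) for _ in range(n)]  # population mean 100
    group_b = [random.gauss(100.5, 15.0) for _ in range(n)]  # mean only 0.5 higher
    diff, p = two_sample_p(group_b, group_a)
    print(f"n = {n:>7}: observed difference = {diff:+.2f}, p = {p:.3f}")

# The underlying difference (0.5 points on a scale with a standard deviation
# of 15) never changes, but the p-value shrinks towards zero as n grows:
# the 'test' reports sampling variation, not the importance of the finding.
```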
This paper illustrates these points - especially the need to be sceptical about results
that depend on only one method - with an important topical example. The ‘raw’
examination scores produced in different schools are not so much a measure of the
impact of the schools as of the ability and outcome scores of their allotted pupils. In
order to decide which schools are making differential progress with their pupils, the
DfES in England is now producing value-added scores for each school. These value-
added scores attempt to measure the differential progress made by strictly equivalent
pupils in different schools.
Methods
In this ‘value-added’ analysis, the prior attainment of each pupil is taken into account,
such that the published official figures reflect not the intake to the school but the
average progress made by pupils while in the school. The DfES value-added scores for
the average pupil progress from Key Stage 1 (KS1, the prior attainment of the pupil
aged 7 at primary school) to Key Stage 2 (attainment at age 11) in each primary
school are calculated as follows (fuller details are available in DfES 2006).
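In outline, the calculation takes the following general form (a simplified sketch only: the point scores and ‘national’ median benchmarks in the code are invented for illustration, and the exact conversions and centring used for the published figures are those set out in DfES 2006). The eligibility rules applied to the real data are described next.

```python
# A simplified sketch of a value-added calculation of this general form
# (illustrative only: the KS1/KS2 point scores and 'national' median
# benchmarks below are invented, not the DfES 2006 figures).
from statistics import mean

# Hypothetical national benchmarks: median KS2 points obtained by pupils
# with each KS1 average point score.
NATIONAL_MEDIAN_KS2 = {9: 21.0, 12: 24.0, 15: 27.0, 18: 30.0, 21: 33.0}

def pupil_value_added(ks1_points, ks2_points):
    """Gap between a pupil's KS2 result and the benchmark for pupils
    with the same KS1 prior attainment."""
    return ks2_points - NATIONAL_MEDIAN_KS2[ks1_points]

def school_value_added(pupils):
    """Mean pupil gap for one school's eligible, KS1-matched pupils,
    centred on 100 in the style of the published scores."""
    return 100 + mean(pupil_value_added(ks1, ks2) for ks1, ks2 in pupils)

# Three matched pupils, given as (KS1 points, KS2 points) pairs.
print(school_value_added([(15, 29.0), (15, 27.0), (12, 23.0)]))  # about 100.3
```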
Most independent schools, infant-only schools, pupil referral units and schools with
fewer than five pupils in the age group are excluded. Otherwise, for the 2005 figures, all
pupils in an eligible school were included if they were eligible for KS2, still on the school
roll in May 2005, and had a matched KS1 score. Each pupil's KS1 and KS2 outcome