No assessment system can do everything, and therefore it is futile to ask “Is this system
perfect?” The answer will always be no. What we can, and should, ask is “Are the
trade-offs between reliability, validity and manageability that we have settled upon the
right ones?” In particular, we should ask whether the system that we currently have is
the only way of satisfying the design requirements.
The current system is transparent in that there is an apparently objective relationship
between the scores that students get on tests, the levels they are awarded as a result, and
the scores of schools. The position of a school in a performance table is directly
determined by aggregating the marks achieved by its pupils in tests. The question, then, is what the trade-offs in the current system are, and whether there are other ways of achieving the same ends with fewer adverse consequences. While there is no conclusive
system-wide research to demonstrate that the adverse consequences of the current
system are serious and far-ranging, I would maintain that there is enough evidence to suggest that something is seriously wrong. In doing so, I am arguing, along with Toulmin (2001), that in the absence of reliable knowledge about a
particular issue, we sometimes have to rely on what appears to be reasonable. In the
case of national curriculum assessment, while it may not be possible to demonstrate the
adverse consequences ‘beyond a reasonable doubt’, I believe the case is established ‘on
the balance of probabilities’, particularly in terms of curricular distortion and the
consistency of levels attributed to individual pupils.
In outlining an alternative model of national curriculum assessment my concern has
been to attempt to work towards a system of assessment that delivers the same outputs
as the current system—measures of the achievement of individual students, together
with evaluative information on schools—but with fewer adverse consequences.
The light sampling approach that I have outlined certainly could be expected to reduce
the incentives for teachers to narrow the curriculum, and may increase the reliability
of the assessments made of individual pupils, although, as Newton notes, these are
empirical questions which could be settled by undertaking further research. The trade-off would be a lack of transparency, in that the levels awarded to pupils would derive from integrative judgements by their teachers rather than from the aggregation of marks.
Ultimately, the differences between Newton and myself seem to me to be mostly about
the burden of proof. He regards the arguments I have advanced regarding both the
deficiencies of the current system, and the strengths of my proposed alternative, as ‘not
proven’. In the final sections of his paper, he lays out the essential elements of a
research agenda, covering both the adequacy of the current system and that of the alternatives.
This is a very helpful contribution, and the challenge to research the researchable
questions needs to be taken up. But at the same time, I think it is fair to ask whether we
must wait until all the evidence is in before things change. There is always the
danger of making things worse, captured in the old adage that we cannot countenance
change—things are bad enough as they are! The challenge for the educational research
community is to provide policy-relevant findings when we cannot be certain about what
to do. I welcome Newton’s contribution, not least in forcing me to clarify and develop
my own thinking, and hope that others, too, will join in this debate.
References
Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2002). Working inside the black box: Assessment for learning in the classroom. London, UK: King's College London, Department of Education and Professional Studies.