whole-school level, which is also right, and it might well be much better, as has been
the case in the US for many years (although this is now changing rapidly), for the results to
be reported only at school level. However, even if the current tests were reported only at
group level, and used to define an envelope of levels that the school could award, this
would still create an incentive to narrow the curriculum by teaching only what appears
in the test.
Formative and summative functions of assessment
I agree with Newton that one of the strengths of the current system of assessment in
place in England and Wales is that the teacher is the student’s ally against the external
agencies charged with assessment. This makes for a purity of role for the teacher which
is attractive. The downside, however, is that the failure to use the detailed
knowledge that teachers have about their students impoverishes the quality of the
summative assessment (and, in particular, reduces both its reliability and its validity).
In other words, while teachers may not demand to be involved in summative
assessment, good summative assessment demands the involvement of teachers. This is
why I believe that we need to find ways of ameliorating the tensions between formative
and summative functions of assessment.
My problem with traditional tests is not that they necessitate narrow teaching and rote
learning—indeed, our own work with teachers has shown that teachers developing their
formative assessment practices produce improvements in learning even when this
learning is measured with traditional timed tests and examinations (Black, Harrison,
Lee, Marshall, & Wiliam, 2002). Rather, the problem is that such tests do not (or at
least do not appear to) require deep learning. In high-stakes settings, therefore, teachers
may believe that rote learning provides a short cut to improved scores. Whether or not
this is true is almost irrelevant: there is evidence that many teachers believe
that teaching well is incompatible with improving test scores. Furthermore, because the
existing tests systematically under-represent the constructs they purport to assess,
they create the possibility of raising scores by improving a student’s competence on
only part of the domain.
Incidentally, I have never argued (or believed) that the format of a test item determines
the kind of capability that can be assessed, although measuring ‘higher-order’ skills
would appear to be more difficult with multiple-choice items. For reasons that are not
entirely clear (and probably not rational), there is a deep mistrust of multiple-choice
items in the UK (see Wood, 1991), but in truth, we have never given them a fair trial.
We are happy to spend hundreds of millions of pounds paying markers to mark open-ended
items (the total annual marking bill in England across national curriculum tests, GCSE
and A-level is around a quarter of a billion pounds), but somehow believe that actually
creating the tests should be relatively cheap.
Nevertheless, there are some important differences between what makes a good test
item for a formative function and for a summative one. For example, asking students to
‘Simplify, if possible, 2a + 5b’ (Brown, Hart, & Kuchemann, 1984) would be regarded
as unfair in a summative test. Students’ expectation that one has to ‘do work’ to
get marks in a test might pressure some of them into attempting to simplify the
expression, when they would otherwise say, correctly, that 2a and 5b are unlike terms
and so cannot be combined. Such ‘trick’ questions are generally regarded as
inappropriate for high-stakes tests.
However, such items provide highly useful information for the teacher and so would be
entirely appropriate for a formative purpose. The crucial feature of the system that I
propose is that there is no routine aggregation from the teacher’s day-to-day records,