National curriculum assessment: how to make it better



kept primarily for formative purposes, to the summative level that would be reported to
students and their parents. The teacher would be free (indeed would be expected) to
discount evidence related to ‘trick’ questions like the one given above, when arriving at
a level.

The overall profile of levels for a class would be ‘moderated’ by the external tasks and
tests (see below), which would ensure that the levels awarded could not be inflated by
the teacher. There are many ways in which this could be done—the most severe would
be to use the results of the external tasks and tests to define an ‘envelope’ of levels that
the teacher was allowed to award, so that the distribution of the levels in the summative
levels given by the teacher would have to be exactly the same as that for the external
tasks and tests. In addition, in order to check that the teacher’s weighting of various
aspects of the domain was similar to that intended in the curriculum,
requirements for correlation could be imposed, so that, to some extent at least, those
getting high marks on the tasks and tests would be awarded high levels. However, this
would be a crude measure, and there is no doubt that additional ways would be needed
to detect and, where possible, eliminate the forms of bias noted by Newton (eg
over-emphasis on certain aspects of the domain, and inclusion of construct-irrelevant
variance, such as halo effects).
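
The 'envelope' form of moderation described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not a specification of the proposal: the student names, the scores, the function names moderate_levels and pearson, and the choice of a Pearson coefficient as the correlation check are all hypothetical. The teacher supplies a rank ordering of the class, and the levels available to award are exactly those produced by the external tasks and tests, so the two distributions are identical by construction.

```python
from collections import Counter


def moderate_levels(teacher_scores, external_levels):
    """Award levels in the teacher's rank order, drawing only on the
    levels that the external tasks and tests produced for the class.

    Both arguments are dicts keyed by student (hypothetical data layout).
    """
    if teacher_scores.keys() != external_levels.keys():
        raise ValueError("both assessments must cover the same students")
    # Rank students by the teacher's judgement, highest first;
    # ties are broken by name so the result is deterministic.
    ranked = sorted(teacher_scores, key=lambda s: (-teacher_scores[s], s))
    # The 'envelope': the external levels, highest first.
    envelope = sorted(external_levels.values(), reverse=True)
    return dict(zip(ranked, envelope))


def pearson(xs, ys):
    """Plain Pearson correlation, used here as one possible (assumed) check."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set has no spread")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical class of four students.
teacher_scores = {"Asha": 62, "Ben": 48, "Cara": 75, "Dev": 55}
external_levels = {"Asha": 4, "Ben": 5, "Cara": 5, "Dev": 3}

awarded = moderate_levels(teacher_scores, external_levels)

# The awarded levels have exactly the same distribution as the external ones.
same_distribution = Counter(awarded.values()) == Counter(external_levels.values())

# Correlation between external results and awarded levels, student by student.
names = sorted(teacher_scores)
r = pearson([external_levels[s] for s in names], [awarded[s] for s in names])
```

In this invented class the distributions match, but the correlation is slightly negative, which is the kind of pattern a correlation requirement would flag for further scrutiny. A low correlation on its own would not show which assessment was at fault.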

As I note in Wiliam (2000a), care must also be taken to avoid the teacher’s role in
summative assessment driving underground formative evidence (eg when students do
not divulge difficulties to the teacher because they believe it will be ‘held against
them’). Ultimately, this can only be resolved through trust, but it can be ameliorated
through the depersonalisation of the assessment procedure—while the assessment of the
student against the criteria is undertaken by the teacher, it is important that the student
understands that the criteria themselves are not determined by the teacher, but are
external. Although this arrangement is not perfect, the teacher could then still claim to be the student’s ally.

Newton also raises the question of whether teachers’ assessments would, as I have
claimed, be more reliable than those arising from tests. He is right to point out that
continuous assessment over the period of the key stage is not a replication of the final
assessment, and it would, indeed, be invidious if a student’s level were reduced by the
teacher because the last recorded evidence of a particular aspect of the domain dated
from the previous year. However, if we adopt the conceptual framework provided by
generalisability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972), then it is a
logical necessity that the degree of unreliability contributed by student-task interactions
will be lower than for a traditional test because there are more tasks, provided, of
course, that teachers can apply the correct standards and base their levels on the ‘latest
and best’ evidence. If we can control the other sources of unreliability (task-rater
interactions, student-rater interactions, etc; see below) then teachers’ assessments will
be more reliable than tests.
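
The argument from generalisability theory can be made explicit with the standard decomposition of relative error variance. The sketch below assumes a fully crossed design in which every student (p) attempts every task (t) and is assessed by every rater (r); the symbols are introduced here for illustration and do not appear in the original argument.

```latex
% Relative error variance for a crossed student x task x rater design,
% with n_t tasks and n_r raters:
\sigma^2_{\delta}
  = \frac{\sigma^2_{pt}}{n_t}
  + \frac{\sigma^2_{pr}}{n_r}
  + \frac{\sigma^2_{ptr,e}}{n_t\, n_r}

% Generalisability coefficient:
E\rho^2 = \frac{\sigma^2_{p}}{\sigma^2_{p} + \sigma^2_{\delta}}
```

The student-task term is divided by the number of tasks, so it shrinks as more tasks are sampled, which is the sense in which its contribution must be lower for assessment across the key stage than for a single test. The student-rater term is unaffected by the number of tasks, which is why the other sources of unreliability still have to be controlled.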

Evaluative assessments

The fundamental feature of the assessment system I propose is that the evaluative
function of the assessment is based on light sampling. The logic of this is
straightforward. In order to avoid the possibility of ‘teaching to the test’ we need to
assess a greater proportion of the domain of interest. More precisely we want to create a
situation in which we are happy for teachers to teach to the test, because the only way to
improve the test score is to improve the performance of students on the whole domain.

Newton seems to suggest that this could be achieved by adding ‘authentic’ elements to
the existing assessment, as happens in some science examinations which involved a


