involved in the evaluation was relatively small as was the number of lessons
observed. How pupils were paired, for example, is not quite clear, and as
several primary school classes were composite classes with children from two
to four different age groups (Low et al., 1996: 59), it is not always clear who
exactly did what.
The final report states that there was no common syllabus of vocabulary,
structures and functions between individual schools. This would seem to make
comparisons difficult, both between cohort groups and control groups and
between one cohort group and another. 'Achievement' becomes
difficult to measure without clear reference to a particular course of teaching
and learning, and a definition of what constitutes attainment, achievement,
success or failure becomes problematic.
Ellis states that 'there is no direct window through which the researcher can
peer to discover what the learner knows':
"When researchers seek to relate instructional treatments to learning
outcomes, they need instruments with which to measure what learning has
taken place. The problem facing the L2 acquisition researcher is really the
same as that facing the language tester - how to provide valid and reliable
measurements of what the learner knows." (Ellis, 1990: 57)
Skehan (1988: 16) suggested, for example, that 'the length of an utterance is an
increasingly unreliable indicator of syntactic complexity' and some of the
difficulties surrounding the evaluation of lexical development have already been