between categories of home students. Variation in these decisions over time, or
between analysts, would be perfectly proper, yet it makes comparisons between their
results difficult. As with the general population figures, there will be incompleteness
in HE records, and for some years of the data the ‘Individualised’ Student Records are
not actually linked to individuals but to courses, so that a part-time student taking two
courses in two different institutions does not have a unique identifier, and is in danger
of being counted twice (Gorard and Taylor 2001).
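The double-counting risk can be illustrated with a minimal sketch. The records, field names and the `person` key below are hypothetical and purely illustrative. In the affected years no such person-level identifier exists, which is precisely why the two counts cannot be reconciled in practice.

```python
# Hypothetical course-level records (illustrative values, not real data).
# The "person" field is an assumption added only to show the true headcount;
# the affected student records carry no such identifier.
records = [
    {"institution": "A", "course": "c1", "person": "p1"},
    {"institution": "B", "course": "c2", "person": "p1"},  # same part-time student
    {"institution": "A", "course": "c3", "person": "p2"},
]


def count_enrolments(rows):
    """Count one 'student' per course record, as course-linked data would."""
    return len(rows)


def count_individuals(rows):
    """Count distinct people, which requires a person-level identifier."""
    return len({row["person"] for row in rows})


enrolments = count_enrolments(records)    # one per course record
individuals = count_individuals(records)  # one per distinct person
overcount = enrolments - individuals      # students counted more than once
```

Without the identifier, only the first count is available, and the size of the overcount is unknown.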
Measuring the characteristics of those in HE
The final requirement, before being able to make the relatively simple arithmetic
calculation involved in producing the proportionate representation of social groups in
HE, is in some ways the easiest since it concerns only those in HE. However, it is
worth illustrating some of the difficulties in using the data even for this group to help
readers understand the severe limitations of any analysis of patterns of participation.
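As a rough sketch of the arithmetic referred to above, one common way of expressing proportionate representation (assumed here, not necessarily the exact calculation used in this analysis) is to divide a group's share of HE participants by its share of the relevant population. The function name and the figures are hypothetical.

```python
def representation_ratio(group_in_he, total_in_he,
                         group_in_population, total_population):
    """Ratio of a group's share of HE to its share of the population.

    A value of 1 indicates proportionate representation, below 1
    under-representation, and above 1 over-representation.
    """
    if min(total_in_he, total_population, group_in_population) <= 0:
        raise ValueError("totals and population group size must be positive")
    he_share = group_in_he / total_in_he
    population_share = group_in_population / total_population
    return he_share / population_share


# Illustrative figures only: a group forming 20% of HE students
# but 40% of the relevant population.
ratio = representation_ratio(200, 1000, 400, 1000)
```

Each of the four inputs is subject to the measurement problems discussed in this section, so any error in them carries directly into the ratio.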
There are no ideal datasets for the analysis of patterns of participation in higher
education (HE) in terms of policy changes, or social, economic, or regional
disparities. All existing datasets suffer from one or more defects: they include only
participants, have incomplete coverage, have substantial proportions of missing data
or cases, or are incompatible in range or aggregation with other datasets.
As with the population census, there are cases simply missing from official statistics
on participation in HE, and as with the population census we cannot be entirely sure
how many cases are missing. The UCAS data on applicants to HE have historically,
and seriously, under-represented part-time, mature and distance students. Returns from
each university of the number of students actually in place may give a better
indication of the overall figures but are generally deficient in terms of key background
variables such as ethnicity and occupational class.
A common problem for the relevant large-scale datasets lies in data missing even
from the cases that are known about. For example, many of the variables in the HESA
datasets are compulsory - i.e. some value has to be reported for each student. But this
does not mean that complete data are available for every student. The ‘missing’ data,