Once this decision has been made, it is important to ensure that the sample size is as large as
possible at the level at which randomization occurs. For example, if randomization
occurs at a group level (e.g. the school), it is important to have as large a sample of schools
as possible. Adding individuals within a school yields only a small gain in the power of the
evaluation. At the margin, the evaluator gains more information
from the addition of a cluster or group (in this case, a school) than from the addition
of a new individual to an already existing group. This is because individuals within a given
community or school could all be affected, negatively or positively, by a common shock, with the
consequence that their individual outcomes are correlated. Adding
new groups helps to account for the possibility of intra-group shocks that affect many
individuals in the same group at once.
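The trade-off between adding clusters and adding individuals can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1)ρ, where m is the number of individuals per cluster and ρ is the intra-cluster correlation. The sketch below is illustrative only: the formula is a textbook approximation that the text does not state, and the values used (ρ = 0.10, 50 schools of 20 pupils) and the function names are assumptions, not figures from any study cited here.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations that the clustered sample is worth."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


icc = 0.10  # assumed intra-cluster correlation (hypothetical value)

# Baseline design: 50 schools with 20 pupils each (1 000 pupils in total).
baseline = effective_sample_size(50, 20, icc)

# Two ways of adding 1 000 pupils to the baseline design.
more_pupils = effective_sample_size(50, 40, icc)    # same schools, twice the pupils
more_schools = effective_sample_size(100, 20, icc)  # twice the schools, same pupils

print(f"baseline:     {baseline:.0f}")
print(f"more pupils:  {more_pupils:.0f}")
print(f"more schools: {more_schools:.0f}")
```

Under these assumed values, doubling the number of pupils per school raises the effective sample size from roughly 345 to roughly 408, whereas doubling the number of schools raises it to roughly 690. With ρ fixed, the effective sample size can never exceed the number of clusters divided by ρ, however many individuals are sampled within each cluster.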
Randomization bias may also plague impact assessment estimates (Heckman and Smith,
1995). This arises if the kinds of individuals who would choose
to participate in a programme differ significantly from the individuals who are randomly assigned to
participate in it. Consequently, the intervention that is evaluated differs from
the intervention that is implemented in practice, making it difficult to interpret
the estimates (Ravallion, 2008).
Finally, randomized evaluations may confront ethical objections that the method of
randomization by its very nature will exclude some individuals who could potentially benefit from
the intervention, and will include some individuals in the treatment group who do not need the
intervention as much. These objections may be combined with political concerns over service
delivery to the electorate. While ethical objections should be addressed, the short-term loss of
being excluded from the benefits of an intervention may be small in relation to the long-term
benefits once a programme that has been properly evaluated is implemented and scaled up
(Ravallion, 2008). Moreover, randomization may be the fairest method of allocating scarce
resources, when it is simply not possible to deliver a programme to everyone. For example, the
PROGRESA programme in Mexico, launched in 1998, provided social grants to households conditional
on the enrollment and attendance of children at school, and their participation in preventative
health care programmes. Since budget constraints made it impossible to reach all of the 50 000
potential beneficiary communities, the Mexican government made a conscious choice to begin
with a pilot project of 506 communities, of which half were randomly selected to receive the
grants while the others were not (Gertler and Boyce, 2001). The project was later scaled up
considerably.
4 Propensity Score Matching
When randomization is not practically or politically feasible, or when the results from a
randomized intervention are not internally valid, more appropriate counterfactuals can be found
by matching treatment households to control households. The ideal approach is to match
treated households to control households directly on their characteristics (see, for example,
Angrist (1998)), but this approach is often impractical when some of the more important variables
we wish to condition on are continuous, or when the number of covariates we wish to match on
is large.
Propensity score matching is a useful alternative to exact matching. The idea here is to
match not on the multidimensional vector of covariates but rather on a scalar index, the propensity
score: the predicted probability of treatment, computed from a regression in which the outcome
variable is a binary indicator of treatment (see Rosenbaum and Rubin, 1983; Heckman and Robb, 1985;
Heckman, LaLonde and Smith, 1999).4
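The idea of replacing the covariate vector with a single predicted probability can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any of the cited papers: the text says only "a regression", so the logit model is an assumption, as are the plain gradient-ascent fit, the one-to-one nearest-neighbour matching with replacement, the synthetic data, and all function names.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_index(beta, x):
    """Intercept plus the inner product of the slopes with the covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_logit(X, d, lr=0.1, n_iter=1000):
    """Fit P(d = 1 | x) by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, di in zip(X, d):
            resid = di - sigmoid(linear_index(beta, x))
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for every unit."""
    return [sigmoid(linear_index(beta, x)) for x in X]


def match_nearest(scores, d):
    """Map each treated unit to the control unit with the closest score."""
    controls = [i for i, di in enumerate(d) if di == 0]
    return {
        i: min(controls, key=lambda c: abs(scores[c] - scores[i]))
        for i, di in enumerate(d)
        if di == 1
    }


# Synthetic data (hypothetical): two pre-treatment covariates per household,
# with treatment more likely when the first covariate is high.
random.seed(0)
X, d = [], []
for _ in range(200):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    p_treat = sigmoid(-0.5 + 1.0 * x1 - 0.5 * x2)
    X.append([x1, x2])
    d.append(1 if random.random() < p_treat else 0)

beta = fit_logit(X, d)
scores = propensity_scores(X, beta)
pairs = match_nearest(scores, d)  # treated index -> matched control index
```

Each treated household is compared with a single control household whose estimated probability of treatment is closest, so the matching problem is one-dimensional regardless of how many covariates enter the regression. In applied work one would normally use an established estimation routine and check covariate balance and common support after matching; neither step is shown here.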
4 Hirano and Imbens (2004) provide a generalization of this approach to the case where treatment is not
binary but continuous. This approach is potentially quite useful for many health interventions where one would
be interested in not only the effect of treatment but the dosage of treatment among the treated (e.g., ARV
treatment).
Formally, if we let x be a vector of pre-treatment variables, then we can define the propensity