score as the conditional probability of receiving the treatment T, given x:
p(x) = Pr[T = 1|x] = E[T|x]
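As an illustration of how p(x) might be estimated in practice, the sketch below fits a logit model of T on x by gradient ascent and returns fitted probabilities. The logit functional form, the simulated data, and the helper names (fit_logit, propensity) are assumptions made for this example only; they are not taken from the text.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, T, lr=0.5, n_iter=1000):
    """Fit Pr[T = 1 | x] = sigmoid(b0 + b'x) by batch gradient ascent
    on the mean log-likelihood. Returns [b0, b1, ..., bk]."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, T):
            index = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            resid = ti - sigmoid(index)
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    # Estimated propensity score for one covariate vector.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))

# Simulated data (hypothetical): one covariate that raises the chance of treatment.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(500)]
T = [1 if random.random() < sigmoid(0.5 + 1.0 * x[0]) else 0 for x in X]

beta_hat = fit_logit(X, T)
p_hat = [propensity(beta_hat, x) for x in X]
```

Because the model includes an intercept, the average fitted score equals the sample share of treated units at the optimum, which is a quick sanity check on the fit.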
For the purposes of the analysis to follow, two key results first introduced by Rosenbaum and
Rubin (1983) are noteworthy:
Lemma 1 (Balance): If p(x) is the propensity score, then x ⊥ T | p(x). Stated differently, the
distribution of the covariates for treatment and control is the same once we condition on the
propensity score: F(x | T = 1, p(x)) = F(x | T = 0, p(x)).
Lemma 2 (Ignorability): If there is no omitted variable bias once x is controlled for, then
assignment to treatment is unconfounded given the propensity score.
The first result says that once we condition on the propensity score, assignment to the
treatment group is as good as random with respect to the observed covariates. In other words,
for two identical propensity scores, there should be no statistically significant differences in
the associated x vector, regardless of whether the observations fall in the treatment group or
the control group. This property must be met if we are to move forward after computing the
propensity score.
The second result says that selection into treatment depends only on what we can observe,
i.e., x. In other words, while the propensity score balances the data (i.e., removes the influence
of the observables on assignment to the treatment group), it also assumes no confounding on
the basis of unobservables. Whether or not this assumption is plausible rests on whether the
specification of the propensity score regression accurately reflects the key factors that might
influence the process of treatment assignment.
A key challenge in getting the right specification for the propensity score is making sure
that the balancing property is satisfied. Practically speaking, the balancing property of the
propensity score implies that we need to make sure that the control group and the treatment
(beneficiary) group are not statistically different from each other once we have conditioned on
the propensity score. This requires that we check, for observations with similar scores, that
E(p(x)|T = 1) = E(p(x)|T = 0) as well as that x ⊥ T | p(x). One way to accomplish this test is
to aggregate the estimated propensity score p(x) into mutually exclusive intervals (blocks) over
its distribution and then check that the average propensity score within each block does not
differ between treated and control observations. Using this same procedure, we can also check
that each covariate is uncorrelated with treatment assignment within each block. Because scores
within a block are similar rather than identical, the balancing property can only be tested in
an approximate sense.
The algorithm of Dehejia and Wahba (1999, 2002), together with its Stata implementation by
Becker and Ichino (2002), is one very widely used procedure for testing that the estimated
propensity score balances the covariates across treatment status.5
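The within-block check described above can be sketched as follows. This is a simplified illustration, not the Dehejia and Wahba or Becker and Ichino procedure itself: it assumes equal-width blocks on [0, 1], a Welch t statistic compared against a normal critical value of 1.96, and no iterative splitting of unbalanced blocks. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic; None if a group has fewer than 2 observations."""
    if len(a) < 2 or len(b) < 2:
        return None
    se2 = variance(a) / len(a) + variance(b) / len(b)
    diff = mean(a) - mean(b)
    if se2 == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / se2 ** 0.5

def block_index(p, n_blocks):
    # Equal-width blocks on [0, 1]; a score of exactly 1.0 goes in the last block.
    return min(int(p * n_blocks), n_blocks - 1)

def balance_table(p_hat, T, X, n_blocks=5, crit=1.96):
    """Within each block, compare treated and control means of the
    propensity score (first statistic) and of each covariate."""
    columns = [p_hat] + [[row[k] for row in X] for k in range(len(X[0]))]
    results = []
    for b in range(n_blocks):
        idx = [i for i, p in enumerate(p_hat) if block_index(p, n_blocks) == b]
        treated = [i for i in idx if T[i] == 1]
        control = [i for i in idx if T[i] == 0]
        t_stats = [welch_t([c[i] for i in treated], [c[i] for i in control])
                   for c in columns]
        # A block counts as balanced only if every test could be run and passed.
        balanced = all(t is not None and abs(t) <= crit for t in t_stats)
        results.append({"block": b, "n_treated": len(treated),
                        "n_control": len(control),
                        "t_stats": t_stats, "balanced": balanced})
    return results

# Toy data (hypothetical): two blocks in which treated and control units
# have identical scores and covariate values.
p_hat = [0.10, 0.12, 0.10, 0.12, 0.90, 0.92, 0.90, 0.92]
T = [1, 1, 0, 0, 1, 1, 0, 0]
X = [[1.0], [2.0], [1.0], [2.0], [5.0], [6.0], [5.0], [6.0]]
table = balance_table(p_hat, T, X, n_blocks=2)
```

Blocks with too few treated or control observations are reported as not balanced here, since the test cannot be run; a fuller implementation would decide explicitly how to treat such blocks.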
4.1 Stratification
If Lemma 1 (the balance property) is satisfied, a natural way to compute the treatment effect
is to take the difference between the mean outcomes of the treated and control groups within
each stratum of the propensity score for which the covariates are balanced. Weighting these
differences by the distribution of the treated households across the strata then gives the
average treatment effect for the treated households. Formally, let i denote the ith treated
household, let j denote the jth control household, and let b denote the bth block (stratum).
Then a block-specific treatment effect is
$$
ATT_b = (N_{b,1})^{-1} \sum_{i \in I(b)} y_{1i} \; - \; (N_{b,0})^{-1} \sum_{j \in I(b)} y_{0j}
$$
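A minimal sketch of the stratification estimator follows: it computes the difference in mean outcomes within each block and then weights the block-specific effects by each block's share of treated households. The function name, the decision to drop blocks that lack either treated or control households, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def stratified_att(y, T, block):
    """Return (att, att_by_block).

    att_by_block[b] is the treated-minus-control difference in mean
    outcomes within block b. att weights these differences by the share
    of treated households in each block. Blocks without both treated and
    control households are dropped (an assumption of this sketch).
    """
    treated = defaultdict(list)
    control = defaultdict(list)
    for yi, ti, bi in zip(y, T, block):
        (treated if ti == 1 else control)[bi].append(yi)

    att_by_block = {}
    for b, y1 in treated.items():
        y0 = control.get(b)
        if y0:
            att_by_block[b] = mean(y1) - mean(y0)

    n_treated = sum(len(treated[b]) for b in att_by_block)
    att = sum(att_by_block[b] * len(treated[b]) / n_treated
              for b in att_by_block)
    return att, att_by_block

# Toy data (hypothetical): two blocks with 2 and 1 treated households.
y = [3.0, 5.0, 1.0, 1.0, 10.0, 4.0]
T = [1, 1, 0, 0, 1, 0]
block = [0, 0, 0, 0, 1, 1]
att, att_by_block = stratified_att(y, T, block)
```

In this toy example the first block holds two thirds of the treated households, so its effect receives twice the weight of the second block's effect.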
5 The approach works by arbitrarily grouping the data by blocks (intervals) of the propensity score, where
initially the scores within a block are quite similar. An equality of means test between treatment and control
observations is performed for each of the regressors contained in x. If there are no statistically significant
differences between treatment and control for each of the covariates in the propensity score regression, then the
regressors are balanced. If a particular regressor is unbalanced for a particular block, then that block is split
into further groups and the test is conducted again. This iterative process continues until all the regressors are
balanced or the test fails.