change in output measured by the total number of completed tables in each session. Table 1
provides descriptive statistics and Figure 1 displays the kernel density estimates of the distribution of
productivity changes across treatments. The first thing to notice is that average productivity appears
to increase between the first and the second session in all treatments. Although the task is
relatively simple, some learning is likely taking place: in the second session students
are more familiar with the environment and the requirements of the job. This underlines the
importance of having a baseline treatment to control for all factors affecting productivity changes
between the two sessions other than compensation. Moreover, subjects in the treatment groups
appear to raise productivity by more (15%) than those in the baseline (12%). We formally assess
whether this difference is statistically significant using a one-tailed Mann-Whitney (M-W) test and
a one-tailed Kolmogorov-Smirnov (K-S) test (see Table 2).12 The differences between control and
treatment A and between control and treatment B are statistically significant at the
p < 0.1 level in most cases, while the difference between treatments A and B is insignificant.
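As an illustration of the one-tailed M-W comparison described above, the following minimal Python sketch computes the U statistic and a one-tailed p-value for the alternative that productivity changes in a treatment group tend to exceed those in the baseline. The function name `mann_whitney_greater` and the data values are hypothetical, and the normal approximation (with tie and continuity corrections) is an assumption of this sketch, not a description of how the p-values in Table 2 were obtained.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_greater(treated, control):
    """One-tailed Mann-Whitney test; H1: treated values tend to exceed control.

    Returns (U, p) using the normal approximation with tie and
    continuity corrections (an assumption of this sketch).
    """
    n1, n2 = len(treated), len(control)
    # U counts pairs in which the treated value beats the control value;
    # tied pairs count one half.
    u = sum(
        1.0 if t > c else 0.5 if t == c else 0.0
        for t in treated
        for c in control
    )
    n = n1 + n2
    # Tie correction: sum of (k^3 - k) over groups of k tied observations.
    ties = sum(k**3 - k for k in Counter(list(treated) + list(control)).values())
    var = n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1)))
    if var == 0:  # all observations identical, no evidence against H0
        return u, 1.0
    z = (u - n1 * n2 / 2.0 - 0.5) / sqrt(var)  # 0.5 is the continuity correction
    return u, 1.0 - NormalDist().cdf(z)


# Hypothetical productivity changes (as fractions), NOT the experimental data.
baseline = [0.05, 0.08, 0.10, 0.12, 0.13, 0.15, 0.18, 0.11]
treatment_a = [0.09, 0.14, 0.16, 0.17, 0.19, 0.21, 0.22, 0.12]

u_stat, p_value = mann_whitney_greater(treatment_a, baseline)
print(f"U = {u_stat:.1f}, one-tailed p = {p_value:.3f}")
```

With samples as small as those of a single treatment cell, an exact p-value from a statistics library would normally be preferable to the normal approximation used here.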
The data in Table 1 also suggest that there are significant gender differences in the treatment
effect.13 This is also evident when one inspects the distribution of productivity changes for each
gender separately (Figures 2 and 3). In particular, women appear to be more responsive to the
treatment conditions: they raise productivity by 21% in treatment A and 19% in treatment B,
compared to 12% for women in the baseline condition. These differences between the treatments
and the baseline are significant at the p < 0.05 level, while the difference between treatments A
and B is insignificant. For men, by contrast, each comparison yields insignificant differences.
Notice that for the control group the average values for the level of productivity and its change
across sessions are identical between men and women. This suggests that the differential response
between genders is not due to different learning across genders, but rather due to the treatments.
We also tested whether the distributions differ significantly across gender. For the control group this is
not the case (M-W two-tailed p-value = 0.967, K-S two-tailed p-value = 1.000). This confirms that
there is no gender-specific element in the task itself. On the other hand, the distributions for the
treatment groups are significantly different across gender (M-W two-tailed p-value = 0.012, K-S
two-tailed p-value = 0.085), confirming a differential response to treatments.14
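The across-gender comparison on pooled treatments can be sketched as follows: observations from treatments A and B are concatenated within each gender, and a two-tailed two-sample K-S test compares the resulting distributions. The function name `ks_two_sample`, the variable names, and the data values are hypothetical, and the asymptotic p-value is an assumption of this sketch; it is only a rough approximation in small samples and is not necessarily the method behind the p-values reported above.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(x, y, terms=100):
    """Two-tailed two-sample K-S test with an asymptotic p-value.

    Returns (D, p), where D is the largest vertical gap between the
    two empirical CDFs.
    """
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    # Evaluate both empirical CDFs at every observed value.
    d = max(
        abs(bisect_right(xs, v) / n1 - bisect_right(ys, v) / n2)
        for v in xs + ys
    )
    lam = sqrt(n1 * n2 / (n1 + n2)) * d
    if lam < 0.3:  # the limiting p-value is indistinguishable from 1 here
        return d, 1.0
    # Kolmogorov limiting distribution: 2 * sum_j (-1)^(j-1) exp(-2 j^2 lam^2).
    series = sum(
        (-1) ** (j - 1) * exp(-2.0 * (j * lam) ** 2)
        for j in range(1, terms + 1)
    )
    return d, min(max(2.0 * series, 0.0), 1.0)


# Hypothetical productivity changes (as fractions), NOT the experimental data.
women_a = [0.18, 0.22, 0.25, 0.19, 0.24]
women_b = [0.17, 0.20, 0.21, 0.16, 0.23]
men_a = [0.08, 0.12, 0.15, 0.10, 0.13]
men_b = [0.09, 0.11, 0.14, 0.07, 0.16]

# Pool treatments A and B within each gender before comparing across gender.
women_pooled = women_a + women_b
men_pooled = men_a + men_b

d_stat, p_value = ks_two_sample(women_pooled, men_pooled)
print(f"D = {d_stat:.2f}, two-tailed p = {p_value:.4f}")
```

Pooling doubles the number of observations per gender, which is the source of the gain in power; it is defensible only because the two treatments were not found to differ from each other.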
12 The alternative hypothesis for the tests presented in Table 2 is that average productivity in the treatment condition
is greater than that in the baseline.
13 No differences emerged with regard to other dimensions for which there is enough variation in the data,
e.g. previous work experience, occupational expectations (for profit vs non-profit sector), course of study (natural
sciences and engineering vs social sciences and education), donation to charity in the last 12 months, and volunteering
activity in the last 12 months.
14 In light of the previous result that the two treatments are not significantly different, we pooled treatments A and
B to improve the power of the test.