Misreporting is prosecuted.
Given that the INPS data set includes the employer’s fiscal identifier, which is also
present in the CAD data set, linking the employer’s records to those of its employees is relatively
straightforward. As in other countries where social security data are available, the Italian
INPS data contain detailed information on worker compensation, but information on
demographics is scant. Specifically, for each worker the data set reports total earnings and the number of
weeks worked in each year.
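The record linkage described above amounts to a join on the employer’s fiscal identifier. The Python sketch below is illustrative only: the field names (`fiscal_id`, `worker_id`, `earnings`, `weeks`, `industry`), the decision to match worker-years to firm records of the same year, the derived weekly-earnings variable and the toy records are all assumptions, not the layout of the actual INPS or CAD files.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    fiscal_id: str   # employer fiscal identifier (hypothetical field name)
    year: int
    industry: str


@dataclass
class WorkerYear:
    worker_id: int
    fiscal_id: str   # fiscal identifier of the employing firm
    year: int
    earnings: float  # total earnings in the year
    weeks: int       # number of weeks worked in the year


def link_workers_to_firms(workers: list[WorkerYear], firms: list[FirmYear]) -> list[dict]:
    """Inner join of worker-years to firm-years on (fiscal_id, year).

    Worker-years whose employer does not appear in the firm data for that
    year are dropped.
    """
    firm_index = {(f.fiscal_id, f.year): f for f in firms}
    linked = []
    for w in workers:
        firm = firm_index.get((w.fiscal_id, w.year))
        if firm is None:
            continue  # employer not in the firm sample this year
        linked.append({
            "worker_id": w.worker_id,
            "fiscal_id": w.fiscal_id,
            "year": w.year,
            "industry": firm.industry,
            # Illustrative derived variable: average earnings per week worked.
            "weekly_earnings": w.earnings / w.weeks if w.weeks > 0 else None,
        })
    return linked


# Toy records, invented for illustration only.
firms = [FirmYear("A001", 1990, "machinery"), FirmYear("B002", 1990, "textile")]
workers = [
    WorkerYear(1, "A001", 1990, 26_000.0, 52),  # employer present in firm data
    WorkerYear(2, "C003", 1990, 20_000.0, 40),  # employer absent: dropped
]
linked = link_workers_to_firms(workers, firms)
print(linked)
```

Indexing the firm records in a dictionary keyed on (identifier, year) lets each worker record be matched in a single lookup, so the join is linear in the number of records.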
Table 1 reports various descriptive statistics for the firms (Panel A) and workers (Panel
B) present in our sample. Panel A shows the main characteristics for the sample of firms
in the CAD data set. From an initial sample of 177,654 firm/year observations, we obtain
a final sample of 116,809 after excluding firms with intermittent participation (40,225
observations) and those with missing values for the variables used in the empirical analysis
(20,620 observations).16
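The firm sample accounting can be checked directly. A minimal sketch, assuming the two exclusion groups are disjoint (the reported counts imply this, since they sum exactly to the gap between the initial and final samples); the helper name `apply_exclusions` is hypothetical, and the further losses due to dynamic estimation noted in the footnote are not included.

```python
def apply_exclusions(initial, steps):
    """Subtract each exclusion in turn and record the running total.

    `steps` is a list of (label, observations_dropped) pairs; the groups are
    assumed not to overlap, so their counts can simply be subtracted.
    """
    remaining = initial
    trail = []
    for label, dropped in steps:
        if dropped < 0 or dropped > remaining:
            raise ValueError(f"invalid exclusion count for {label!r}: {dropped}")
        remaining -= dropped
        trail.append((label, dropped, remaining))
    return remaining, trail


# Firm/year observations in the CAD sample (counts taken from the text).
final, trail = apply_exclusions(
    177_654,
    [
        ("intermittent participation", 40_225),
        ("missing values in analysis variables", 20_620),
    ],
)

for label, dropped, remaining in trail:
    print(f"{label:<40} -{dropped:>7,}  -> {remaining:>8,}")

share_retained = final / 177_654
print(f"final sample: {final:,} ({share_retained:.1%} of initial)")
```

The final line prints a retained share of 65.8 percent, so roughly one third of the initial firm/year observations are dropped before estimation.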
The sample ranges from very small firms to firms with almost 180,000 employees, with
an average of 204 and a median of 60. As expected, most of the firms are in the North
(75 percent). As for the distribution by industry, firms in the chemical, metal production
and machinery sectors account for more than 40 percent of the final sample. Firms in more
traditional industries (textile, food, paper) account for almost 25 percent. Construction
and retail trade account for another 25 percent. The remaining 10 percent is spread across the
service sectors, which are under-represented in the CAD data set because of their high share
of self-employment and small firms.
Panel B reports sample characteristics for the workers in the 1974–1994 INPS sample.
We start with an initial sample of 383,985 worker/year observations and end up with
186,715. Sample selection was designed explicitly to retain workers with stable
employment and tenure patterns. First, we excluded those younger than 18 or older than 65
(5,564 observations), which spares us from modelling human capital accumulation
and retirement decisions. To avoid dealing with wage changes that are due to job
termination (resignations or layoffs) or to unstable employment patterns, we excluded workers with
16 Additional observations are lost (for both firms and workers) in the empirical analysis given the dynamic
nature of most of our estimators.