5 The data
We rely on two administrative data sets, one for firms and one for workers. Data for
firms are obtained from Centrale dei Bilanci (Company Accounts Data Service, or CAD
for brevity), while those for workers are supplied by Istituto Nazionale della Previdenza
Sociale (National Institute for Social Security, or INPS for brevity). Since for each worker
we can identify the firm, we combine the two data sets and use them in a matched employer-
employee framework.14 There is a burgeoning empirical literature on the use of matched
employer-employee data sets (see Hamermesh, 2000, for an account).
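As a minimal sketch of what matching workers to firms involves, the following Python fragment attaches firm-level fields to each worker record that shares a firm identifier and year. All field names (firm_id, worker_id, year, employees, earnings) and all values are hypothetical and chosen for illustration only; they are not the actual CAD or INPS layouts, and this is not necessarily the matching procedure applied to the data.

```python
# Hypothetical records; names and values are illustrative, not CAD or INPS fields.
firms = [
    {"firm_id": "F1", "year": 1990, "employees": 120},
    {"firm_id": "F2", "year": 1990, "employees": 45},
]
workers = [
    {"worker_id": "W1", "firm_id": "F1", "year": 1990, "earnings": 30.0},
    {"worker_id": "W2", "firm_id": "F1", "year": 1990, "earnings": 42.5},
    {"worker_id": "W3", "firm_id": "F3", "year": 1990, "earnings": 28.0},
]


def match_workers_to_firms(workers, firms):
    """Attach firm-level fields to each worker record sharing (firm_id, year)."""
    # Index firm records by (firm_id, year) for constant-time lookup.
    firm_index = {(f["firm_id"], f["year"]): f for f in firms}
    matched = []
    for w in workers:
        firm = firm_index.get((w["firm_id"], w["year"]))
        if firm is None:
            continue  # the worker's employer is not in the firm sample
        # Worker fields are merged on top of the firm fields.
        matched.append({**firm, **w})
    return matched


matched = match_workers_to_firms(workers, firms)
```

In this toy example the worker employed by F3 drops out because F3 has no firm record, so the matched sample contains only workers whose employer appears in the firm data.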
The CAD data span from 1982 to 1994, i.e. a period that comprises two complete business
cycles, with detailed information on a large number of balance sheet items together
with a full description of firm characteristics (location, year of foundation, sector of
operation, ownership structure), plus other variables of economic interest usually not included
in balance sheets, such as employment and flow of funds. Balance sheets are collected for
approximately 30,000 firms per year by Centrale dei Bilanci, an organization established in
the early 1980s jointly by the Bank of Italy, the Italian Banking Association, and a pool of
leading banks to gather and share information on borrowers. Since the banks rely heavily
on these data in granting and pricing loans to firms, the data are subject to extensive
quality controls by a pool of professionals, so measurement error should be negligible.
INPS provides us with data for the entire population of workers registered with the
social security system whose birthday falls on one of two randomly chosen days of the year.
Data are available on a continuous basis from 1974 to 1994. The INPS data lack information on
self-employment and on public employment, which is also excluded from the CAD. As we
describe in Appendix A, the INPS data set derives from forms filled out by the employer
that are roughly comparable to those collected by the Internal Revenue Service in the US.15
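The birthday-based sampling rule can be sketched as follows. The two sampling days used here are hypothetical, since the actual days are not reported in the text, and the worker records are invented for illustration. Under the additional assumption that births are spread evenly over a 365-day year, the rule selects about 2/365, or roughly 0.55 percent, of registered workers.

```python
from datetime import date

# Hypothetical sampling days as (month, day); the actual days are not reported.
SAMPLING_DAYS = {(3, 10), (10, 21)}


def in_sample(birth_date):
    """Return True if the birthday falls on one of the two sampling days."""
    return (birth_date.month, birth_date.day) in SAMPLING_DAYS


# Invented worker birth dates, for illustration only.
births = {
    "W1": date(1955, 3, 10),
    "W2": date(1960, 7, 4),
    "W3": date(1948, 10, 21),
}
sampled = sorted(w for w, b in births.items() if in_sample(b))

# Expected sampling fraction, assuming births are uniform over a 365-day year.
expected_fraction = len(SAMPLING_DAYS) / 365
```

Because a worker's birthday does not change over time, a rule of this kind selects the same individuals in every year, which is what allows the sample to be followed as a panel.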
14The INPS data set has been used by Casavola, Cipollone and Sestito (1999) to describe the determinants
of pay in the Italian labor markets and by Galizzi and Lang (1998) to test whether quitting patterns depend
on outside employment opportunities. The CAD data set has been used by Guiso and Schivardi (1999) to
explore the impact of information spillovers on firms’ behavior. To our knowledge, the two data sets have
not been used jointly.
15While the US administrative data are usually provided on a grouped basis, INPS has truly individual
records. Moreover, in the US earnings records are censored at the top of the tax bracket, while the Italian
data set is not subject to top-coding.