A Appendix: The data
A.1 The INPS data set
The Italian National Institute for Social Security (Istituto Nazionale della Previdenza
Sociale, INPS) requires firms to file a yearly report (form O1M) for each worker on the payroll. The
data are used to estimate the amount of withholding tax the employer has to pay on behalf
of the employees, and the contributions due to INPS towards health insurance and pension funds.
This database covers the universe of employees in the private sector (thus excluding the
self-employed, public employees, and off-the-books work). Our data set is a sub-sample of the
universe, based on workers born on two particular days of the year; our data refer to
1974-1994. The form reports information on annual earnings and on the number of weeks worked.
Earnings are divided into two components: normal and occasional. Occasional earnings
include sums drawn from the wage supplementation fund for laid-off or short-time workers,
seniority and loyalty premia, one-time bonuses, moving expenses and business travel refunds,
the monetary value of goods in kind, and allowances for lost tips and commissions. On
average, occasional earnings are less than 10 percent of the total. Our measure of gross
income is the sum of the two components.
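As an illustration of this definition, the following is a minimal Python sketch of how gross income and the occasional share could be computed from a single worker-year record. The record layout, the field names, and the numerical values are hypothetical and are not taken from the INPS form.

```python
from dataclasses import dataclass


@dataclass
class EarningsRecord:
    """One worker-year record; field names are hypothetical, not INPS names."""
    worker_id: str
    year: int
    normal_earnings: float
    occasional_earnings: float
    weeks_worked: int


def gross_income(rec: EarningsRecord) -> float:
    """Gross income as defined in the text: normal plus occasional earnings."""
    return rec.normal_earnings + rec.occasional_earnings


def occasional_share(rec: EarningsRecord) -> float:
    """Share of occasional earnings in gross income (0.0 if income is zero)."""
    total = gross_income(rec)
    return rec.occasional_earnings / total if total > 0 else 0.0


# Made-up values, for illustration only.
example = EarningsRecord("W001", 1990, 18000.0, 1500.0, 52)
print(gross_income(example))
print(round(occasional_share(example), 4))
```

In this made-up record the occasional component is below 10 percent of the total, in line with the average share reported above.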
The data set also has information on job categories, albeit with a rough breakdown of
workers: apprentices, production workers, clerical workers, and managers. Unfortunately, information
on education is missing. From the worker’s social security number it is possible to retrieve
the gender, the year of birth (and therefore age), and place of birth. Finally, the data set
also contains the employer tax code, which allows us to match information on the worker
with that for the firm.
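The matching step can be sketched as a simple key lookup. The Python example below is only illustrative: the field names, the sample rows, and the choice to match on both the employer tax code and the year are assumptions, since the text states only that the employer tax code makes the match possible.

```python
def match_workers_to_firms(worker_rows, firm_rows):
    """Attach firm information to each worker-year row.

    Assumption: rows are matched on (employer_tax_code, year). Worker rows
    whose employer is absent from the firm data are dropped.
    """
    firms_by_key = {(f["employer_tax_code"], f["year"]): f for f in firm_rows}
    matched = []
    for w in worker_rows:
        firm = firms_by_key.get((w["employer_tax_code"], w["year"]))
        if firm is not None:
            # Worker fields take precedence on the shared key fields.
            matched.append({**firm, **w})
    return matched


# Hypothetical rows, for illustration only.
workers = [
    {"worker_id": "W001", "year": 1990, "employer_tax_code": "F10"},
    {"worker_id": "W002", "year": 1990, "employer_tax_code": "F20"},
    {"worker_id": "W003", "year": 1990, "employer_tax_code": "F99"},
]
firms = [
    {"employer_tax_code": "F10", "year": 1990, "employees": 120},
    {"employer_tax_code": "F20", "year": 1990, "employees": 45},
]

matched = match_workers_to_firms(workers, firms)
print(len(matched))
```

Worker W003 has no counterpart in the hypothetical firm rows and is therefore excluded from the matched sample.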
A.2 The CAD data set
Firm data are drawn from the archives of the Italian Company Accounts Data Service,
which collects balance sheet information and other items on over 30,000 Italian firms. The
data, available from 1982 to 1996, are gathered by Centrale dei Bilanci, an organization
established in the early 1980s jointly by the Bank of Italy, the Italian Banking
Association (ABI), and a pool of leading banks to build up and share information on
borrowers. Besides reporting balance sheet items, the database contains detailed information
on firm demographics (year of foundation, location, type of organization, ownership status,
structure of control, group membership, etc.), on employment, and on flow of funds. Balance
sheets are reclassified to reduce dependence on accounting conventions. Balance sheets
for the banks’ major clients (defined according to the level of borrowing) are collected by
the banks. The focus on the level of borrowing skews the sample towards larger firms.
Furthermore, because most of the leading banks are in the northern part of the country,
the sample has more firms headquartered in the North than in the South. Finally, since
banks mainly deal with firms that are creditworthy, firms in default are not in the data set,
so that the sample is also tilted towards borrowers of better-than-average quality. Despite
these biases, comparison between sample and population moments (not reported) suggests
that the CAD is not too far from being representative of the whole population (with the
exception of the over-representation of firms with more than 1,000 employees).