Stata Technical Bulletin
The purpose of the PUMS file is to provide researchers with direct access to household-by-household and person-by-person
data. Individual household and person data are not available in Summary Tape Files, the other major Census product. Individual
Census responses are confidential; thus, these data have been statistically modified to protect the confidentiality of individuals.
They are designed, however, to provide unbiased estimates and to maintain the covariance structure among variables to the extent
possible.
The data are divided into two file types: household records and person records. Data in the person records include
demographic, socio-economic, family, education, and employment characteristics. Household records include such things as
mortgage or rent payment, size and type of dwelling, number and age of all residents in the household, location of the household,
and relationships among household members. The combined file for each state contains over 500 variables. For the state of
Oregon—to choose an example with which we are familiar—more than 140,000 people are represented.
The ability to read PUMS data into Stata opens up a wide range of potential uses; the data contain information
useful in almost any field, from advertising to demography. By selecting only the variables of interest, the researcher
can reduce memory use and discard superfluous information. Even so, because the file is large, we recommend
using Intercooled Stata whenever possible.
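To make the memory point concrete, here is a minimal sketch that totals the storage one observation needs for a chosen set of variables, using Stata's fixed storage widths (byte 1, int 2, long 4, float 4, double 8 bytes; strN takes N bytes). It counts data storage only and ignores any overhead Stata adds. The variable list is the household portion of the dictionary excerpt reproduced in this article, and the helper names `type_size` and `bytes_per_obs` are hypothetical.

```python
# Bytes per value for Stata numeric storage types; strN uses N bytes.
STORAGE_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def type_size(stata_type: str) -> int:
    """Return the bytes one value of a Stata storage type occupies."""
    if stata_type.startswith("str"):
        return int(stata_type[3:])
    return STORAGE_BYTES[stata_type]


def bytes_per_obs(fields: dict[str, str]) -> int:
    """Sum the storage widths of the selected variables (data only)."""
    return sum(type_size(t) for t in fields.values())


# Household variables from the dictionary excerpt: name -> storage type.
HOUSEHOLD_FIELDS = {
    "rectype": "str1", "SerialNo": "long", "Sample": "byte",
    "division": "byte", "state": "byte", "puma": "long",
    "areatype": "byte", "msapmsa": "int", "psa": "int",
    "subsmpl": "int", "houswgt": "int", "persons": "byte",
    "gqtype": "byte", "units1": "byte", "husflag": "byte",
    "pdsflag": "byte", "rooms": "byte", "tenure": "byte",
    "acreage": "byte", "commuse": "byte",
}

if __name__ == "__main__":
    full = bytes_per_obs(HOUSEHOLD_FIELDS)
    # A hypothetical analysis that needs only four of these variables.
    keep = ("SerialNo", "persons", "rooms", "tenure")
    subset = bytes_per_obs({k: HOUSEHOLD_FIELDS[k] for k in keep})
    print(full, subset)  # 30 7
```

Multiplying the per-observation width by the number of records gives a rough lower bound on the memory a dataset needs, which is why dropping unneeded variables in the dictionary pays off on files of this size.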
Included on the distribution diskette are two versions of a Stata dictionary we created to read 1990 PUMS data into Stata.
One of the dictionaries, pumshl.dct, is used to read household data into Stata. The other dictionary, pumspl.dct, is used
to read person data. In pumshl.dct, person-level variables are commented out in the dictionary header. The reverse is true in
pumspl.dct. On our system, the PUMS data file is stored in the subdirectory d:\pums. You will need to modify the top line
of each dictionary to provide the file location of your PUMS data file. PUMS data files have names of the form pumsaxxx.txt,
where the final xx is the two-letter state abbreviation. For example, the file for Oregon is pumsaxor.txt.
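Editing the top line by hand is straightforward, but when several states are processed it can be scripted. The following is a minimal sketch, not part of the distributed files: it assumes the top line contains only `dictionary using <path>`, as in the excerpt reproduced in this article, and that file names follow the pumsax plus state abbreviation pattern of the Oregon example. The functions `pums_filename` and `set_dictionary_source` are hypothetical.

```python
from pathlib import PureWindowsPath


def pums_filename(state: str) -> str:
    """Build a data file name such as pumsaxor.txt from a state abbreviation."""
    if len(state) != 2 or not state.isalpha():
        raise ValueError("expected a two-letter state abbreviation")
    return f"pumsax{state.lower()}.txt"


def set_dictionary_source(dct_text: str, data_dir: str, state: str) -> str:
    """Point the 'dictionary using ...' top line at the PUMS file for a state.

    Assumes the first line holds only 'dictionary using <path>'.
    """
    lines = dct_text.splitlines(keepends=True)
    if not lines or not lines[0].lstrip().startswith("dictionary using"):
        raise ValueError("first line is not a 'dictionary using' line")
    path = PureWindowsPath(data_dir) / pums_filename(state)
    # Keep whatever line ending the original top line had.
    ending = lines[0][len(lines[0].rstrip("\r\n")):]
    lines[0] = f"dictionary using {path}{ending}"
    return "".join(lines)


if __name__ == "__main__":
    original = "dictionary using d:\\pums\\pumsaxor.txt\n*\n* HOUSEHOLD RECORDS\n"
    print(set_dictionary_source(original, "e:\\census", "OR"), end="")
```

Only the first line changes; the variable definitions, including which ones are commented out, are left exactly as they were.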
Because the file contains two record types, and because the records are arranged hierarchically so that each person record
is subordinate to its household record, the household data must be read separately from the person data. The file
structure is
Record Type    Serial Number       Data
household      HH serial number    Household characteristics
person         HH serial number    Person 1’s characteristics
person         HH serial number    Person 2’s characteristics
and so on for each household.
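The hierarchy can be illustrated outside Stata with a short sketch that attaches each person record to the household record preceding it. It assumes, following the dictionary excerpt, that column 1 holds the record type and columns 2 through 8 the household serial number. The record-type codes "H" and "P" are assumptions passed as parameters (check them against the Census documentation), and `Household` and `group_households` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Household:
    serialno: str                                     # columns 2-8
    record: str                                       # raw household line
    persons: list[str] = field(default_factory=list)  # raw person lines


def group_households(lines, household_code="H", person_code="P"):
    """Attach each person record to the household record that precedes it."""
    households = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        rectype, serialno = line[0], line[1:8]  # column 1; columns 2-8
        if rectype == household_code:
            households.append(Household(serialno, line))
        elif rectype == person_code:
            # A person record must follow its own household's record.
            if not households or households[-1].serialno != serialno:
                raise ValueError(f"person record {serialno} has no household")
            households[-1].persons.append(line)
        else:
            raise ValueError(f"unknown record type {rectype!r}")
    return households
```

Because person records depend on position in the file, a single flat layout cannot describe both record types at once; reading each type with its own set of column definitions and linking the two through the serial number avoids that problem.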
We reproduce a portion of the dictionary here:
dictionary using d:\pums\pumsaxor.txt
*
* HOUSEHOLD RECORDS
* _column(1)   str1 rectype   %1s
* _column(2)   long SerialNo  %7f
* _column(9)   byte Sample    %1f
* _column(10)  byte division  %1f
* _column(11)  byte state     %2f
* _column(13)  long puma      %5f
* _column(18)  byte areatype  %2f
* _column(20)  int  msapmsa   %4f
* _column(24)  int  psa       %3f
* _column(27)  int  subsmpl   %2f
* _column(29)  int  houswgt   %4f
* _column(33)  byte persons   %2f
* _column(35)  byte gqtype    %1f
* _column(39)  byte units1    %2f
* _column(41)  byte husflag   %1f
* _column(42)  byte pdsflag   %1f
* _column(43)  byte rooms     %1f
* _column(44)  byte tenure    %1f
* _column(45)  byte acreage   %1f
* _column(46)  byte commuse   %1f
  (household variables continue)
* _column(203) byte amoblhme  %1f
* PERSON RECORDS
* _column(9)   byte relat1    %2f
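Each dictionary line locates a variable by its starting column and reads as many columns as the width in its format (%7f reads seven columns, %1s one). The sketch below mimics that idea in Python for the first six household variables of the excerpt; it is an illustration of fixed-column reading, not how Stata implements `infile`. The `FIELDS` table and `read_fixed` are hypothetical, and blank numeric fields are returned as `None` to stand in for Stata's missing value.

```python
# Each entry mirrors one dictionary line: _column(start) type name %wf.
# start is 1-based, as in the dictionary; width comes from the %w format.
FIELDS = [
    (1, 1, "rectype", str),
    (2, 7, "SerialNo", int),
    (9, 1, "Sample", int),
    (10, 1, "division", int),
    (11, 2, "state", int),
    (13, 5, "puma", int),
]


def read_fixed(line, fields=FIELDS):
    """Extract variables from one fixed-width record by column position."""
    record = {}
    for start, width, name, convert in fields:
        raw = line[start - 1 : start - 1 + width]  # 1-based -> 0-based slice
        if convert is str:
            record[name] = raw
        else:
            # Blank numeric fields become None (a stand-in for missing).
            record[name] = convert(raw) if raw.strip() else None
    return record


if __name__ == "__main__":
    line = "H" + "0000123" + "5" + "9" + "41" + "00100"
    print(read_fixed(line))
```

Since every variable is located by its absolute column, columns that are not needed can simply be skipped; the excerpt itself jumps from column 35 to column 39.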