Stata Technical Bulletin
STB-22
* _column(11) byte sex %1f
* _column(12) int race %3f
* _column(15) byte age %2f
* _column(17) byte marital %1f
* _column(18) int pwgt1 %4f
* _column(26) int remplpar %3f
* _column(29) byte rpob %2f
* _column(31) byte rspouse %1f
* _column(32) byte rownchld %1f
* _column(33) byte ragechld %1f
* _column(34) byte rrlchld2 %1f
* _column(35) byte relat2 %1f
* _column(36) byte subfam2 %1f
* _column(37) byte subfam1 %1f
* _column(38) int hispanic %3f
Person variables continue
* _column(230) byte aincome7 %1f
* _column(231) byte aincome8 %1f
end
As we noted above, to read in the variables for households, first comment out all of the person variables (the second group)
and any unwanted household variables. When reading in the data, use ‘if rectype=="H"’ to read only household variables and
‘if rectype=="P"’ to read only person variables. To allow all of the variables to be read in, use the set maxvar command
to expand the memory space allocated to variables. The following commands sketch the steps for reading the household data.
. clear
. * Reduce maxvar to make space for more observations.
. set maxvar 80
. * Bring the household data into Stata using the infile command.
. infile using pums.dct if rectype=="H"
. * Sort the file by the variable SerialNo.
. sort SerialNo
. * After this has been completed, save the household data.
. save hh.dta
Repeat this procedure for the person data, commenting out all of the household variables and unwanted person variables. Again,
you must also read in rectype in order to distinguish between the record types.
. * Bring the person data into Stata using the infile command.
. infile using pums.dct if rectype=="P"
. * Sort the file by the variable SerialNo.
. sort SerialNo
. * Save the person data.
. save p.dta
To merge the two files, you must read in the variable SerialNo. This is a unique identifier that links all of the person
records with their associated household record. If you do not wish to merge the person and household records, you do not need
to read SerialNo.
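Because the merge relies on SerialNo identifying each household, it may be worth confirming this before linking the files. The following is a minimal sketch, assuming hh.dta was saved as described above; the check itself is a suggestion, not a step taken from the PUMS documentation.
. use hh.dta, clear
. sort SerialNo
. * Stop with an error if any SerialNo appears on more than one household record.
. by SerialNo: assert _N==1
If the assertion fails, the household file contains duplicate identifiers and should be examined before merging.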
Finally, to link the two files, use the merge command.
. use hh.dta
. merge SerialNo using p.dta
This sequence of commands will create one complete file which includes all of the variables you have specified. Tabulate
_merge to ensure that all records have been properly merged and the data are ready to be used. A _merge value of 1 or 2
does not necessarily mean an incorrect merge; for example, in the Oregon file, approximately 6,000 household records have
no corresponding person records (and contain virtually all missing values). These records probably correspond to individuals or
households who refused to respond to the census taker. In these cases, the census taker is instructed to obtain as much information
about the household as possible from neighbors or by observation.
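The check described above might be sketched as follows, run immediately after the merge. With hh.dta in memory as the master file, _merge is 1 for household records with no person records, 2 for person records with no household record, and 3 for matched records. Whether to keep or drop the unmatched records is left to the analyst; the commands below only inspect them.
. tabulate _merge
. * Household records with no corresponding person records.
. count if _merge==1
. * Person records with no household record; ideally there are none.
. list SerialNo if _merge==2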
In order to make your analyses applicable at the state level, you will need to use the weighting variable provided in the
PUMS file. For analyses based on persons, use the weighting variable pwgt. For analyses based on households, use the weighting
variable houswgt.
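As an illustration only, the weights could be applied as frequency weights. This sketch assumes that the weights are integer-valued, that the person weight was read into p.dta under the name pwgt1 (as it appears in the dictionary excerpt), and that houswgt and rectype were read into hh.dta.
. use p.dta, clear
. * Weighted count of persons by sex.
. tabulate sex [fweight=pwgt1]
. use hh.dta, clear
. * Weighted count of households.
. tabulate rectype [fweight=houswgt]
If the weights turn out not to be integers, fweight will be rejected and another weight type would have to be used instead.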