Stata Technical Bulletin
Figure 1. Schematic illustration of a two-generation family (rows labeled Generation 1 and Generation 2).
Figure 2. Schematic illustration of a three-generation family.
Syntax
simuped2 #Age1 #Std1 #Age2 #Std2 [, reps(#) saving(filename) alle(#) sib(#) ]
simuped3 #Age1 #Std1 #Age2 #Std2 #Age3 #Std3 [, reps(#) saving(filename) alle(#) sib(#) si3(#) ]
Description
simuped2 and simuped3 are immediate commands used for generating two- and three-generation family data, respectively.
For each person in a family, sex is assigned as male or female with probability 0.5 each. The age of a person is drawn from a normal
distribution with mean #Age1, #Age2, or #Age3 for the first, second, or third generation, respectively. The corresponding standard
deviations of the ages are given by #Std1, #Std2, and #Std3.
The number of siblings in a generation is a random number drawn from a Poisson distribution. The mean sibship sizes
in the second and third generations are given by sib(#) and si3(#), respectively.
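The sampling steps described above can be sketched in a few lines of Python. This is only an illustration of the stated distributions, not the code of simuped2 or simuped3; the function names, the seed, and the example values (mean age 30, standard deviation 8, mean sibship size 3) are assumptions made for the sketch. The description does not say how the programs treat a Poisson draw of zero, so the sketch simply returns an empty sibship in that case.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_person(rng, mean_age, sd_age):
    """Sex is a fair coin flip; age is normal with the generation's mean and SD."""
    sex = "M" if rng.random() < 0.5 else "F"
    age = rng.gauss(mean_age, sd_age)
    return {"sex": sex, "age": age}


def draw_sibship(rng, mean_size, mean_age, sd_age):
    """Sibship size is Poisson(mean_size); each sibling is drawn independently."""
    size = poisson(rng, mean_size)
    return [draw_person(rng, mean_age, sd_age) for _ in range(size)]


rng = random.Random(2024)          # fixed seed so the sketch is reproducible
children = draw_sibship(rng, mean_size=3, mean_age=30, sd_age=8)
for child in children:
    print(child["sex"], round(child["age"], 1))
```

Note that a normal distribution can occasionally produce implausible ages when the standard deviation is large relative to the mean; the sketch does not truncate them.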
Hardy-Weinberg equilibrium is assumed for the genotypic distribution of people in the first generation (see, for example,
Elandt-Johnson 1971). The frequency of allele A at a biallelic locus, denoted p, is given by the option alle(#). The frequencies
of genotypes AA, Aa, and aa in the first generation are p^2, 2p(1 - p), and (1 - p)^2, respectively. The genotype of
a person in the second or third generation is generated according to Mendelian inheritance; that is, each parent transmits one
of his or her two alleles to the child, each with probability 0.5.
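As a rough illustration of the genotype model, the following Python sketch draws founder genotypes under Hardy-Weinberg equilibrium and child genotypes by Mendelian transmission. It is not the programs' own code; the function names and the tuple representation of a genotype are assumptions of the sketch. Drawing the two founder alleles independently, each being A with probability p, yields AA, Aa, and aa with frequencies p^2, 2p(1 - p), and (1 - p)^2.

```python
import random


def founder_genotype(rng, p):
    """Two independent allele draws, each 'A' with probability p (Hardy-Weinberg)."""
    return tuple("A" if rng.random() < p else "a" for _ in range(2))


def child_genotype(rng, father, mother):
    """Each parent transmits one of his or her two alleles, chosen with probability 0.5."""
    return (rng.choice(father), rng.choice(mother))


def count_A(genotype):
    """Number of A alleles carried: 0 (aa), 1 (Aa), or 2 (AA)."""
    return sum(allele == "A" for allele in genotype)


rng = random.Random(7)             # fixed seed so the sketch is reproducible
father = founder_genotype(rng, 0.1)
mother = founder_genotype(rng, 0.1)
child = child_genotype(rng, father, mother)
print(father, mother, child, count_A(child))
```

With the default allele frequency of 0.1, about 1 percent of founders are expected to be AA and about 18 percent Aa.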
The simulated family data are saved in a file specified by saving(filename), and the number of replications is specified by
reps(#).
Options
reps(#) specifies the number of simulated families. The default value is 100.
saving(filename) specifies the file into which the simulated data are saved. The default file name is temp.dta.
alle(#) specifies the allele frequency of a biallelic locus A. The default value is 0.1.
sib(#) specifies the mean number of siblings in the second generation. The default value is 3.
si3(#) specifies the mean number of siblings in the third generation. It is only used in simuped3. The default value is 3.
Remarks
simuped2 and simuped3 simulate the basic quantities of a person, such as age, sex, and genotype, which are useful in
genetic epidemiology research. Further quantities of interest, such as the disease status of a person, can be simulated based
on these basic family data. However, doing so sometimes requires specific models for the effect of a disease and for the
natural mortality of a person. We do not include the disease status in the programs because doing so would make simuped2 and
simuped3 less generally useful.
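To indicate how a further quantity might be layered on top of the simulated genotypes, here is a minimal Python sketch of one possible disease model. Everything in it is hypothetical: the penetrance values, the recessive-style pattern, and the function name are chosen only for illustration and are not part of simuped2 or simuped3.

```python
import random

# Hypothetical penetrances, P(affected | number of A alleles); not from the source.
PENETRANCE = {0: 0.05, 1: 0.05, 2: 0.50}


def disease_status(rng, n_A, penetrance=PENETRANCE):
    """Return 1 (affected) or 0 (unaffected) given the number of A alleles carried."""
    return 1 if rng.random() < penetrance[n_A] else 0


rng = random.Random(11)            # fixed seed so the sketch is reproducible
for n_A in (0, 1, 2):
    print(n_A, disease_status(rng, n_A))
```

A realistic model would typically also depend on age and sex, which is one reason the choice is left to the user.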