34
Stata Technical Bulletin
STB-48
  8.   25-34     30+    1      0
  9.   35-44     0-9    0    214
 10.   35-44     0-9    1      4
 11.   35-44   10-19    0     84
 12.   35-44   10-19    1      8
 13.   35-44   20-29    0     48
 14.   35-44   20-29    1      6
 15.   35-44     30+    0     34
 16.   35-44     30+    1      0
In this new dataset, each of the original observations is split into two observations: one for the cases and one for the
controls. Because we did not specify the case() or the weight() option, the default variable names _case and _weight were
used to name the new variables. The _case variable indicates whether the observations are for cases or for controls and the
_weight variable specifies the corresponding number of cases or controls.
This new dataset can be used with any Stata command that allows frequency weights. For example, we could use logistic
to further analyze these data, remembering to specify [fweight=_weight].
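To see why frequency weights are sufficient here, the following is a minimal sketch in Python (not Stata), under stated assumptions. The rows are observations 9 through 16 of the listing above; the name `exposure` for the second column is a hypothetical label, since the column headings do not appear on this page. The sketch checks that a weighted tabulation gives the same crude odds ratio, within the 35-44 age group, as a dataset in which every row is physically repeated as many times as its weight.

```python
# Observations 9-16 of the listing: (exposure, _case, _weight).
# "exposure" is a hypothetical label for the second column.
rows = [
    ("0-9",   0, 214), ("0-9",   1, 4),
    ("10-19", 0,  84), ("10-19", 1, 8),
    ("20-29", 0,  48), ("20-29", 1, 6),
    ("30+",   0,  34), ("30+",   1, 0),
]

def weighted_count(exposure, case):
    """Subjects in a cell, treating the weight as a frequency weight."""
    return sum(w for e, c, w in rows if e == exposure and c == case)

# Expanded data: one record per subject instead of one record per cell.
expanded = [(e, c) for e, c, w in rows for _ in range(w)]

def expanded_count(exposure, case):
    """Subjects in a cell, counted record by record in the expanded data."""
    return sum(1 for e, c in expanded if e == exposure and c == case)

def odds_ratio(count, exposed, reference):
    """Crude odds ratio of being a case, exposed group versus reference group."""
    return (count(exposed, 1) * count(reference, 0)) / (
        count(exposed, 0) * count(reference, 1))

or_weighted = odds_ratio(weighted_count, "10-19", "0-9")
or_expanded = odds_ratio(expanded_count, "10-19", "0-9")
print(len(expanded), round(or_weighted, 4), round(or_expanded, 4))
```

Both routes give the same number because a frequency weight is only a compact way of writing repeated observations; a command that honors fweights never needs the expanded form.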
sg110 Hardy-Weinberg equilibrium test and allele frequency estimation
Mario Cleves, Stata Corporation, [email protected]
Syntax
genhw all1 all2 [weight] [if exp] [in range] [, binvar ]
genhwi #AA #Aa #aa [, label(genotypes) binvar ]
genhw allows fweights.
Description
genhw estimates allele frequencies, genotype frequencies, and disequilibrium coefficients for codominant traits or data of
completely known genotypes, and performs asymptotic Hardy-Weinberg (HW) equilibrium tests. In the case of two alleles, it
also calculates an exact HW significance probability.
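As an illustration of what an exact significance probability for two alleles can look like, here is a minimal Python sketch (not Stata). It assumes the usual conditional distribution of the number of heterozygotes given the allele counts, and it defines the significance probability as the sum of the probabilities of all outcomes that are no more probable than the observed one. Whether genhw uses exactly this definition is an assumption; the genotype counts and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def het_distribution(n_AA, n_Aa, n_aa):
    """Exact distribution of the heterozygote count, given the allele counts."""
    n = n_AA + n_Aa + n_aa          # individuals
    n_A = 2 * n_AA + n_Aa           # copies of allele A
    n_a = 2 * n - n_A               # copies of allele a
    dist = {}
    # The heterozygote count must have the same parity as n_A.
    for h in range(n_A % 2, min(n_A, n_a) + 1, 2):
        hom_A = (n_A - h) // 2
        hom_a = (n_a - h) // 2
        ways = factorial(n) // (factorial(hom_A) * factorial(h) * factorial(hom_a))
        dist[h] = Fraction(ways * 2**h, comb(2 * n, n_A))
    return dist

def exact_hw_pvalue(n_AA, n_Aa, n_aa):
    """Sum of the probabilities of outcomes no more likely than the observed one."""
    dist = het_distribution(n_AA, n_Aa, n_aa)
    observed = dist[n_Aa]
    return sum(p for p in dist.values() if p <= observed)

# Hypothetical counts: 3 AA, 2 Aa, 5 aa.
dist = het_distribution(3, 2, 5)
p_exact = exact_hw_pvalue(3, 2, 5)
print(float(p_exact))
```

Exact rational arithmetic is used so that the comparison against the observed probability is not affected by rounding; for large samples, log-factorials would be the more practical choice.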
genhw expects each observation to contain the values of the two alleles at the locus being examined (all1 and all2). Allele
values can be numeric or string.
genhwi is the immediate form of genhw; the genotype counts are given on the command line, where #AA, #Aa and #aa are
the counts for the AA, Aa and aa genotypes. Note that this command only works for biallelic loci.
Options
binvar specifies that binomial standard errors be reported for each allele. These standard errors are calculated assuming that the
population is in Hardy-Weinberg equilibrium. By default, standard errors that do not require this assumption are reported.
label(genotypes) specifies labels to be used in the output of the genotype frequency table. This option is only valid for the
immediate form of the command.
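The difference between the two kinds of standard error can be sketched as follows in Python (not Stata). The binomial form assumes Hardy-Weinberg equilibrium; the general form shown is the standard textbook variance of an allele frequency estimated from genotype counts, and it is an assumption that genhw's default matches it. The genotype counts and the function name are hypothetical.

```python
from math import sqrt

def allele_se(n_AA, n_Aa, n_aa):
    """Return (p_A, binomial SE, general SE) for allele A from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p_A = (2 * n_AA + n_Aa) / (2 * n)     # allele frequency by gene counting
    P_AA = n_AA / n                       # observed AA genotype frequency
    # Binomial SE: treats the 2n alleles as independent draws (HW equilibrium).
    se_binomial = sqrt(p_A * (1 - p_A) / (2 * n))
    # General SE: does not assume HW equilibrium (assumed textbook form).
    se_general = sqrt((p_A + P_AA - 2 * p_A**2) / (2 * n))
    return p_A, se_binomial, se_general

# Hypothetical sample with an excess of homozygotes: 40 AA, 20 Aa, 40 aa.
p_A, se_binomial, se_general = allele_se(40, 20, 40)
print(p_A, se_binomial, se_general)
```

When the observed AA frequency equals the square of the allele frequency, the two expressions coincide; an excess of homozygotes, as in this hypothetical sample, makes the general standard error the larger of the two.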
Remarks
genhw estimates allele and genotype frequencies for codominant traits or data where there is no ambiguity regarding
genotypes. It also performs asymptotic tests for Hardy-Weinberg equilibrium and estimates the disequilibrium coefficient (D)
for each heterozygotic genotype in the sample. See Methods and Formulas for details of these calculations.
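For a biallelic locus, these quantities can be sketched in a few lines of Python (not Stata). The sketch assumes the common textbook forms: allele frequencies by gene counting, a Pearson chi-squared statistic with 1 degree of freedom, and a disequilibrium coefficient defined as the product of the two allele frequencies minus half the observed heterozygote frequency. The sign convention for D and the genotype counts are assumptions; the Methods and Formulas section is the authority for what genhw actually computes.

```python
from math import erfc, sqrt

def hw_asymptotic(n_AA, n_Aa, n_aa):
    """Return (p, q, D, chi2, p-value) for a biallelic locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)       # frequency of allele A
    q = 1 - p                             # frequency of allele a
    # Disequilibrium coefficient for the heterozygote (assumed sign convention).
    D = p * q - (n_Aa / n) / 2
    # Pearson chi-squared: observed versus HW-expected genotype counts.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    pvalue = erfc(sqrt(chi2 / 2))
    return p, q, D, chi2, pvalue

# Hypothetical counts: 30 AA, 50 Aa, 20 aa.
p, q, D, chi2, pvalue = hw_asymptotic(30, 50, 20)
print(p, q, D, chi2, pvalue)
```

With two alleles the statistic can also be written as n D^2 / (p^2 q^2), so D and the chi-squared test carry the same information about departure from equilibrium.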
Example 1: biallelic locus
Sham (1998) presented MN blood group data from a random sample of 747 individuals. We would like to test whether or
not the population is in Hardy-Weinberg equilibrium. We entered these data into a Stata dataset. Here are a few observations:
. list in 1/10

        a1   a2
  1.     M    M
  2.     M    N
  3.     N    N
  4.     M    M