Stata Technical Bulletin
Methods and formulas
Here we give only the formulas for testing whether the cases are under HWE, given that the controls are under HWE, as the
methods for testing a single sample have been given by Cleves (1999). The standard error of the disequilibrium coefficient (D) was
not included in the command genhwi, but it is included in the new command genhwcci. Details of the formula can be found
in Weir (1990, 74).
The observed case-control data are shown in the following table, where $n_i$ and $n'_i$ represent the number of genotypes among
cases and controls, respectively, $i = AA, AB, BB$. Let $\pi_i$ and $\pi'_i$ represent the probability that a person has genotype $i$ among
cases and controls, respectively. We have $\sum_i \pi_i = 1$ and $\sum_i \pi'_i = 1$.
Table 1: Observed genotypic counts

Genotype | Case     | Control   | Total
AA       | $n_{AA}$ | $n'_{AA}$ | $m_{AA}$
AB       | $n_{AB}$ | $n'_{AB}$ | $m_{AB}$
BB       | $n_{BB}$ | $n'_{BB}$ | $m_{BB}$
Total    | $n$      | $n'$      | $m$
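As a small illustration of the layout of Table 1, the sketch below stores the case and control counts and derives the Total column and the margins. The counts are hypothetical and chosen only for illustration, and the function name `with_totals` is an assumption, not part of genhwcci.

```python
# Hypothetical genotype counts, for illustration only (not from the article).
cases = {"AA": 30, "AB": 50, "BB": 20}      # n_AA, n_AB, n_BB
controls = {"AA": 25, "AB": 50, "BB": 25}   # n'_AA, n'_AB, n'_BB

GENOTYPES = ("AA", "AB", "BB")


def with_totals(cases, controls):
    """Return the Total column m_i = n_i + n'_i and the margins n, n', m."""
    pooled = {g: cases[g] + controls[g] for g in GENOTYPES}
    n = sum(cases[g] for g in GENOTYPES)            # number of cases
    n_prime = sum(controls[g] for g in GENOTYPES)   # number of controls
    return pooled, n, n_prime, n + n_prime


pooled, n, n_prime, m = with_totals(cases, controls)
print(pooled, n, n_prime, m)
```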
Suppose the controls are randomly selected from the population of interest, which is under HWE. Then the distribution of
genotypes in the population is given by
$$\pi'_{AA} = p^2, \quad \pi'_{AB} = 2pq, \quad \pi'_{BB} = q^2 \qquad (1)$$
where $p$ is the allele frequency of A in the population, and $q = 1 - p$.
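The Hardy-Weinberg proportions in (1) can be sketched in a few lines of Python. The function name `hwe_genotype_probs` and the allele frequency 0.3 are assumptions made for illustration.

```python
def hwe_genotype_probs(p):
    """Genotype probabilities under HWE for allele frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency p must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


# Hypothetical allele frequency, for illustration only.
probs = hwe_genotype_probs(0.3)
print(probs)
```

The three probabilities always sum to one, because $p^2 + 2pq + q^2 = (p + q)^2 = 1$.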
Under the null hypothesis that the cases are under HWE, the genotype distribution of the cases is also given by (1), because
the controls are assumed to be under HWE. The log-likelihood function is then given by
$$L_0 = (n_{AA} + n'_{AA}) \log(p^2) + (n_{AB} + n'_{AB}) \log(2pq) + (n_{BB} + n'_{BB}) \log(q^2)$$
with one parameter p.
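A minimal sketch of the null log-likelihood as a function of its single parameter follows. The function name `loglik_null` and the counts are hypothetical; the code simply pools cases and controls and evaluates the expression for $L_0$ for a given $p$ strictly between 0 and 1.

```python
import math


def loglik_null(p, cases, controls):
    """Evaluate L0 at allele frequency p, pooling cases and controls."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    m_aa = cases["AA"] + controls["AA"]
    m_ab = cases["AB"] + controls["AB"]
    m_bb = cases["BB"] + controls["BB"]
    return (m_aa * math.log(p * p)
            + m_ab * math.log(2.0 * p * q)
            + m_bb * math.log(q * q))


# Hypothetical counts, for illustration only.
cases = {"AA": 30, "AB": 50, "BB": 20}
controls = {"AA": 25, "AB": 50, "BB": 25}

# Allele-counting estimate from the pooled sample: (2*55 + 100) / (2*200).
p_pooled = (2 * 55 + 100) / (2 * 200)
print(p_pooled, loglik_null(p_pooled, cases, controls))
```

Evaluating the function at nearby values of $p$ gives smaller values than at the pooled allele-counting estimate, which is what one expects of a maximizer.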
Under the alternative hypothesis that the cases are not under HWE, the genotype distribution of the cases is $\pi_{AA}$, $\pi_{AB}$,
and $\pi_{BB}$, and the log-likelihood function is given by
$$L_1 = n_{AA} \log \pi_{AA} + n_{AB} \log \pi_{AB} + n_{BB} \log \pi_{BB} + n'_{AA} \log(p^2) + n'_{AB} \log(2pq) + n'_{BB} \log(q^2)$$
with the three parameters $p$, $\pi_{AA}$, and $\pi_{AB}$, where the $\pi_i$ sum to one.
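The alternative log-likelihood can be sketched in the same way. Here the cases contribute through unrestricted genotype probabilities and the controls through the HWE proportions. The names `loglik_alt` and `xlogy` and the counts are assumptions for illustration; a term with a zero count is treated as contributing zero.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def xlogy(x, y):
    """Return x * log(y), with the convention that a zero count contributes 0."""
    return 0.0 if x == 0 else x * math.log(y)


def loglik_alt(p, pi, cases, controls):
    """Evaluate L1: free genotype probabilities pi for cases, HWE(p) for controls."""
    q = 1.0 - p
    hwe = {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}
    case_part = sum(xlogy(cases[g], pi[g]) for g in GENOTYPES)
    control_part = sum(xlogy(controls[g], hwe[g]) for g in GENOTYPES)
    return case_part + control_part


# Hypothetical counts and parameter values, for illustration only.
cases = {"AA": 30, "AB": 50, "BB": 20}
controls = {"AA": 25, "AB": 50, "BB": 25}
pi = {"AA": 0.30, "AB": 0.50, "BB": 0.20}
print(loglik_alt(0.5, pi, cases, controls))
```

When `pi` is set to the HWE proportions for the same `p`, the value reduces to the pooled expression used under the null hypothesis.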
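To test the null hypothesis against the alternative hypothesis, we use the statistic $-2(L_0 - L_1)$, which approximately follows the $\chi^2$
distribution with 2 degrees of freedom (the alternative has three free parameters and the null has one). The maximum likelihood
estimates of $p$ and $\pi_i$ can be obtained from the score functions of $L_0$ and $L_1$, respectively. More specifically, under the
null hypothesis, $\hat{p} = (2m_{AA} + m_{AB})/(2m)$, where $m_i = n_i + n'_i$ and $m = n + n'$ are the pooled counts in the Total column of
Table 1, while under the alternative hypothesis, $\hat{p} = (2n'_{AA} + n'_{AB})/(2n')$, $\hat{\pi}_{AA} = n_{AA}/n$, $\hat{\pi}_{AB} = n_{AB}/n$, and
$\hat{\pi}_{BB} = n_{BB}/n$. Then $-2(L_0 - L_1)$ is evaluated at the above maximum likelihood estimates and compared with the $\chi^2$
distribution with 2 degrees of freedom.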
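The whole procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the genhwcci implementation: the function name `hwe_case_control_lrt` and the counts are hypothetical, zero counts are treated as contributing zero to the log-likelihood, and the p-value uses the fact that the upper tail of the $\chi^2$ distribution with 2 degrees of freedom is $\exp(-x/2)$.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def _xlogy(x, y):
    """Return x * log(y), with a zero count contributing 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _hwe(p):
    """HWE genotype probabilities for allele frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def hwe_case_control_lrt(cases, controls):
    """Likelihood-ratio test of HWE in cases, given controls are under HWE."""
    n = sum(cases[g] for g in GENOTYPES)
    n_prime = sum(controls[g] for g in GENOTYPES)
    m = n + n_prime

    # Null hypothesis: one allele frequency estimated from the pooled sample.
    p0 = (2 * (cases["AA"] + controls["AA"]) + cases["AB"] + controls["AB"]) / (2 * m)
    # Alternative: allele frequency from controls only, free proportions for cases.
    p1 = (2 * controls["AA"] + controls["AB"]) / (2 * n_prime)
    pi_hat = {g: cases[g] / n for g in GENOTYPES}

    h0, h1 = _hwe(p0), _hwe(p1)
    l0 = sum(_xlogy(cases[g] + controls[g], h0[g]) for g in GENOTYPES)
    l1 = sum(_xlogy(cases[g], pi_hat[g]) + _xlogy(controls[g], h1[g])
             for g in GENOTYPES)

    stat = max(0.0, -2.0 * (l0 - l1))   # guard against tiny negative rounding error
    p_value = math.exp(-stat / 2.0)     # chi-squared upper tail with 2 df
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p_value = hwe_case_control_lrt(
    {"AA": 30, "AB": 50, "BB": 20},
    {"AA": 25, "AB": 50, "BB": 25},
)
print(stat, p_value)
```

Because $p$ is estimated from the pooled sample under the null hypothesis but from the controls alone under the alternative, the statistic responds both to departures of the case genotypes from HWE proportions and to differences in allele frequency between cases and controls.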
Acknowledgment
I would like to thank Dr. Douglas Easton of the University of Cambridge for helpful discussions during his visit to Australia
in November of 1999.
References
Cleves, M. 1999. sg110: Hardy-Weinberg equilibrium test and allele frequency estimation. Stata Technical Bulletin 48: 34-37. Reprinted in Stata
Technical Bulletin Reprints, vol. 8, pp. 280-284.
Helzlsouer, K. J. et al. 1998. Association between CYP17 polymorphisms and the development of breast cancer. Cancer Epidemiology, Biomarkers &
Prevention 7: 945-949.
Weir, B. S. 1990. Genetic Data Analysis. Sunderland, Massachusetts: Sinauer Associates.