Stata Technical Bulletin
Methods and formulas
Borrowing the notation from Weir (1996), let $A_u$, $u = 1, \ldots, k$, represent the $k$ alleles at a locus and $A_uA_v$ represent each
of the possible $k(k + 1)/2$ distinct genotypes.
Consider a random sample of $n$ individuals. The observed allele counts, $n_u$, are
$$n_u = 2n_{uu} + \sum_{v \neq u} n_{uv}$$
where $n_{uv}$ and $n_{uu}$ are, respectively, the observed numbers of heterozygotes $A_uA_v$ and homozygotes $A_uA_u$ in the sample.
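The allele-count formula can be illustrated with a minimal Python sketch. The genotype counts below are hypothetical (they do not come from any dataset discussed here), and the names `genotype_counts` and `allele_counts` are chosen for illustration only.

```python
from collections import Counter

# Hypothetical genotype counts for a 3-allele locus (illustrative data only).
# Each unordered genotype appears once, keyed by its two alleles.
genotype_counts = {
    ("A", "A"): 10, ("A", "B"): 20, ("A", "C"): 15,
    ("B", "B"): 25, ("B", "C"): 20, ("C", "C"): 10,
}

def allele_counts(genotypes):
    """Return n_u = 2*n_uu + sum over v != u of n_uv for every allele u."""
    counts = Counter()
    for (u, v), n_uv in genotypes.items():
        # A homozygote (u == v) adds two copies of u;
        # a heterozygote adds one copy each of u and v.
        counts[u] += n_uv
        counts[v] += n_uv
    return dict(counts)

n = sum(genotype_counts.values())      # number of individuals
n_u = allele_counts(genotype_counts)   # allele counts
print(n, n_u)
```

Because every individual carries two alleles, the allele counts always sum to $2n$, which is a convenient sanity check on the input.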
The population allele frequencies are therefore estimated as
$$p_u = \frac{n_u}{2n}$$
and their variances as
$$\mathrm{var}(p_u) = \frac{1}{2n}\left(p_u + P_{uu} - 2p_u^2\right)$$
where $P_{uu}$ is the observed frequency of the $A_uA_u$ genotype.
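As a rough numerical illustration of these two estimators, the sketch below uses hypothetical counts for a 3-allele locus. The variable names (`n_u`, `n_uu`, `P_hom`, `var_p`) are assumptions made for this example.

```python
# Hypothetical inputs (illustrative only): n individuals, allele counts n_u,
# and homozygote counts n_uu for a 3-allele locus.
n = 100
n_u = {"A": 55, "B": 90, "C": 55}
n_uu = {"A": 10, "B": 25, "C": 10}

# Estimated allele frequencies: p_u = n_u / (2n)
p = {u: n_u[u] / (2 * n) for u in n_u}

# Observed homozygote frequencies: P_uu = n_uu / n
P_hom = {u: n_uu[u] / n for u in n_uu}

# Variances: var(p_u) = (p_u + P_uu - 2 p_u^2) / (2n)
var_p = {u: (p[u] + P_hom[u] - 2 * p[u] ** 2) / (2 * n) for u in p}

for u in sorted(p):
    print(u, p[u], var_p[u])
```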
Under Hardy-Weinberg equilibrium, where $P_{uu}$ is replaced by its expectation $p_u^2$, each allele variance simplifies to the variance of a binomial proportion with parameters
$p_u$ and $2n$:
$$\mathrm{var}(p_u) = \frac{1}{2n}\,p_u(1 - p_u)$$
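The simplification can be checked numerically. The following sketch, with hypothetical function names `var_general` and `var_hwe`, evaluates both variance expressions at a point where the homozygote frequency equals the square of the allele frequency.

```python
def var_general(p_u, P_uu, n):
    """General variance of the allele frequency estimate: (p_u + P_uu - 2 p_u^2) / (2n)."""
    return (p_u + P_uu - 2 * p_u ** 2) / (2 * n)

def var_hwe(p_u, n):
    """Variance under Hardy-Weinberg equilibrium: p_u (1 - p_u) / (2n)."""
    return p_u * (1 - p_u) / (2 * n)

# Illustrative values: with P_uu = p_u^2 the two expressions agree.
p_u, n = 0.3, 50
general = var_general(p_u, p_u ** 2, n)
binomial = var_hwe(p_u, n)
print(general, binomial)
```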
The expected genotype frequencies under the assumption of Hardy-Weinberg equilibrium are estimated as
$$E(P_{uu}) = p_u^2$$
for homozygotes, and
$$E(P_{uv}) = 2p_up_v \quad (u \neq v)$$
for heterozygotes.
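A short sketch of how the expected frequencies and counts might be tabulated for every distinct genotype follows. The allele frequencies are hypothetical, and `expected_frequencies` is an illustrative helper name.

```python
from itertools import combinations_with_replacement

# Hypothetical allele frequencies and sample size (illustrative only).
n = 100
p = {"A": 0.275, "B": 0.45, "C": 0.275}

def expected_frequencies(p):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium."""
    expected = {}
    # Enumerates each of the k(k+1)/2 unordered genotypes exactly once.
    for u, v in combinations_with_replacement(sorted(p), 2):
        expected[(u, v)] = p[u] ** 2 if u == v else 2 * p[u] * p[v]
    return expected

expected = expected_frequencies(p)
expected_counts = {g: n * f for g, f in expected.items()}
print(expected_counts)
```

The expected frequencies sum to one because $(\sum_u p_u)^2 = 1$, so the expected counts sum to $n$.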
The disequilibrium coefficients for heterozygous genotypes are estimated as
$$D_{uv} = p_up_v - \frac{1}{2}P_{uv}$$
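The disequilibrium coefficients can be computed directly from the allele frequencies and heterozygote counts. The inputs below are hypothetical, and `het_counts` and `D` are names assumed for this sketch.

```python
# Hypothetical inputs (illustrative only).
n = 100
p = {"A": 0.275, "B": 0.45, "C": 0.275}
het_counts = {("A", "B"): 20, ("A", "C"): 15, ("B", "C"): 20}

# D_uv = p_u p_v - P_uv / 2, where P_uv = n_uv / n is the observed
# frequency of the heterozygous genotype A_u A_v.
D = {
    (u, v): p[u] * p[v] - 0.5 * n_uv / n
    for (u, v), n_uv in het_counts.items()
}
print(D)
```

Since the expected heterozygote frequency is $2p_up_v$, a positive coefficient indicates fewer $A_uA_v$ heterozygotes than expected under Hardy-Weinberg equilibrium, and a negative coefficient indicates an excess.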
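Pearson's chi-squared test statistic is computed using the observed and expected genotype counts as
$$\sum_u \frac{(n_{uu} - np_u^2)^2}{np_u^2} + \sum\sum_{u \neq v} \frac{(n_{uv} - 2np_up_v)^2}{2np_up_v}$$
and the likelihood-ratio chi-squared test statistic as
$$-2\ln\left(\frac{L_0}{L_1}\right)$$
where, up to a multinomial constant that cancels in the ratio,
$$\ln L_0 = \sum_u n_{uu}\ln\left(p_u^2\right) + \sum\sum_{u \neq v} n_{uv}\ln\left(2p_up_v\right)$$
and
$$\ln L_1 = \sum_u n_{uu}\ln\left(\frac{n_{uu}}{n}\right) + \sum\sum_{u \neq v} n_{uv}\ln\left(\frac{n_{uv}}{n}\right)$$
In each double sum, every heterozygous genotype $A_uA_v$ is counted once.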
Both Pearson's and the likelihood-ratio test statistics are asymptotically chi-squared distributed with $k(k - 1)/2$ degrees of freedom under the null hypothesis of Hardy-Weinberg equilibrium.
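Both test statistics and their degrees of freedom can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used by any Stata command: the function name `hw_tests` and the example counts are hypothetical, and genotypes with zero observed count are assumed to contribute nothing to the likelihood-ratio statistic (the usual 0 ln 0 = 0 convention).

```python
import math
from collections import Counter

def hw_tests(genotypes):
    """Return (pearson, lr, df) from genotype counts keyed by unordered allele pairs."""
    n = sum(genotypes.values())

    # Allele counts n_u and frequencies p_u = n_u / (2n); unseen alleles are dropped.
    n_u = Counter()
    for (u, v), count in genotypes.items():
        n_u[u] += count
        n_u[v] += count
    p = {u: c / (2 * n) for u, c in n_u.items() if c > 0}
    alleles = sorted(p)
    k = len(alleles)

    pearson = 0.0
    lr = 0.0
    for i, u in enumerate(alleles):
        for v in alleles[i:]:  # each distinct genotype is visited once
            observed = genotypes.get((u, v), 0)
            if u != v:
                observed += genotypes.get((v, u), 0)  # accept either key order
            expected = n * (p[u] ** 2 if u == v else 2 * p[u] * p[v])
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # assumption: zero counts contribute 0
                # 2 * O * ln(O / E) summed over genotypes equals -2 ln(L0 / L1)
                lr += 2 * observed * math.log(observed / expected)
    return pearson, lr, k * (k - 1) // 2

# Hypothetical two-allele sample (illustrative data only).
example = {("A", "A"): 30, ("A", "B"): 50, ("B", "B"): 20}
pearson, lr, df = hw_tests(example)
print(pearson, lr, df)
```

For samples close to equilibrium the two statistics are nearly equal, as in this example. A p-value would follow from the upper tail of a chi-squared distribution with `df` degrees of freedom.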
References
Sham, P. 1998. Statistics in Human Genetics. New York: John Wiley & Sons.
Spencer, N., D. A. Hopkinson, and H. Harris. 1964. Quantitative differences and gene dosage in the human red cell acid phosphatase polymorphism.
Nature 201: 299-300.
Weir, B. S. 1996. Genetic Data Analysis II. Sunderland, MA: Sinauer Associates.