Stata Technical Bulletin
STB-57
        a1   a2   b1   b2   D   S
  1.     1    2    1    1   0   0
  2.     1    2    2    2   0   0
  3.     2    2    1    2   1   0
  4.     2    2    1    2   0   0
  5.     1    1    1    2   1   0
  6.     1    1    1    2   1   1
  7.     1    2    2    2   1   0
  8.     1    2    1    1   1   0
  9.     1    2    1    1   1   0
Each line represents one subject. When D = 1, the subject is a case, and when D = 0, the subject is a control. Each locus
contains a pair of alleles; for locus a these are a1 and a2. For example, subject 1 has alleles 1 and 2 at locus a. If phase is
known, then the ordered genotype would be 1/2.
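The layout described above can be reproduced by typing the nine listed observations directly into Stata. This is only a sketch: it assumes the variable names a1, a2, b1, b2, D, and S from the listing, and nine subjects are far too few for the tests discussed in this section to be meaningful, so results from these rows should not be expected to match any output shown here.

```stata
* Enter the nine listed subjects by hand (illustration only)
clear
input a1 a2 b1 b2 D S
1 2 1 1 0 0
1 2 2 2 0 0
2 2 1 2 1 0
2 2 1 2 0 0
1 1 1 2 1 0
1 1 1 2 1 1
1 2 2 2 1 0
1 2 1 1 1 0
1 2 1 1 1 0
end

* One row per subject, one pair of allele variables per locus
list
```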
If phase is known, the association test between one of the loci and the disease status is the chi-squared test of association
in a contingency table. When phase is unknown, the contingency table is not observed, so a model of independence and the
saturated model are compared using the likelihood-ratio test. Using the notation first introduced by Wilkinson and Rogers (1973),
the independence model is l1+D, where l1 is the locus and D is the case-control variable, and the saturated model is l1*D. The
commands to do this analysis are
. hapipf a1 a2, ipf(l1*D) model(0)
. hapipf a1 a2, ipf(l1+D) model(1) lrtest(0,1)
The varlist specifies that the alleles at locus a are used and corresponds to locus 1 in the ipf option.
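The comparison requested by lrtest(0,1) can be written out as the usual likelihood-ratio statistic. This is the standard textbook form, not a quotation of hapipf's internals; the symbols \ell_0, \ell_1, k_0, and k_1 are introduced here purely for illustration.

```latex
% \ell_0, k_0: maximized log likelihood and number of free parameters of model 0 (saturated)
% \ell_1, k_1: the same quantities for model 1 (independence)
G^2 = 2\,(\ell_0 - \ell_1),
\qquad
G^2 \;\overset{\text{approx.}}{\sim}\; \chi^2_{\,k_0 - k_1}
\quad \text{under the independence model.}
```

A large value of G^2 relative to the chi-squared reference distribution is evidence against independence of locus and disease status.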
The test for linkage disequilibrium between two loci is very similar to the test of association between locus and disease
status. The models to compare are l1*l2 and l1+l2.
. hapipf a1 a2 b1 b2, ipf(l1*l2) model(0)
. hapipf a1 a2 b1 b2, ipf(l1+l2) model(1) lrtest(0,1)
Here loci a and b correspond to loci 1 and 2, respectively, in the ipf option.
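In terms of haplotype probabilities, the two models being compared can be sketched as follows. The notation p_{ij} for the probability of the haplotype carrying allele i at locus a and allele j at locus b, and \Delta for the disequilibrium coefficient, is an assumption made here for illustration (\Delta is used instead of the more common D to avoid a clash with the case-control variable).

```latex
% Saturated model (l1*l2): the p_{ij} are unrestricted apart from summing to 1.
% Independence model (l1+l2): haplotype probabilities factor into allele probabilities.
p_{ij} = p_{i\cdot}\, p_{\cdot j},
\qquad
p_{i\cdot} = \sum_j p_{ij}, \quad p_{\cdot j} = \sum_i p_{ij}.

% For two biallelic loci, the departure from independence is a single number:
\Delta = p_{11} - p_{1\cdot}\, p_{\cdot 1},
\qquad
\Delta = 0 \ \text{under the independence model.}
```

With two biallelic loci the two models therefore differ by one parameter, so the likelihood-ratio test has one degree of freedom.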
Obtaining the expected haplotype frequencies in the l1*l2 model requires the display option.
. hapipf a1 a2 b1 b2, ipf(l1*l2) display
Haplotype Frequency Estimation by EM algorithm
No. loci       = 2
Log-Likelihood = -330.3559939995067
Df             = 0
No. parameters = 4
No. cells      = 4

Imputed Frequencies
  Haplo        freq       eprob
    1.1   20.150157   .06143341
    1.2   116.84984   .35624952
    2.1   171.84984   .52393246
    2.2   19.150157   .05838463

Expected Frequencies
  Haplo        freq       eprob
    1.1   20.150568   .06143466
    1.2   116.84943   .35624828
    2.1   171.84943   .52393118
    2.2   19.150568   .05838588
The haplotypes are listed under the variable Haplo and loci are separated by a dot. For a saturated model, the imputed and
expected frequencies are the same. For models that are not saturated, the expected frequencies obey the log-linear model. The
expected frequencies can be saved as a Stata datafile by the using option and this datafile can be used for calculating odds
ratios using tabodds.
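The relationship between the freq and eprob columns can be checked by hand with display. The sketch below assumes that eprob is simply freq divided by the total of the freq column, which is what the printed values suggest; the numbers are copied from the imputed frequencies in the output above.

```stata
* Total of the imputed haplotype frequencies
display 20.150157 + 116.84984 + 171.84984 + 19.150157

* Share of haplotype 1.1; compare with the eprob reported for 1.1
display 20.150157 / (20.150157 + 116.84984 + 171.84984 + 19.150157)
```

The total is very close to 328, which would correspond to 164 subjects if every subject contributes two haplotypes.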
As with normal case-control studies, there is a possibility that the relationship between haplotype/locus and disease is
confounded by another variable (S). A solution is to perform a stratified analysis using the confounder as the stratifying variable
and assuming a common odds model. To test whether this variable is an effect modifier, compare the model l1*l2*S*D to
l1*l2*S+l1*l2*D+S*D. The second model assumes that the odds ratios are the same between strata.
. hapipf a1 a2 b1 b2, ipf(l1*l2*S*D) model(0)
. hapipf a1 a2 b1 b2, ipf(l1*l2*S+l1*l2*D+S*D) model(1) lrtest(0,1)
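The difference between the two models can be made explicit by writing them as log-linear models. The following is a sketch in notation introduced here for illustration: h indexes the two-locus haplotype (all terms generated by the two loci and their interaction are collected into a single factor H), s indexes the stratum, d indexes disease status, and m_{hsd} is the expected cell frequency.

```latex
% Common odds ratio model: all haplotype-by-stratum, haplotype-by-disease,
% and stratum-by-disease terms, but no term involving H, S, and D together.
\log m_{hsd} = \lambda
  + \lambda^{H}_{h} + \lambda^{S}_{s} + \lambda^{D}_{d}
  + \lambda^{HS}_{hs} + \lambda^{HD}_{hd} + \lambda^{SD}_{sd}

% Saturated model: the same terms plus the three-way interaction.
\log m_{hsd} = \lambda
  + \lambda^{H}_{h} + \lambda^{S}_{s} + \lambda^{D}_{d}
  + \lambda^{HS}_{hs} + \lambda^{HD}_{hd} + \lambda^{SD}_{sd}
  + \lambda^{HSD}_{hsd}
```

Because the haplotype-disease odds ratios in the first model depend only on the \lambda^{HD} terms, they do not vary with s. The likelihood-ratio test of the two models is therefore a test of whether the \lambda^{HSD} terms are needed, that is, whether S modifies the haplotype-disease association.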