
Stata Technical Bulletin

STB-57

	a1	a2	b1	b2	D	S
1.	1	2	1	1	0	0
2.	1	2	2	2	0	0
3.	2	2	1	2	1	0
4.	2	2	1	2	0	0
5.	1	1	1	2	1	0
6.	1	1	1	2	1	1
7.	1	2	2	2	1	0
8.	1	2	1	1	1	0
9.	1	2	1	1	1	0

Each line represents one subject. When D = 1, the subject is a case, and when D = 0, the subject is a control. Each locus
contributes a pair of alleles; for locus a these are a1 and a2. For example, subject 1 has alleles 1 and 2 at locus a. If phase is
known, then the ordered genotype would be 1/2.
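
As a minimal sketch, the nine subjects listed above could be entered with Stata's built-in input command. It assumes the variable names are a1, a2, b1, b2, D, and S, and that D and S are coded 0/1. For two-locus haplotypes, only subjects heterozygous at both loci have ambiguous phase, and the final command counts them (none of the nine rows typed here is heterozygous at both loci).

```stata
* Enter the nine listed subjects: two alleles per locus (a1 a2, b1 b2),
* case-control status D (1 = case, 0 = control), and stratifying variable S.
* Variable names and 0/1 coding are assumptions read off the listing.
clear
input a1 a2 b1 b2 D S
1 2 1 1 0 0
1 2 2 2 0 0
2 2 1 2 1 0
2 2 1 2 0 0
1 1 1 2 1 0
1 1 1 2 1 1
1 2 2 2 1 0
1 2 1 1 1 0
1 2 1 1 1 0
end

* Count subjects heterozygous at both loci (two-locus phase is ambiguous)
count if a1 != a2 & b1 != b2
```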

If phase is known, the association test between one of the loci and the disease status is the chi-squared test of association
in a contingency table. When phase is unknown, the contingency table is not observed, so a model of independence and the
saturated model are compared using the likelihood-ratio test. Using the notation first introduced by Wilkinson and Rogers (1973),
the independence model is l1+D, where l1 is the locus and D is the case-control variable, and the saturated model is l1*D. The
commands to do this analysis are

. hapipf a1 a2, ipf(l1*D) model(0)

. hapipf a1 a2, ipf(l1+D) model(1) lrtest(0,1)

The varlist specifies that the alleles at locus a are used and corresponds to locus 1 in the ipf option.
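
The likelihood-ratio comparison itself can be sketched with Stata's built-in chi2tail() function. This is an illustration of the arithmetic only, not hapipf's internal code. The two log likelihoods and the degrees of freedom below are hypothetical placeholders; in practice they come from the two fitted models.

```stata
* Hypothetical values, for illustration only
scalar ll_sat = -210.4
scalar ll_ind = -213.1
scalar lr_df  = 1

* Likelihood-ratio statistic: twice the difference in log likelihoods
scalar lr_stat = 2*(ll_sat - ll_ind)

* Upper-tail chi-squared probability gives the p-value
display "LR chi2(" lr_df ") = " %6.3f lr_stat "   p-value = " %6.4f chi2tail(lr_df, lr_stat)
```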

The test for linkage disequilibrium between two loci is very similar to the test of association between locus and disease
status. The models to compare are l1*l2 and l1+l2.

. hapipf a1 a2 b1 b2, ipf(l1*l2) model(0)

. hapipf a1 a2 b1 b2, ipf(l1+l2) model(1) lrtest(0,1)

Here loci a and b correspond to loci 1 and 2, respectively, in the ipf option.
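
When phase is unknown, two-locus haplotype frequencies have to be estimated rather than counted. The following is a minimal sketch of a textbook EM iteration for two biallelic loci, using only Stata scalars. It is not hapipf's implementation, and all counts are hypothetical. Only subjects heterozygous at both loci are ambiguous: each carries either haplotypes 1.1 and 2.2 or haplotypes 1.2 and 2.1.

```stata
* Hypothetical haplotype counts from subjects whose phase is determined
scalar n11 = 30
scalar n12 = 50
scalar n21 = 60
scalar n22 = 20
* Hypothetical number of double heterozygotes (phase unknown)
scalar nhet = 40
* Total number of haplotypes (two per subject)
scalar N = n11 + n12 + n21 + n22 + 2*nhet

* Start from equal haplotype frequencies
scalar p11 = .25
scalar p12 = .25
scalar p21 = .25
scalar p22 = .25

forvalues i = 1/100 {
    * E-step: probability that a double heterozygote carries 1.1 and 2.2
    scalar w = p11*p22 / (p11*p22 + p12*p21)
    * M-step: update frequencies from observed plus expected counts
    scalar p11 = (n11 + nhet*w) / N
    scalar p22 = (n22 + nhet*w) / N
    scalar p12 = (n12 + nhet*(1 - w)) / N
    scalar p21 = (n21 + nhet*(1 - w)) / N
}

display "1.1 " %8.6f p11 "   1.2 " %8.6f p12 "   2.1 " %8.6f p21 "   2.2 " %8.6f p22
```

A fixed number of iterations keeps the sketch short; a convergence check on the change in the frequencies would be the usual refinement.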

To obtain the expected haplotype frequencies in the l1*l2 model requires the display option.

. hapipf a1 a2 b1 b2, ipf(l1*l2) display

Haplotype Frequency Estimation by EM algorithm

No. loci = 2

Log-Likelihood = -330.3559939995067

Df = 0

No. parameters = 4

No. cells = 4

Imputed Frequencies

Haplo	freq	eprob
1.1	20.150157	.06143341
1.2	116.84984	.35624952
2.1	171.84984	.52393246
2.2	19.150157	.05838463

Expected Frequencies

Haplo	freq	eprob
1.1	20.150568	.06143466
1.2	116.84943	.35624828
2.1	171.84943	.52393118
2.2	19.150568	.05838588

The haplotypes are listed under the variable Haplo and loci are separated by a dot. For a saturated model, the imputed and
expected frequencies are the same. For models that are not saturated, the expected frequencies obey the log-linear model. The
expected frequencies can be saved as a Stata datafile with the using option, and this datafile can then be used to calculate odds
ratios with tabodds.
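
As a quick check on how the two columns of the listing relate, the sketch below re-enters the imputed frequencies from the output above and recomputes the proportions. It assumes that eprob is simply freq divided by the total number of haplotypes (approximately 328 here); the variable names hap_total and eprob_check are illustrative.

```stata
* Imputed frequencies copied from the display output
clear
input str3 haplo double freq
"1.1"  20.150157
"1.2" 116.84984
"2.1" 171.84984
"2.2"  19.150157
end

* Assumed relationship: eprob = freq / total haplotype count
egen double hap_total = total(freq)
generate double eprob_check = freq / hap_total
format eprob_check %10.8f
list haplo freq eprob_check, noobs
```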

As with normal case-control studies, there is a possibility that the relationship between haplotype/locus and disease is
confounded by another variable (S). A solution is to perform a stratified analysis using the confounder as the stratifying variable
and assuming a common odds model. To test whether this variable is an effect modifier, compare the model l1*l2*S*D to
l1*l2*S+l1*l2*D+S*D. The second model assumes that the odds ratios are the same between strata.

. hapipf a1 a2 b1 b2, ipf(l1*l2*S*D) model(0)

. hapipf a1 a2 b1 b2, ipf(l1*l2*S+l1*l2*D+S*D) model(1) lrtest(0,1)
