


Stata Technical Bulletin

STB-57


          a1   a2   b1   b2   D   S
   1.      1    2    1    1   0   0
   2.      1    2    2    2   0   0
   3.      2    2    1    2   1   0
   4.      2    2    1    2   0   0
   5.      1    1    1    2   1   0
   6.      1    1    1    2   1   1
   7.      1    2    2    2   1   0
   8.      1    2    1    1   1   0
   9.      1    2    1    1   1   0

Each line represents one subject. When D = 1 the subject is a case, and when D = 0 the subject is a control. Each locus
is represented by a pair of alleles; for locus a these are a1 and a2. For example, subject 1 has alleles 1 and 2 at locus a. If
phase were known, the ordered genotype would be written 1/2.
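The phase problem can be made concrete with a short enumeration. The Python sketch below is an illustration only, not part of hapipf, and the function name phase_configurations is hypothetical. It lists every unordered pair of haplotypes consistent with a subject's unordered genotypes, writing haplotypes with a dot between loci as in the hapipf output. A subject heterozygous at k >= 1 loci has 2^(k-1) possible configurations, so phase is ambiguous only when k >= 2.

```python
from itertools import product


def phase_configurations(genotypes):
    """List the unordered haplotype pairs consistent with a multilocus genotype.

    genotypes: one (allele, allele) tuple per locus, e.g. [(1, 2), (1, 1)]
    for alleles a1=1, a2=2 at locus a and b1=1, b2=1 at locus b.
    Haplotypes are written with a dot between loci, as in the hapipf output.
    """
    pairs = set()
    # For each locus, choose which of its two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotypes)):
        first = ".".join(str(g[c]) for g, c in zip(genotypes, choice))
        second = ".".join(str(g[1 - c]) for g, c in zip(genotypes, choice))
        # The two chromosomes are unordered, so store each pair in sorted order.
        pairs.add(tuple(sorted((first, second))))
    return sorted(pairs)


# Subject 1 in the listing (a: 1/2, b: 1/1): phase is determined.
print(phase_configurations([(1, 2), (1, 1)]))   # [('1.1', '2.1')]
# A hypothetical double heterozygote (a: 1/2, b: 1/2): two possible phases.
print(phase_configurations([(1, 2), (1, 2)]))   # [('1.1', '2.2'), ('1.2', '2.1')]
```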

If phase is known, the test of association between one of the loci and disease status is the usual chi-squared test of
association in a contingency table. When phase is unknown, the contingency table is not observed, so instead the independence
model and the saturated model are compared with a likelihood-ratio test. In the notation introduced by Wilkinson and Rogers
(1973), the independence model is l1+D, where l1 is the locus and D is the case-control variable, and the saturated model is
l1*D. The commands for this analysis are

. hapipf a1 a2, ipf(l1*D) model(0)

. hapipf a1 a2, ipf(l1+D) model(1) lrtest(0,1)

The varlist specifies that the alleles at locus a are used; this locus is referred to as l1 in the ipf() option.
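The likelihood-ratio test compares the maximized log-likelihoods of the two nested models: the statistic 2(ll_full - ll_restricted) is referred to a chi-squared distribution whose degrees of freedom equal the difference in the number of parameters. For a locus with two alleles and a binary D, the saturated model l1*D has 4 parameters and the independence model l1+D has 3, giving 1 degree of freedom. The Python sketch below shows this arithmetic using only the standard library; the function names are hypothetical, the log-likelihoods in the usage line are made up, and the code is not taken from hapipf.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with an
    integer number of degrees of freedom df >= 1 (closed-form series)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/(3*5) + ...),
    # with (df - 1) / 2 terms in the bracket (none when df = 1).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(3, df + 1, 2):
        total += term
        term *= x / j
    return total


def lr_test(ll_restricted, ll_full, df):
    """Likelihood-ratio test of a restricted model (e.g. l1+D) nested in a
    fuller model (e.g. the saturated l1*D); returns (statistic, p-value)."""
    stat = max(0.0, 2.0 * (ll_full - ll_restricted))  # guard against rounding
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods, for illustration only.
stat, p = lr_test(-332.0, -330.0, df=1)
print(stat, round(p, 4))   # 4.0 0.0455
```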

The test for linkage disequilibrium between two loci is very similar to the test of association between a locus and disease
status. The models to compare are l1*l2 and l1+l2.

. hapipf a1 a2 b1 b2, ipf(l1*l2) model(0)

. hapipf a1 a2 b1 b2, ipf(l1+l2) model(1) lrtest(0,1)

Here loci a and b correspond to loci 1 and 2, respectively, in the ipf option.
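When phase is unknown, the haplotype counts needed for comparing l1*l2 with l1+l2 are not observed. The Python sketch below illustrates the idea for the simplest case, two loci with two alleles each, under the assumption of Hardy-Weinberg equilibrium: an EM algorithm estimates the haplotype frequencies under the saturated model l1*l2, while under l1+l2 each haplotype frequency is the product of the two sample allele frequencies. It is a hypothetical illustration, not the hapipf implementation, and all function names are made up. Only subjects heterozygous at both loci have ambiguous phase; the E-step shares each of them between the 1.1/2.2 and 1.2/2.1 configurations in proportion to their current probabilities, and the M-step recomputes the haplotype frequencies from the resulting expected counts.

```python
import math


def phase_configs(g):
    """Unordered haplotype pairs consistent with genotype g = ((a1, a2), (b1, b2))."""
    (a1, a2), (b1, b2) = g
    return {tuple(sorted([(a1, b1), (a2, b2)])),
            tuple(sorted([(a1, b2), (a2, b1)]))}


def config_weights(g, p):
    """HWE probability of each phase configuration (factor 2 if haplotypes differ)."""
    return {c: p[c[0]] * p[c[1]] * (1 if c[0] == c[1] else 2)
            for c in phase_configs(g)}


def log_likelihood(genotypes, p):
    return sum(math.log(sum(config_weights(g, p).values())) for g in genotypes)


def em_haplotypes(genotypes, alleles=(1, 2), tol=1e-10, max_iter=1000):
    haps = [(i, j) for i in alleles for j in alleles]
    p = {h: 1.0 / len(haps) for h in haps}          # uniform starting values
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = {h: 0.0 for h in haps}
        for g in genotypes:                         # E-step: expected haplotype counts
            w = config_weights(g, p)
            total = sum(w.values())
            for (h1, h2), wc in w.items():
                counts[h1] += wc / total
                counts[h2] += wc / total
        new_p = {h: counts[h] / n_chrom for h in haps}   # M-step
        done = max(abs(new_p[h] - p[h]) for h in haps) < tol
        p = new_p
        if done:
            break
    return p


def independence_freqs(genotypes, alleles=(1, 2)):
    """Haplotype frequencies under l1+l2: products of sample allele frequencies."""
    n = 2 * len(genotypes)
    fa = {i: sum(g[0].count(i) for g in genotypes) / n for i in alleles}
    fb = {j: sum(g[1].count(j) for g in genotypes) / n for j in alleles}
    return {(i, j): fa[i] * fb[j] for i in alleles for j in alleles}


def ld_lr_test(genotypes):
    """LR test of l1+l2 against l1*l2 for two biallelic loci (1 df)."""
    ll_sat = log_likelihood(genotypes, em_haplotypes(genotypes))
    ll_ind = log_likelihood(genotypes, independence_freqs(genotypes))
    stat = max(0.0, 2.0 * (ll_sat - ll_ind))        # guard against rounding below zero
    return stat, math.erfc(math.sqrt(stat / 2))     # chi-squared(1) upper tail


# The nine subjects listed above, as ((a1, a2), (b1, b2)).
rows = [((1, 2), (1, 1)), ((1, 2), (2, 2)), ((2, 2), (1, 2)),
        ((2, 2), (1, 2)), ((1, 1), (1, 2)), ((1, 1), (1, 2)),
        ((1, 2), (2, 2)), ((1, 2), (1, 1)), ((1, 2), (1, 1))]
print(em_haplotypes(rows))
print(ld_lr_test(rows))
```

None of the nine listed subjects is heterozygous at both loci, so for them phase is fully determined and the EM estimates reduce to simple haplotype counts. These rows are only the start of the data, so the printed values say nothing about the full analysis. Like any EM algorithm, this one can in principle stop at a local maximum.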

Obtaining the expected haplotype frequencies under the l1*l2 model requires the display option.

. hapipf a1 a2 b1 b2, ipf(l1*l2) display

Haplotype Frequency Estimation by EM algorithm

No. loci          = 2
Log-Likelihood    = -330.3559939995067
Df                = 0
No. parameters    = 4
No. cells         = 4

Imputed Frequencies

  Haplo        freq       eprob
    1.1   20.150157   .06143341
    1.2   116.84984   .35624952
    2.1   171.84984   .52393246
    2.2   19.150157   .05838463

Expected Frequencies

  Haplo        freq       eprob
    1.1   20.150568   .06143466
    1.2   116.84943   .35624828
    2.1   171.84943   .52393118
    2.2   19.150568   .05838588

The haplotypes are listed under the variable Haplo, with the alleles at successive loci separated by a dot. For a saturated
model, the imputed and expected frequencies are the same, apart from the small numerical differences visible above. For models
that are not saturated, the expected frequencies obey the log-linear model. The expected frequencies can be saved as a Stata
dataset with the using option, and this dataset can then be used to calculate odds ratios with tabodds.
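The odds-ratio calculation itself is simple once expected counts are available separately for cases and controls, for instance from a model that includes D. The Python sketch below computes, for each haplotype, the odds of being a case relative to a chosen reference haplotype. The function name and the dictionary layout are assumptions for illustration, the counts in the usage line are hypothetical, and this is not the tabodds implementation.

```python
def haplotype_odds_ratios(cases, controls, reference):
    """Odds ratio of each haplotype relative to `reference`.

    cases, controls: dicts mapping haplotype labels such as "1.2" to
    expected counts among cases (D = 1) and controls (D = 0).
    Raises ZeroDivisionError if a control count or the reference
    case count is zero.
    """
    ref_odds = cases[reference] / controls[reference]
    # Odds of being a case for haplotype h, divided by the reference odds.
    return {h: (cases[h] / controls[h]) / ref_odds for h in cases}


# Hypothetical expected counts, for illustration only.
print(haplotype_odds_ratios({"1.1": 10.0, "1.2": 20.0},
                            {"1.1": 10.0, "1.2": 5.0}, reference="1.1"))
# {'1.1': 1.0, '1.2': 4.0}
```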

As in ordinary case-control studies, the relationship between haplotype (or locus) and disease may be confounded by another
variable (S). One solution is a stratified analysis that uses the confounder as the stratifying variable and assumes a common
odds ratio across strata. To test whether S is an effect modifier, compare the model l1*l2*S*D with l1*l2*S+l1*l2*D+S*D. The
second model assumes that the odds ratios are the same in every stratum.

. hapipf a1 a2 b1 b2, ipf(l1*l2*S*D) model(0)

. hapipf a1 a2 b1 b2, ipf(l1*l2*S+l1*l2*D+S*D) model(1) lrtest(0,1)
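The constraint in the second model is that the haplotype-disease odds ratio is the same in every stratum of S. As a descriptive complement to the likelihood-ratio test, the Python sketch below computes stratum-specific odds ratios and the Mantel-Haenszel common odds ratio from 2x2 tables of counts. This is a different, classical technique, not part of hapipf; the function names, the (a, b, c, d) table layout, and the counts in the usage lines are assumptions for illustration.

```python
def stratum_odds_ratios(tables):
    """Odds ratio in each stratum.

    tables: one (a, b, c, d) tuple per stratum of S, where
      a = cases with haplotype h,      b = controls with haplotype h,
      c = cases with the reference,    d = controls with the reference.
    """
    return [(a * d) / (b * c) for a, b, c, d in tables]


def mantel_haenszel_or(tables):
    """Mantel-Haenszel estimate of a common odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts for two strata, for illustration only.
tables = [(10, 5, 5, 10), (20, 10, 10, 20)]
print(stratum_odds_ratios(tables))            # [4.0, 4.0]
print(round(mantel_haenszel_or(tables), 6))   # 4.0
```

Stratum-specific odds ratios that differ markedly suggest effect modification, in which case a single common odds ratio is not an adequate summary.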


