Stata Technical Bulletin

sbe38 Haplotype frequency estimation using an EM algorithm and log-linear modeling

Adrian Mander, MRC Biostatistics Unit, Cambridge, UK, [email protected]

Abstract: This function estimates allele/haplotype frequencies under a log-linear model when phase is unknown. Different
log-linear models are compared using a likelihood-ratio test allowing tests for linkage disequilibrium and disease association.
These tests can be adjusted for possible confounders in a stratified analysis.

Keywords: Haplotypes, alleles, association studies, stratified analysis, phase unknown, log-linear modeling.

Syntax

hapipf varlist [using exp] [if exp] [, ldim(varlist) display ipf(str) start known
        phase(varname) acc(#) ipfacc(#) nolog model(#) lrtest(#,#)
        convars(str) confile(str)]

Description

This function calculates allele/haplotype frequencies using log-linear modeling embedded within an EM algorithm. The
EM algorithm handles the phase uncertainty and the log-linear modeling allows testing for linkage disequilibrium and disease
association. These tests can be controlled for confounders using a stratified analysis specified by the log-linear model. The
log-linear model can also model the relationship between loci and hence can group similar haplotypes.
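
The way an EM algorithm resolves phase uncertainty can be illustrated with a minimal Python sketch. It is not the hapipf implementation: the M-step below simply counts weighted haplotypes (a saturated model), whereas hapipf fits a log-linear model at that step. The function names, the genotype encoding, and the example data are assumptions made for illustration only.

```python
import math
from itertools import product

def compatible_pairs(geno):
    """Unordered haplotype pairs consistent with an unphased genotype.

    geno is a tuple of per-locus allele pairs, e.g. ((1, 2), (1, 2)).
    """
    pairs = set()
    # Choose which allele of each locus sits on the first chromosome.
    for choice in product((0, 1), repeat=len(geno)):
        h1 = tuple(g[c] for g, c in zip(geno, choice))
        h2 = tuple(g[1 - c] for g, c in zip(geno, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

def em_haplotypes(genotypes, tol=1e-8, max_iter=1000):
    """Estimate haplotype frequencies assuming Hardy-Weinberg equilibrium."""
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for exp in expansions for pair in exp for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}
    n_chrom = 2 * len(genotypes)
    old_ll = None
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        ll = 0.0
        for exp in expansions:
            # E-step: probability of each phase resolution under HWE.
            probs = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                     for h1, h2 in exp]
            total = sum(probs)
            ll += math.log(total)
            for (h1, h2), p in zip(exp, probs):
                w = p / total  # posterior weight of this resolution
                counts[h1] += w
                counts[h2] += w
        # M-step (saturated model): weighted haplotype proportions.
        freq = {h: c / n_chrom for h, c in counts.items()}
        if old_ll is not None and abs(ll - old_ll) < tol:
            break
        old_ll = ll
    return freq, ll

# Hypothetical data: two biallelic loci, two double heterozygotes.
genotypes = ([((1, 1), (1, 1))] * 4 + [((2, 2), (2, 2))] * 4
             + [((1, 2), (1, 2))] * 2)
freq, loglik = em_haplotypes(genotypes)
```

In this sketch only the double heterozygotes are ambiguous; the posterior weights decide how much each of their two possible phase resolutions contributes to the haplotype counts.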

The log-linear model is fitted using iterative proportional fitting which is implemented in the ipf command introduced in
Mander (2000). Note that before hapipf can execute, the ipf command must be installed. This algorithm can handle very large
contingency tables and converges to maximum likelihood estimates even when the likelihood is badly behaved.
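
Iterative proportional fitting itself can be sketched in a few lines of Python. This is a generic illustration, not the code of the ipf command: the function name `ipf_fit`, the dictionary layout of the table, and the counts are all hypothetical. The example fits a three-way table in which the first two dimensions interact and the third is independent of them.

```python
def ipf_fit(observed, margins, tol=1e-10, max_iter=500):
    """Fit a hierarchical log-linear model by iterative proportional fitting.

    observed: dict mapping each cell (a tuple of levels) to its count;
              every cell of the table must be present, zeros included.
    margins:  list of tuples of axis indices whose margins are matched.
    """
    fitted = dict.fromkeys(observed, 1.0)
    for _ in range(max_iter):
        max_change = 0.0
        for axes in margins:
            obs_m, fit_m = {}, {}
            # Collapse observed and fitted tables onto this margin.
            for cell, n in observed.items():
                key = tuple(cell[a] for a in axes)
                obs_m[key] = obs_m.get(key, 0.0) + n
                fit_m[key] = fit_m.get(key, 0.0) + fitted[cell]
            # Rescale fitted cells so the margin matches the data.
            for cell in fitted:
                key = tuple(cell[a] for a in axes)
                if fit_m[key] > 0:
                    new = fitted[cell] * obs_m[key] / fit_m[key]
                else:
                    new = 0.0
                max_change = max(max_change, abs(new - fitted[cell]))
                fitted[cell] = new
        if max_change < tol:
            break
    return fitted

# Hypothetical 2 x 2 x 2 table of counts (axes 0, 1, 2).
observed = {
    (0, 0, 0): 10, (0, 0, 1): 5, (0, 1, 0): 3, (0, 1, 1): 2,
    (1, 0, 0): 4, (1, 0, 1): 6, (1, 1, 0): 8, (1, 1, 1): 12,
}
# Match the joint margin of axes 0 and 1, and the margin of axis 2.
fitted = ipf_fit(observed, margins=[(0, 1), (2,)])
```

For this particular model the fitted counts have a closed form (joint margin times single margin, divided by the total), which makes the sketch easy to check by hand; the appeal of the iterative scheme is that the same loop also handles models with no closed-form solution.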

The varlist consists of paired variables representing the alleles at each locus. If phase is known, then the pairs are the
genotypes. When phase is unknown the algorithm assumes Hardy-Weinberg equilibrium, so models are based on chromosomal
data and not genotypic data.

Options

ldim(varlist) specifies the variables that determine the dimension of the contingency table. By default the variables contained
in the ipf option define the dimension.

display specifies whether the expected and imputed haplotype frequencies are shown on the screen.

ipf(str) specifies the log-linear model. It requires special syntax of the form l1*l2+l3. This model makes the third locus
independent of the first two and includes the interaction between the first and second locus.

start specifies that the starting posterior weights of the EM algorithm are chosen at random.

known specifies that phase is known.

phase(varname) specifies a variable that contains 1's where phase is known and 0's where phase is unknown.

acc(#) specifies the convergence criterion based on the log likelihood.

ipfacc(#) specifies the convergence criterion for the iterative proportional fitting algorithm.

nolog suppresses the display of the log likelihood at each iteration.

model(#) specifies a label for the log-linear model being fitted. This label is used in the lrtest option.

lrtest(#,#) performs a likelihood-ratio test using two models that have been labeled by the model option.

convars(str) specifies a list of variables in the constraints file.

confile(str) specifies the name of the constraints file.
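
The arithmetic behind a likelihood-ratio comparison of two nested models can be sketched in Python as follows. This is a generic illustration rather than the command's own code; the function names and the example log likelihoods are hypothetical, and the degrees of freedom (the difference in the number of free parameters between the two models) are assumed to be supplied by the user.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 and df = 1, then step up two df at a time.
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    while nu < df:
        q += half ** (nu / 2.0) * math.exp(-half) / math.gamma(nu / 2.0 + 1)
        nu += 2
    return q

def lr_test(ll_null, ll_alt, df):
    """Likelihood-ratio statistic and p-value for nested models."""
    stat = 2.0 * (ll_alt - ll_null)
    return stat, chi2_sf(stat, df)

# Hypothetical log likelihoods: a model without and with one extra term.
stat, p_value = lr_test(ll_null=-105.0, ll_alt=-103.0, df=1)
```

A small p-value suggests that the extra terms in the larger model (for example, an interaction between two loci, or between a haplotype and disease status) improve the fit by more than chance would explain.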

Examples

The data, taken from Sham (1998), consist of two loci (a and b), case-control status (D), and one stratifying variable
(S). The first few lines of this dataset are shown below.


