TY - JOUR
T1 - Relative efficiency of ambiguous vs. directly measured haplotype frequencies
AU - Schaid, Daniel J.
N1 - Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2002/11
Y1 - 2002/11
AB - Haplotypes are useful both for fine-mapping susceptibility loci and for evaluating sequence variation at multiple sites along a chromosome. However, they are difficult to measure directly over long stretches of DNA in diploid organisms. Consequently, multiple genetic markers are typically measured without linkage phase information, giving rise to a subject's diplotype. From diplotype data, haplotypes are often inferred from pedigree information, or treated as partially missing data when haplotype frequencies are estimated among unrelated subjects. This latter ambiguity can increase the variance of the estimated haplotype frequencies. Douglas et al. ([2001] Nat. Genet. 28:361-364) recently quantified the relative efficiency of estimating haplotype frequencies from the diplotypes of unrelated subjects, compared with haplotypes measured directly via somatic cell hybrids (conversion technology), and demonstrated that unknown linkage phase can lead to a large loss of efficiency. However, their results assumed linkage equilibrium among marker loci, which may not be realistic for closely linked markers. We extend their relative efficiency calculations in several respects: 1) allowance for linkage disequilibrium (LD) among marker loci; 2) evaluation of different patterns of LD; and 3) evaluation of nuclear families with and without parents. We show that although the loss in efficiency for haplotype frequencies estimated among unrelated subjects decreases as LD increases to its maximum value, the general conclusions of Douglas et al. ([2001] Nat. Genet. 28:361-364) hold true for a variety of LD patterns and magnitudes. However, our results also demonstrate that trios (two parents plus one child) are highly efficient for haplotype frequency estimation, that additional children offer little information, and that siblings without parents can be grossly inefficient.
KW - EM algorithm
KW - Fisher's information
KW - Linkage disequilibrium
KW - Nuclear family
UR - http://www.scopus.com/inward/record.url?scp=0036858594&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036858594&partnerID=8YFLogxK
U2 - 10.1002/gepi.10184
DO - 10.1002/gepi.10184
M3 - Article
C2 - 12432508
AN - SCOPUS:0036858594
SN - 0741-0395
VL - 23
SP - 426
EP - 443
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 4
ER -
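The relative-efficiency idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method. It rests on several assumptions: two biallelic loci, Hardy-Weinberg equilibrium, unrelated subjects only (no nuclear families), and relative efficiency defined per haplotype as the asymptotic variance of the frequency estimate when haplotypes are observed directly, divided by the variance when only unphased genotypes are observed. Both variances come from inverting a Fisher information matrix. The haplotype coding and all function names (`fisher_info`, `relative_efficiency`, etc.) are hypothetical. In this two-locus case the only ambiguous genotype is the double heterozygote, which pools diplotypes {11, 00} and {10, 01}.

```python
from itertools import combinations_with_replacement

# Hypothetical coding: each haplotype is (allele-1 count at locus A, at locus B).
# Index 0 = "11", 1 = "10", 2 = "01", 3 = "00".
HAPS = [(1, 1), (1, 0), (0, 1), (0, 0)]
K = len(HAPS)  # p_3 = 1 - p_0 - p_1 - p_2, so there are K - 1 free parameters


def grad_of_freq(k):
    """Gradient of p_k with respect to the free parameters (p_0, p_1, p_2)."""
    if k < K - 1:
        return [1.0 if m == k else 0.0 for m in range(K - 1)]
    return [-1.0] * (K - 1)


def diplotype_prob_and_grad(p, i, j):
    """Hardy-Weinberg probability of the unordered diplotype {i, j} and its gradient."""
    gi, gj = grad_of_freq(i), grad_of_freq(j)
    if i == j:
        return p[i] ** 2, [2 * p[i] * a for a in gi]
    return 2 * p[i] * p[j], [2 * (p[j] * a + p[i] * b) for a, b in zip(gi, gj)]


def fisher_info(p, category):
    """Per-subject Fisher information when only category(i, j) is observed.

    Diplotypes mapped to the same category are pooled: P_g is the sum of their
    probabilities and I = sum_g grad(P_g) grad(P_g)^T / P_g. Assumes all p_k > 0.
    """
    n = K - 1
    prob, grad = {}, {}
    for i, j in combinations_with_replacement(range(K), 2):
        q, dq = diplotype_prob_and_grad(p, i, j)
        g = category(i, j)
        prob[g] = prob.get(g, 0.0) + q
        grad[g] = [a + b for a, b in zip(grad.get(g, [0.0] * n), dq)]
    info = [[0.0] * n for _ in range(n)]
    for g, pg in prob.items():
        for r in range(n):
            for c in range(n):
                info[r][c] += grad[g][r] * grad[g][c] / pg
    return info


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (adequate for a 3 x 3 matrix)."""
    n = len(m)
    a = [row[:] + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        pivot = a[col][col]
        a[col] = [x / pivot for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def genotype(i, j):
    """Unphased two-locus genotype: allele-1 counts per locus, phase discarded."""
    return (HAPS[i][0] + HAPS[j][0], HAPS[i][1] + HAPS[j][1])


def relative_efficiency(p):
    """Var(p_k | haplotypes observed) / Var(p_k | unphased genotypes), k = 0..3."""
    v_direct = invert(fisher_info(p, lambda i, j: (i, j)))  # each diplotype observed
    v_unphased = invert(fisher_info(p, genotype))

    def var(v, g):
        return sum(g[r] * v[r][c] * g[c] for r in range(len(g)) for c in range(len(g)))

    return [var(v_direct, grad_of_freq(k)) / var(v_unphased, grad_of_freq(k)) for k in range(K)]


if __name__ == "__main__":
    print(relative_efficiency([0.25, 0.25, 0.25, 0.25]))  # linkage equilibrium
    print(relative_efficiency([0.45, 0.05, 0.05, 0.45]))  # strong LD
```

Treating each unordered diplotype as its own category gives the same information as observing the two haplotypes, because under Hardy-Weinberg equilibrium the order of the pair carries no information about the frequencies. Because genotypes are a coarsening of diplotypes, every ratio returned should be at most 1.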