Power comparisons between similarity-based multilocus association methods, logistics regression, and score tests for haplotypes

Wan Yu Lin, Daniel J Schaid

Research output: Contribution to journalArticle

23 Citations (Scopus)

Abstract

Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.

Original languageEnglish (US)
Pages (from-to)183-197
Number of pages15
JournalGenetic Epidemiology
Volume33
Issue number3
DOIs
StatePublished - 2009

Fingerprint

Haplotypes
Logistic Models
Single Nucleotide Polymorphism
Viverridae
Gene Frequency
Genotype
Phenotype

Keywords

  • Canonical correlation
  • Genomic distance
  • Genotyping errors
  • Linkage disequilibrium

ASJC Scopus subject areas

  • Genetics(clinical)
  • Epidemiology

Cite this

@article{d3bedeed3c1145a0bd0b0658e95c8e19,
title = "Power comparisons between similarity-based multilocus association methods, logistics regression, and score tests for haplotypes",
abstract = "Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.",
keywords = "Canonical correlation, Genomic distance, Genotyping errors, Linkage disequilibrium",
author = "Lin, {Wan Yu} and Schaid, {Daniel J}",
year = "2009",
doi = "10.1002/gepi.20364",
language = "English (US)",
volume = "33",
pages = "183--197",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "3",

}

TY - JOUR

T1 - Power comparisons between similarity-based multilocus association methods, logistics regression, and score tests for haplotypes

AU - Lin, Wan Yu

AU - Schaid, Daniel J

PY - 2009

Y1 - 2009

N2 - Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.

AB - Recently, a genomic distance-based regression for multilocus associations was proposed (Wessel and Schork [2006] Am. J. Hum. Genet. 79:792-806) in which either locus or haplotype scoring can be used to measure genetic distance. Although it allows various measures of genomic similarity and simultaneous analyses of multiple phenotypes, its power relative to other methods for case-control analyses is not well known. We compare the power of traditional methods with this new distance-based approach, for both locus-scoring and haplotype-scoring strategies. We discuss the relative power of these association methods with respect to five properties: (1) the marker informativity; (2) the number of markers; (3) the causal allele frequency; (4) the preponderance of the most common high-risk haplotype; (5) the correlation between the causal single-nucleotide polymorphism (SNP) and its flanking markers. We found that locus-based logistic regression and the global score test for haplotypes suffered from power loss when many markers were included in the analyses, due to many degrees of freedom. In contrast, the distance-based approach was not as vulnerable to more markers or more haplotypes. A genotype counting measure was more sensitive to the marker informativity and the correlation between the causal SNP and its flanking markers. After examining the impact of the five properties on power, we found that on average, the genomic distance-based regression that uses a matching measure for diplotypes was the most powerful and robust method among the seven methods we compared.

KW - Canonical correlation

KW - Genomic distance

KW - Genotyping errors

KW - Linkage disequilibrium

UR - http://www.scopus.com/inward/record.url?scp=66249144684&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=66249144684&partnerID=8YFLogxK

U2 - 10.1002/gepi.20364

DO - 10.1002/gepi.20364

M3 - Article

VL - 33

SP - 183

EP - 197

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 3

ER -