Detecting genomic clustering of risk variants from sequence data: Cases versus controls

Daniel J Schaid, Jason P. Sinnwell, Shannon K. McDonnell, Stephen N Thibodeau

Research output: Contribution to journal (Article)

11 Citations (Scopus)

Abstract

As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in and around a gene would benefit genetic association analyses and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios.
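To make the idea of a Tango-style clustering statistic concrete, the sketch below computes a quadratic form over the difference between case and control variant proportions, weighted by an exponential closeness kernel on genomic position. This is an illustration under stated assumptions, not the paper's exact definition of the Kernel Distance statistic: the function names, the exponential kernel, the scale parameter `tau`, and the use of controls as the reference distribution are all hypothetical choices. The p value here comes from permuting case/control labels, which is swapped in for the scaled χ² approximation mentioned in the abstract.

```python
import numpy as np


def kernel_distance_stat(positions, case_counts, control_counts, tau):
    """Tango-style quadratic form d' A d (illustrative, not the paper's formula).

    positions      : base-pair positions of the variants
    case_counts    : variant allele counts among cases, per position
    control_counts : variant allele counts among controls, per position
    tau            : assumed kernel scale, in base pairs
    """
    pos = np.asarray(positions, dtype=float)
    r = np.asarray(case_counts, dtype=float)
    s = np.asarray(control_counts, dtype=float)
    if r.sum() == 0 or s.sum() == 0:
        return 0.0  # no variants in one group: nothing to compare
    # Difference in the proportion of variant alleles at each position.
    d = r / r.sum() - s / s.sum()
    # Closeness kernel: 1 at distance zero, decaying with distance.
    A = np.exp(-np.abs(pos[:, None] - pos[None, :]) / tau)
    return float(d @ A @ d)


def permutation_pvalue(G, y, positions, tau, n_perm=999, seed=0):
    """Permutation p value for the statistic above.

    G : (subjects x variants) matrix of variant allele counts
    y : 1 for cases, 0 for controls
    """
    rng = np.random.default_rng(seed)
    G = np.asarray(G)
    y = np.asarray(y)

    def stat(labels):
        return kernel_distance_stat(
            positions,
            G[labels == 1].sum(axis=0),
            G[labels == 0].sum(axis=0),
            tau,
        )

    observed = stat(y)
    exceed = sum(stat(rng.permutation(y)) >= observed for _ in range(n_perm))
    # Add-one correction keeps the p value strictly positive.
    return observed, (1 + exceed) / (n_perm + 1)
```

When cases and controls carry variants in the same proportions at every position, the difference vector is zero and so is the statistic. Differences concentrated at nearby positions reinforce each other through the kernel, which is what makes a statistic of this shape sensitive to clustering.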

Original language: English (US)
Pages (from-to): 1301-1309
Number of pages: 9
Journal: Human Genetics
Volume: 132
Issue number: 11
DOI: 10.1007/s00439-013-1335-y
State: Published - Nov 2013

Fingerprint

Cluster Analysis
Genetic Markers
Genes

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics

Cite this

Detecting genomic clustering of risk variants from sequence data : Cases versus controls. / Schaid, Daniel J; Sinnwell, Jason P.; McDonnell, Shannon K.; Thibodeau, Stephen N.

In: Human Genetics, Vol. 132, No. 11, 11.2013, p. 1301-1309.

Schaid, Daniel J ; Sinnwell, Jason P. ; McDonnell, Shannon K. ; Thibodeau, Stephen N. / Detecting genomic clustering of risk variants from sequence data : Cases versus controls. In: Human Genetics. 2013 ; Vol. 132, No. 11. pp. 1301-1309.
@article{823bbb0c4321411b95246ce8f41a9a7f,
title = "Detecting genomic clustering of risk variants from sequence data: Cases versus controls",
author = "Schaid, {Daniel J} and Sinnwell, {Jason P.} and McDonnell, {Shannon K.} and Thibodeau, {Stephen N}",
year = "2013",
month = "11",
doi = "10.1007/s00439-013-1335-y",
language = "English (US)",
volume = "132",
pages = "1301--1309",
journal = "Human Genetics",
issn = "0340-6717",
publisher = "Springer Verlag",
number = "11",

}

TY - JOUR

T1 - Detecting genomic clustering of risk variants from sequence data

T2 - Cases versus controls

AU - Schaid, Daniel J

AU - Sinnwell, Jason P.

AU - McDonnell, Shannon K.

AU - Thibodeau, Stephen N

PY - 2013/11

Y1 - 2013/11

UR - http://www.scopus.com/inward/record.url?scp=84888315443&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84888315443&partnerID=8YFLogxK

U2 - 10.1007/s00439-013-1335-y

DO - 10.1007/s00439-013-1335-y

M3 - Article

C2 - 23842950

AN - SCOPUS:84888315443

VL - 132

SP - 1301

EP - 1309

JO - Human Genetics

JF - Human Genetics

SN - 0340-6717

IS - 11

ER -