A weighted U-Statistic for genetic association analyses of sequencing data

Changshuai Wei; Ming Li; Zihuai He; Olga Vsevolozhskaya; Daniel J. Schaid; Qing Lu

doi:10.1002/gepi.21864

Quantitative Health Sciences

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.

Original language	English (US)
Pages (from-to)	699-708
Number of pages	10
Journal	Genetic Epidemiology
Volume	38
Issue number	8
DOIs	https://doi.org/10.1002/gepi.21864
State	Published - Dec 1 2014

Keywords

Next-generation sequencing
Rare variants
Weighted U-statistic

ASJC Scopus subject areas

Epidemiology
Genetics (clinical)

Cite this

@article{a232ac400a1e485b8b3d4ede681e7119,

title = "A weighted U-Statistic for genetic association analyses of sequencing data",

abstract = "With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.",

keywords = "Next-generation sequencing, Rare variants, Weighted U-statistic",

author = "Changshuai Wei and Ming Li and Zihuai He and Olga Vsevolozhskaya and Schaid, {Daniel J.} and Qing Lu",

note = "Publisher Copyright: {\textcopyright} 2014 WILEY PERIODICALS, INC.",

year = "2014",

month = dec,

day = "1",

doi = "10.1002/gepi.21864",

language = "English (US)",

volume = "38",

pages = "699--708",

journal = "Genetic Epidemiology",

issn = "0741-0395",

publisher = "Wiley-Liss Inc.",

number = "8",

}

TY - JOUR

T1 - A weighted U-Statistic for genetic association analyses of sequencing data

AU - Wei, Changshuai

AU - Li, Ming

AU - He, Zihuai

AU - Vsevolozhskaya, Olga

AU - Schaid, Daniel J.

AU - Lu, Qing

PY - 2014/12/1

Y1 - 2014/12/1

N2 - With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.

AB - With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.

KW - Next-generation sequencing

KW - Rare variants

KW - Weighted U-statistic

UR - http://www.scopus.com/inward/record.url?scp=84910604413&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84910604413&partnerID=8YFLogxK

U2 - 10.1002/gepi.21864

DO - 10.1002/gepi.21864

M3 - Article

C2 - 25331574

AN - SCOPUS:84910604413

SN - 0741-0395

VL - 38

SP - 699

EP - 708

JO - Genetic Epidemiology

JF - Genetic Epidemiology

IS - 8

ER -
