TY - JOUR
T1 - A weighted U-Statistic for genetic association analyses of sequencing data
AU - Wei, Changshuai
AU - Li, Ming
AU - He, Zihuai
AU - Vsevolozhskaya, Olga
AU - Schaid, Daniel J.
AU - Lu, Qing
N1 - Publisher Copyright: © 2014 WILEY PERIODICALS, INC.
PY - 2014/12/1
Y1 - 2014/12/1
AB - With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumptions about the underlying disease model or phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very-low-density lipoprotein cholesterol.
KW - Next-generation sequencing
KW - Rare variants
KW - Weighted U-statistic
UR - http://www.scopus.com/inward/record.url?scp=84910604413&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84910604413&partnerID=8YFLogxK
U2 - 10.1002/gepi.21864
DO - 10.1002/gepi.21864
M3 - Article
C2 - 25331574
AN - SCOPUS:84910604413
SN - 0741-0395
VL - 38
SP - 699
EP - 708
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 8
ER -