gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels

Nicholas Larson, Shannon Mcdonnell, Lisa Cannon Albright, Craig Teerlink, Janet Stanford, Elaine A. Ostrander, William B. Isaacs, Jianfeng Xu, Kathleen A. Cooney, Ethan Lange, Johanna Schleutker, John D. Carpten, Isaac Powell, Joan E. Bailey-Wilson, Olivier Cussenot, Geraldine Cancel-Tassin, Graham G. Giles, Robert J. Macinnis, Christiane Maier, Alice S. Whittemore, Chih Lin Hsieh, Fredrik Wiklund, William J. Catolona, William Foulkes, Diptasri Mandal, Rosalind Eeles, Zsofia Kote-Jarai, Michael John Ackerman, Timothy Mark Olson, Christopher Jon Klein, Stephen N Thibodeau, Daniel J Schaid

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results.

Original language: English (US)
Journal: Genetic Epidemiology
DOI: https://doi.org/10.1002/gepi.22036
State: Accepted/In press - 2017

Keywords

  • Gene set
  • Next-generation sequencing
  • Pathway
  • Rare variation

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

Larson, N, Mcdonnell, S, Cannon Albright, L, Teerlink, C, Stanford, J, Ostrander, EA, Isaacs, WB, Xu, J, Cooney, KA, Lange, E, Schleutker, J, Carpten, JD, Powell, I, Bailey-Wilson, JE, Cussenot, O, Cancel-Tassin, G, Giles, GG, Macinnis, RJ, Maier, C, Whittemore, AS, Hsieh, CL, Wiklund, F, Catolona, WJ, Foulkes, W, Mandal, D, Eeles, R, Kote-Jarai, Z, Ackerman, MJ, Olson, TM, Klein, CJ, Thibodeau, SN & Schaid, DJ 2017, 'gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels', Genetic Epidemiology. https://doi.org/10.1002/gepi.22036
@article{ae0d372dfa4e40e8ad0d03b84309724d,
title = "gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels",
abstract = "Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results.",
keywords = "Gene set, Next-generation sequencing, Pathway, Rare variation",
author = "Nicholas Larson and Shannon Mcdonnell and {Cannon Albright}, Lisa and Craig Teerlink and Janet Stanford and Ostrander, {Elaine A.} and Isaacs, {William B.} and Jianfeng Xu and Cooney, {Kathleen A.} and Ethan Lange and Johanna Schleutker and Carpten, {John D.} and Isaac Powell and Bailey-Wilson, {Joan E.} and Olivier Cussenot and Geraldine Cancel-Tassin and Giles, {Graham G.} and Macinnis, {Robert J.} and Christiane Maier and Whittemore, {Alice S.} and Hsieh, {Chih Lin} and Fredrik Wiklund and Catolona, {William J.} and William Foulkes and Diptasri Mandal and Rosalind Eeles and Zsofia Kote-Jarai and Ackerman, {Michael John} and Olson, {Timothy Mark} and Klein, {Christopher Jon} and Thibodeau, {Stephen N} and Schaid, {Daniel J}",
year = "2017",
doi = "10.1002/gepi.22036",
language = "English (US)",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",

}

TY - JOUR

T1 - gsSKAT

T2 - Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels

AU - Larson, Nicholas

AU - Mcdonnell, Shannon

AU - Cannon Albright, Lisa

AU - Teerlink, Craig

AU - Stanford, Janet

AU - Ostrander, Elaine A.

AU - Isaacs, William B.

AU - Xu, Jianfeng

AU - Cooney, Kathleen A.

AU - Lange, Ethan

AU - Schleutker, Johanna

AU - Carpten, John D.

AU - Powell, Isaac

AU - Bailey-Wilson, Joan E.

AU - Cussenot, Olivier

AU - Cancel-Tassin, Geraldine

AU - Giles, Graham G.

AU - Macinnis, Robert J.

AU - Maier, Christiane

AU - Whittemore, Alice S.

AU - Hsieh, Chih Lin

AU - Wiklund, Fredrik

AU - Catolona, William J.

AU - Foulkes, William

AU - Mandal, Diptasri

AU - Eeles, Rosalind

AU - Kote-Jarai, Zsofia

AU - Ackerman, Michael John

AU - Olson, Timothy Mark

AU - Klein, Christopher Jon

AU - Thibodeau, Stephen N

AU - Schaid, Daniel J

PY - 2017

Y1 - 2017

AB - Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results.

KW - Gene set

KW - Next-generation sequencing

KW - Pathway

KW - Rare variation

UR - http://www.scopus.com/inward/record.url?scp=85013287752&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85013287752&partnerID=8YFLogxK

U2 - 10.1002/gepi.22036

DO - 10.1002/gepi.22036

M3 - Article

C2 - 28211093

AN - SCOPUS:85013287752

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

ER -