TY - JOUR
T1 - A rare variant analysis framework using public genotype summary counts to prioritize disease-predisposition genes
AU - Chen, Wenan
AU - Wang, Shuoguo
AU - Tithi, Saima Sultana
AU - Ellison, David W.
AU - Schaid, Daniel J.
AU - Wu, Gang
N1 - Funding Information:
This study was supported in part by the National Cancer Institute grants P30 CA021765 [G.W.] and 5U54NS092091-08 [G.W.], American Lebanese Syrian Associated Charities (ALSAC), and the U.S. Public Health Service and National Institutes of Health (contract grant number R35 GM140487) [D.J.S.]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. We acknowledge the permission granted to use data from the Alzheimer’s Disease Sequencing Project (dbGaP Study Accession phs000572.v8.p4) and The Cancer Genome Atlas (dbGaP Study Accession phs000178.v11.p8). We acknowledge Michael Edmonson’s help with the conversion between plain text-based matrix data format and the VCF format, Jason P. Sinnwell’s help with LD packages, and Angela J. McArthur’s helpful scientific editing.
Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not controlled. We propose a framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV implements consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and detection of rare variant pairs in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
AB - Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not controlled. We propose a framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV implements consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and detection of rare variant pairs in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
UR - http://www.scopus.com/inward/record.url?scp=85129968021&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129968021&partnerID=8YFLogxK
U2 - 10.1038/s41467-022-30248-0
DO - 10.1038/s41467-022-30248-0
M3 - Article
C2 - 35545612
AN - SCOPUS:85129968021
SN - 2041-1723
VL - 13
JO - Nature Communications
JF - Nature Communications
IS - 1
M1 - 2592
ER -