TY - JOUR
T1 - Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data
AU - MGS (Molecular Genetics of Schizophrenia) GWAS Consortium
AU - GECCO (The Genetics and Epidemiology of Colorectal Cancer Consortium)
AU - The GAME-ON/TRICL (Transdisciplinary Research in Cancer of the Lung) GWAS Consortium
AU - PRACTICAL (PRostate cancer AssoCiation group To Investigate Cancer Associated aLterations) Consortium
AU - PanScan Consortium
AU - The GAME-ON/ELLIPSE Consortium
AU - Shi, Jianxin
AU - Park, Ju Hyun
AU - Duan, Jubao
AU - Berndt, Sonja T.
AU - Moy, Winton
AU - Yu, Kai
AU - Song, Lei
AU - Wheeler, William
AU - Hua, Xing
AU - Silverman, Debra
AU - Garcia-Closas, Montserrat
AU - Hsiung, Chao Agnes
AU - Figueroa, Jonine D.
AU - Cortessis, Victoria K.
AU - Malats, Núria
AU - Karagas, Margaret R.
AU - Vineis, Paolo
AU - Chang, I. Shou
AU - Lin, Dongxin
AU - Zhou, Baosen
AU - Seow, Adeline
AU - Matsuo, Keitaro
AU - Hong, Yun Chul
AU - Caporaso, Neil E.
AU - Wolpin, Brian
AU - Jacobs, Eric
AU - Petersen, Gloria M.
AU - Klein, Alison P.
AU - Li, Donghui
AU - Risch, Harvey
AU - Sanders, Alan R.
AU - Hsu, Li
AU - Schoen, Robert E.
AU - Brenner, Hermann
AU - Stolzenberg-Solomon, Rachael
AU - Gejman, Pablo
AU - Lan, Qing
AU - Rothman, Nathaniel
AU - Amundadottir, Laufey T.
AU - Landi, Maria Teresa
AU - Levinson, Douglas F.
AU - Chanock, Stephen J.
AU - Chatterjee, Nilanjan
N1 - Funding Information:
The research was supported by the NIH Intramural Research program. The TRICL Consortium was supported by NIH grant U19 CA148127. Funding for GECCO consortium: National Cancer Institute, National Institutes of Health, U.S. Department of Health and Human Services (U01 CA137088; R01 CA059045). ASTERISK: a Hospital Clinical Research Program (PHRC) and supported by the Regional Council of Pays de la Loire, the Groupement des Entreprises Françaises dans la Lutte contre le Cancer (GEFLUC), the Association Anne de Bretagne Génétique and the Ligue Régionale Contre le Cancer (LRCC). COLO2&3: National Institutes of Health (R01 CA60987). DACHS: German Research Council (Deutsche Forschungsgemeinschaft, BR 1704/6-1, BR 1704/6-3, BR 1704/6-4 and CH 117/1-1), and the German Federal Ministry of Education and Research (01KH0404 and 01ER0814). DALS: National Institutes of Health (R01 CA48998 to M. L. Slattery). HPFS is supported by the National Institutes of Health (P01 CA 055075, UM1 CA167552, R01 137178, R01 CA151993 and P50 CA127003), NHS by the National Institutes of Health (UM1 CA186107, R01 CA137178, P01 CA87969, R01 CA151993 and P50 CA127003) and PHS by the National Institutes of Health (R01 CA042182). MEC: National Institutes of Health (R37 CA54281, P01 CA033619, and R01 CA63464). OFCCR: National Institutes of Health, through funding allocated to the Ontario Registry for Studies of Familial Colorectal Cancer (U01 CA074783); see CCFR section above. Additional funding toward genetic analyses of OFCCR includes the Ontario Research Fund, the Canadian Institutes of Health Research, and the Ontario Institute for Cancer Research, through generous support from the Ontario Ministry of Research and Innovation. PMH: National Institutes of Health (R01 CA076366 to P.A. Newcomb). VITAL: National Institutes of Health (K05 CA154337). WHI: The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201100046C, HHSN268201100001C, HHSN268201100002C, HHSN268201100003C, HHSN268201100004C, and HHSN271201100004C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2016/12
Y1 - 2016/12
AB - Recent heritability analyses have indicated that genome-wide association studies (GWAS) have the potential to improve genetic risk prediction for complex diseases based on the polygenic risk score (PRS), a simple modelling technique that can be implemented using summary-level data from the discovery samples. We herein propose modifications to improve the performance of PRS. We introduce threshold-dependent winner’s-curse adjustments for the marginal association coefficients that are used to weight the single-nucleotide polymorphisms (SNPs) in PRS. Further, as a way to incorporate external functional/annotation knowledge that could identify subsets of SNPs highly enriched for associations, we propose variable thresholds for SNP selection. We applied our methods to GWAS summary-level data of 14 complex diseases. Across all diseases, a simple winner’s-curse correction uniformly enhanced the performance of the models, whereas incorporation of functional SNPs was beneficial only for selected diseases. Compared to the standard PRS algorithm, the proposed methods in combination led to a notable gain in efficiency (25–50% increase in the prediction R²) for 5 of 14 diseases. As an example, for GWAS of type 2 diabetes, winner’s-curse correction improved prediction R² from 2.29% based on the standard PRS to 3.10% (P = 0.0017), and incorporating functional annotation data further improved R² to 3.53% (P = 2×10⁻⁵). Our simulation studies illustrate why differential treatment of certain categories of functional SNPs, even when shown to be highly enriched for GWAS heritability, does not lead to proportionate improvement in genetic risk prediction because of non-uniform linkage disequilibrium structure.
UR - http://www.scopus.com/inward/record.url?scp=85007574079&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85007574079&partnerID=8YFLogxK
U2 - 10.1371/journal.pgen.1006493
DO - 10.1371/journal.pgen.1006493
M3 - Article
C2 - 28036406
AN - SCOPUS:85007574079
VL - 12
JO - PLoS Genetics
JF - PLoS Genetics
SN - 1553-7390
IS - 12
M1 - e1006493
ER -