TY - JOUR
T1 - Leveraging biological and statistical covariates improves the detection power in epigenome-wide association testing
AU - Huang, Jinyan
AU - Bai, Ling
AU - Cui, Bowen
AU - Wu, Liang
AU - Wang, Liwen
AU - An, Zhiyin
AU - Ruan, Shulin
AU - Yu, Yue
AU - Zhang, Xianyang
AU - Chen, Jun
N1 - Publisher Copyright: © 2020 The Author(s).
PY - 2020/4/6
Y1 - 2020/4/6
N2 - Background: Epigenome-wide association studies (EWAS), which seek the association between epigenetic marks and an outcome or exposure, involve multiple hypothesis testing. False discovery rate (FDR) control has been widely used for multiple testing correction. However, traditional FDR control methods do not use auxiliary covariates, and they could be less powerful if the covariates could inform the likelihood of the null hypothesis. Recently, many covariate-adaptive FDR control methods have been developed, but application of these methods to EWAS data has not yet been explored. It is not clear whether these methods can significantly improve detection power, and if so, which covariates are more relevant for EWAS data. Results: In this study, we evaluate the performance of five covariate-adaptive FDR control methods with EWAS-related covariates using simulated as well as real EWAS datasets. We develop an omnibus test to assess the informativeness of the covariates. We find that statistical covariates are generally more informative than biological covariates, and the covariates of methylation mean and variance are almost universally informative. In contrast, the informativeness of biological covariates depends on specific datasets. We show that the independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) methods are overall more powerful, especially for sparse signals, and could improve the detection power by a median of 25% and 68%, respectively, on real datasets compared to the ST procedure. We further validate the findings in various biological contexts. Conclusions: Covariate-adaptive FDR control methods with informative covariates can significantly increase the detection power for EWAS. For sparse signals, IHW and CAMT are recommended.
AB - Background: Epigenome-wide association studies (EWAS), which seek the association between epigenetic marks and an outcome or exposure, involve multiple hypothesis testing. False discovery rate (FDR) control has been widely used for multiple testing correction. However, traditional FDR control methods do not use auxiliary covariates, and they could be less powerful if the covariates could inform the likelihood of the null hypothesis. Recently, many covariate-adaptive FDR control methods have been developed, but application of these methods to EWAS data has not yet been explored. It is not clear whether these methods can significantly improve detection power, and if so, which covariates are more relevant for EWAS data. Results: In this study, we evaluate the performance of five covariate-adaptive FDR control methods with EWAS-related covariates using simulated as well as real EWAS datasets. We develop an omnibus test to assess the informativeness of the covariates. We find that statistical covariates are generally more informative than biological covariates, and the covariates of methylation mean and variance are almost universally informative. In contrast, the informativeness of biological covariates depends on specific datasets. We show that the independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) methods are overall more powerful, especially for sparse signals, and could improve the detection power by a median of 25% and 68%, respectively, on real datasets compared to the ST procedure. We further validate the findings in various biological contexts. Conclusions: Covariate-adaptive FDR control methods with informative covariates can significantly increase the detection power for EWAS. For sparse signals, IHW and CAMT are recommended.
KW - Covariate
KW - EWAS
KW - False discovery rate
KW - Multiple hypothesis testing
UR - http://www.scopus.com/inward/record.url?scp=85083071439&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85083071439&partnerID=8YFLogxK
U2 - 10.1186/s13059-020-02001-7
DO - 10.1186/s13059-020-02001-7
M3 - Article
C2 - 32252795
AN - SCOPUS:85083071439
SN - 1474-7596
VL - 21
JO - Genome Biology
JF - Genome Biology
IS - 1
M1 - 88
ER -