CpGFilter

Model-based CpG probe filtering with replicates for epigenome-wide association studies

Jun Chen, Allan C. Just, Joel Schwartz, Lifang Hou, Nadereh Jafari, Zhifu D Sun, Jean-Pierre Kocher, Andrea Baccarelli, Xihong Lin

Research output: Contribution to journal, Article

Abstract

Summary: The development of the Infinium HumanMethylation450 BeadChip enables epigenome-wide association studies at a reduced cost. One observation of the 450K data is that many CpG sites the beadchip interrogates have very large measurement errors. Including these noisy CpGs will decrease the statistical power of detecting relevant associations due to multiple testing correction. We propose to use intra-class correlation coefficient (ICC), which characterizes the relative contribution of the biological variability to the total variability, to filter CpGs when technical replicates are available. We estimate the ICC based on a linear mixed effects model by pooling all the samples instead of using the technical replicates only. An ultra-fast algorithm has been developed to address the computational complexity and CpG filtering can be completed in minutes on a desktop computer for a 450K data set of over 1000 samples. Our method is very flexible and can accommodate any replicate design. Simulations and a real data application demonstrate that our whole-sample ICC method performs better than replicate-sample ICC or variance-based method.
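
As a rough illustration of the filtering idea in the abstract, the ICC for one CpG can be written as sigma_b^2 / (sigma_b^2 + sigma_e^2), where sigma_b^2 is the between-sample (biological) variance and sigma_e^2 is the replicate-to-replicate (technical) variance. The sketch below is not the CpGFilter algorithm: the abstract describes a linear mixed effects fit with an ultra-fast solver, whereas this example swaps in the simpler one-way ANOVA (method-of-moments) estimator for unbalanced designs. It does pool all samples, so samples measured once still contribute to the between-sample term. The function names `anova_icc` and `filter_cpgs` and the cutoff `min_icc=0.5` are assumptions made for illustration only.

```python
import numpy as np


def anova_icc(values, sample_ids):
    """One-way ANOVA estimate of the ICC for a single CpG.

    Assumption: this is a method-of-moments stand-in, not the mixed-model
    estimator described in the abstract. Replicates of the same biological
    sample share a sample ID; samples measured once are also used.
    """
    values = np.asarray(values, dtype=float)
    ids = np.asarray(sample_ids).ravel()
    _, group = np.unique(ids, return_inverse=True)
    n_i = np.bincount(group)              # measurements per sample
    k, n_total = n_i.size, values.size
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 samples and at least one replicated sample")

    group_mean = np.bincount(group, weights=values) / n_i
    grand_mean = values.mean()

    # Between-sample and within-sample mean squares.
    msb = np.sum(n_i * (group_mean - grand_mean) ** 2) / (k - 1)
    msw = np.sum((values - group_mean[group]) ** 2) / (n_total - k)

    # Effective group size for an unbalanced design.
    n0 = (n_total - np.sum(n_i ** 2) / n_total) / (k - 1)

    var_between = max((msb - msw) / n0, 0.0)  # truncate negative estimates
    total = var_between + msw
    return var_between / total if total > 0 else 0.0


def filter_cpgs(beta, sample_ids, min_icc=0.5):
    """Return a keep-mask and per-CpG ICCs for a (CpGs x measurements) matrix.

    The 0.5 cutoff is a hypothetical default, not a value from the paper.
    """
    beta = np.asarray(beta, dtype=float)
    icc = np.array([anova_icc(row, sample_ids) for row in beta])
    return icc >= min_icc, icc
```

A call such as `keep, icc = filter_cpgs(beta, sample_ids)` returns a Boolean mask that can be used to drop low-ICC probes before association testing, which reduces the number of tests entering the multiple testing correction.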

Original language: English (US)
Pages (from-to): 469-471
Number of pages: 3
Journal: Bioinformatics
Volume: 32
Issue number: 3
DOI: 10.1093/bioinformatics/btv577
State: Published - Apr 23 2015

Fingerprint

Intraclass Correlation Coefficient
Measurement Error
Personal computers
Computational Complexity
Probe
Filtering
Model-based
Testing
Costs
Linear Mixed Effects Model
Statistical Power
Multiple Testing
Pooling
Observation
Fast Algorithm
Costs and Cost Analysis
Filter
Decrease

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability

Cite this

CpGFilter: Model-based CpG probe filtering with replicates for epigenome-wide association studies. / Chen, Jun; Just, Allan C.; Schwartz, Joel; Hou, Lifang; Jafari, Nadereh; Sun, Zhifu D; Kocher, Jean-Pierre; Baccarelli, Andrea; Lin, Xihong.

In: Bioinformatics, Vol. 32, No. 3, 23.04.2015, p. 469-471.

@article{21086f85d70843efbe9ae95c81ba1907,
title = "CpGFilter: Model-based CpG probe filtering with replicates for epigenome-wide association studies",
author = "Jun Chen and Just, {Allan C.} and Joel Schwartz and Lifang Hou and Nadereh Jafari and Sun, {Zhifu D} and Jean-Pierre Kocher and Andrea Baccarelli and Xihong Lin",
year = "2015",
month = "4",
day = "23",
doi = "10.1093/bioinformatics/btv577",
language = "English (US)",
volume = "32",
pages = "469--471",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "3",
}

TY - JOUR

T1 - CpGFilter

T2 - Model-based CpG probe filtering with replicates for epigenome-wide association studies

AU - Chen, Jun

AU - Just, Allan C.

AU - Schwartz, Joel

AU - Hou, Lifang

AU - Jafari, Nadereh

AU - Sun, Zhifu D

AU - Kocher, Jean-Pierre

AU - Baccarelli, Andrea

AU - Lin, Xihong

PY - 2015/4/23

Y1 - 2015/4/23

UR - http://www.scopus.com/inward/record.url?scp=84962265047&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962265047&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btv577

DO - 10.1093/bioinformatics/btv577

M3 - Article

VL - 32

SP - 469

EP - 471

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 3

ER -