Statistical method evaluation for differentially methylated CpGs in base resolution next-generation DNA sequencing data

Yun Zhang, Saurabh Baheti, Zhifu Sun

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base-resolution methylome research. These data are represented either by the ratio of methylated cytosines to total coverage at a CpG site or by the numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. Ratio data offer the flexibility of fitting many linear models, whereas raw count data take coverage information into account. There is an array of options for DMC detection in each data type; however, it is not clear which statistical method is optimal. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performance with regard to type I error control, sensitivity and specificity of DMC detection, and computational resource demands, using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise and sample size estimation are also discussed.
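To make the two data representations in the abstract concrete, the following is a minimal Python sketch, not the authors' pipeline. The per-sample counts are invented for illustration, the helper names (`ratios`, `pooled`, `fisher_exact_two_sided`) are hypothetical, and Fisher's exact test on pooled counts is chosen only as a simple example of a count-based test; the abstract does not list which eight methods were evaluated.

```python
from math import comb

# Hypothetical read counts at one CpG site, one tuple per sample:
# (methylated cytosines, unmethylated cytosines). Not real data.
group_a = [(8, 2), (15, 5), (40, 10)]
group_b = [(3, 7), (10, 10), (20, 30)]


def ratios(samples):
    """Ratio representation: methylated / total coverage for each sample."""
    return [m / (m + u) for m, u in samples]


def pooled(samples):
    """Count representation summed over samples: (methylated, unmethylated)."""
    return sum(m for m, _ in samples), sum(u for _, u in samples)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    print("ratios, group A:", ratios(group_a))
    print("ratios, group B:", ratios(group_b))
    meth_a, unmeth_a = pooled(group_a)
    meth_b, unmeth_b = pooled(group_b)
    print("pooled-count Fisher p-value:",
          fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b))
```

The ratio form discards coverage: a sample with 8 of 10 reads methylated and one with 40 of 50 both reduce to 0.8. The count form keeps the read depth, but pooling counts across samples treats every read as independent and ignores sample-to-sample variation.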

Original language: English (US)
Pages (from-to): 374-386
Number of pages: 13
Journal: Briefings in Bioinformatics
Volume: 19
Issue number: 3
State: Published - May 1 2018

Keywords

  • Bisulfite next-generation sequencing
  • Differential methylation
  • Statistical method comparison

ASJC Scopus subject areas

  • Information Systems
  • Molecular Biology
