Targeted alignment and end repair elimination increase alignment and methylation measure accuracy for reduced representation bisulfite sequencing data

Saurabh Baheti, Rahul Kanwar, Meike Goelzenleuchter, Jean-Pierre Kocher, Andreas S Beutler, Zhifu D Sun

Research output: Contribution to journal › Article

Abstract

Background: DNA methylation is an important epigenetic modification involved in many biological processes. Reduced representation bisulfite sequencing (RRBS) is a cost-effective method for studying DNA methylation at single-base resolution. Although several tools are available for RRBS data processing and analysis, it is not clear which strategy performs best, and little attention has been paid to contamination from artificial cytosines incorporated during the end repair step of library preparation. To address these issues, we describe a new method, Targeted Alignment and Artificial Cytosine Elimination for RRBS (TRACE-RRBS), which aligns bisulfite sequence reads to an MspI digitally digested reference and specifically removes the end repair cytosines. We compared this approach on a simulated and a real dataset with seven other RRBS analysis tools and the Illumina 450K microarray platform.

Results: TRACE-RRBS aligns sequence reads to the small fraction of the genome that the RRBS protocol targets, and it was the fastest, most sensitive, and most specific tool on the simulated dataset. For the real dataset, TRACE-RRBS took about the same time as RRBSMAP and a third to a sixth of the time needed by BISMARK and NOVOALIGN. TRACE-RRBS aligned more reads uniquely than the other tools and achieved the highest correlation with the 450K microarray data. Removing the end repair artificial cytosines increased the correlation between nearby CpGs and the accuracy of methylation quantification.

Conclusions: TRACE-RRBS is a fast and more accurate tool for RRBS data analysis. It is freely available for academic use at http://bioinformaticstools.mayo.edu/.
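To illustrate what a digitally digested reference could look like, the following is a minimal Python sketch of in-silico MspI digestion. It is not the TRACE-RRBS implementation: the function name `mspi_fragments`, the 40 to 220 bp size window, and the choice to keep only fragments bounded by two cut sites are assumptions made for illustration. The only fact relied on is that MspI recognizes CCGG and cuts between the two cytosines.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognition sequence; the cut falls after the first C (C^CGG)
CUT_OFFSET = 1      # offset of the cut position within the recognition site


def mspi_fragments(seq, min_len=40, max_len=220):
    """Return half-open (start, end) coordinates of size-selected fragments.

    Hypothetical helper: the size window is an assumed, adjustable default,
    and only fragments bounded by two MspI cut sites are reported.
    """
    seq = seq.upper()
    # Position of every cut in the sequence, in ascending order.
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(MSPI_SITE, seq)]
    fragments = []
    # Each pair of consecutive cuts delimits one digestion fragment.
    for start, end in zip(cuts, cuts[1:]):
        if min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


if __name__ == "__main__":
    # Toy sequence with three MspI sites; real use would iterate over chromosomes.
    toy = "A" * 10 + "CCGG" + "T" * 50 + "CCGG" + "A" * 300 + "CCGG" + "T" * 5
    for start, end in mspi_fragments(toy):
        print(start, end, end - start)
```

Restricting alignment to fragments like these shrinks the search space to the regions the RRBS protocol targets, which is consistent with the speed advantage the abstract reports. Bisulfite conversion of the reference and strand handling are deliberately left out of this sketch.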

Original language: English (US)
Article number: 149
Journal: BMC Genomics
Volume: 17
Issue number: 1
DOI: 10.1186/s12864-016-2494-8
State: Published - Feb 27 2016

Keywords

  • DNA methylation
  • Methylation measure accuracy
  • Reduced representation bisulfite sequencing
  • RRBS
  • RRBS alignment
  • TRACE-RRBS

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Targeted alignment and end repair elimination increase alignment and methylation measure accuracy for reduced representation bisulfite sequencing data. / Baheti, Saurabh; Kanwar, Rahul; Goelzenleuchter, Meike; Kocher, Jean-Pierre; Beutler, Andreas S; Sun, Zhifu D.

In: BMC Genomics, Vol. 17, No. 1, 149, 27.02.2016.

@article{1ff9685acd0d4794802f0d7304379922,
title = "Targeted alignment and end repair elimination increase alignment and methylation measure accuracy for reduced representation bisulfite sequencing data",
keywords = "DNA methylation, Methylation measure accuracy, Reduced representation bisulfite sequencing, RRBS, RRBS alignment, TRACE-RRBS",
author = "Baheti, Saurabh and Kanwar, Rahul and Goelzenleuchter, Meike and Kocher, Jean-Pierre and Beutler, {Andreas S} and Sun, {Zhifu D}",
year = "2016",
month = "2",
day = "27",
doi = "10.1186/s12864-016-2494-8",
language = "English (US)",
volume = "17",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Targeted alignment and end repair elimination increase alignment and methylation measure accuracy for reduced representation bisulfite sequencing data

AU - Baheti, Saurabh

AU - Kanwar, Rahul

AU - Goelzenleuchter, Meike

AU - Kocher, Jean-Pierre

AU - Beutler, Andreas S

AU - Sun, Zhifu D

PY - 2016/2/27

Y1 - 2016/2/27

KW - DNA methylation

KW - Methylation measure accuracy

KW - Reduced representation bisulfite sequencing

KW - RRBS

KW - RRBS alignment

KW - TRACE-RRBS

UR - http://www.scopus.com/inward/record.url?scp=84959238796&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959238796&partnerID=8YFLogxK

U2 - 10.1186/s12864-016-2494-8

DO - 10.1186/s12864-016-2494-8

M3 - Article

VL - 17

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 149

ER -