Assessment of mapping and SNP-detection algorithms for next-generation sequencing data in cancer genomics

Weixin Wang, Feng Xu, Junwen Wang

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

1 Citation (Scopus)

Abstract

The rapid development of next-generation sequencing (NGS) technology offers a new opportunity to extend the scale and resolution of genomic research. Efficiently mapping millions of short reads to the reference genome and calling variants accurately are two major challenges in NGS analysis. In this chapter, we review current software for aligning short reads and detecting single-nucleotide polymorphisms (SNPs), and we extensively evaluate their performance on normal and cancer samples from The Cancer Genome Atlas project and on trio data from the 1000 Genomes Project. We find that Burrows-Wheeler transform-based aligners are the most suitable for the Illumina platform, and that NovoalignCS shows the best overall performance for SOLiD data. We also show that FaSD is the most reliable SNP caller among several state-of-the-art programs. Furthermore, NGS shows significantly lower coverage and poorer SNP-calling performance in the CpG island, promoter, and 5' UTR regions of the human genome. We show that both high GC content and a low content of repetitive elements cause the lower coverage in promoter regions.

Original language: English (US)
Title of host publication: Next Generation Sequencing in Cancer Research: Volume 1: Decoding the Cancer Genome
Publisher: Springer New York
Pages: 301-317
Number of pages: 17
ISBN (Electronic): 9781461476450
ISBN (Print): 9781461476443
DOIs: https://doi.org/10.1007/978-1-4614-7645-0_15
State: Published - Jan 1 2013
Externally published: Yes

Fingerprint

Genomics
Single Nucleotide Polymorphism
Genome
Genetic Promoter Regions
Neoplasms
CpG Islands
Atlases
Base Composition
Human Genome
Software
Technology
Research

Keywords

  • Alignment
  • Cancer
  • Genotype
  • Next-generation sequencing
  • SNP

ASJC Scopus subject areas

  • Medicine (all)

Cite this

Wang, W., Xu, F., & Wang, J. (2013). Assessment of mapping and SNP-detection algorithms for next-generation sequencing data in cancer genomics. In Next Generation Sequencing in Cancer Research: Volume 1: Decoding the Cancer Genome (pp. 301-317). Springer New York. https://doi.org/10.1007/978-1-4614-7645-0_15
