The eSNV-detect: A computational system to identify expressed single nucleotide variants from transcriptome sequencing data

Xiaojia Tang, Saurabh Baheti, Khader Shameer, Kevin J. Thompson, Quin Wills, Nifang Niu, Ilona N. Holcomb, Stephane C. Boutet, Ramesh Ramakrishnan, Jennifer M. Kachergus, Jean-Pierre Kocher, Richard M Weinshilboum, Liewei M Wang, E Aubrey Thompson, Krishna R Kalari

Research output: Contribution to journal (Article)

20 Citations (Scopus)

Abstract

Rapid development of next-generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6-96.8% precision and 91.6-95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.
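The precision and sensitivity figures quoted in the abstract compare a set of called variants against a reference set (BeadChip genotypes or whole exome calls). The sketch below shows the standard definitions of those two metrics on a toy example. It is not taken from the eSNV-Detect source code: the `compare_calls` function, the `Variant` tuple layout, and the example variants are all hypothetical and serve only as an illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    sensitivity: float


def compare_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare called variants against a reference ("truth") set.

    precision   = TP / (TP + FP)  -> fraction of calls that are confirmed
    sensitivity = TP / (TP + FN)  -> fraction of reference variants recovered
    """
    tp = len(called & truth)   # called and present in the reference set
    fp = len(called - truth)   # called but absent from the reference set
    fn = len(truth - called)   # in the reference set but not called
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return Metrics(precision, sensitivity)


# Toy data (made up for illustration; not results from the paper).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr5", 10, "A", "T"),
}

metrics = compare_calls(called, truth)

# Validation rate for the Sanger experiment reported in the abstract (29 of 31).
validation_rate = 29 / 31
```

With the toy sets above, three calls match the reference, one call is unconfirmed, and one reference variant is missed, so both metrics come out at 0.75. The same arithmetic applied to the Sanger result gives a validation rate of roughly 93.5% (29 of 31 candidates).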

Original language: English (US)
Article number: e172
Journal: Nucleic Acids Research
Volume: 42
Issue number: 22
DOI: 10.1093/nar/gku1005
State: Published - Dec 16 2014

Fingerprint

Transcriptome
Nucleotides
RNA
Breast Neoplasms
Exome
Cell Line
Atlases
Solar System
Single Nucleotide Polymorphism
Software
Genome
Technology
Messenger RNA
DNA
Research
Neoplasms

ASJC Scopus subject areas

  • Genetics

Cite this

The eSNV-detect: A computational system to identify expressed single nucleotide variants from transcriptome sequencing data. / Tang, Xiaojia; Baheti, Saurabh; Shameer, Khader; Thompson, Kevin J.; Wills, Quin; Niu, Nifang; Holcomb, Ilona N.; Boutet, Stephane C.; Ramakrishnan, Ramesh; Kachergus, Jennifer M.; Kocher, Jean-Pierre; Weinshilboum, Richard M; Wang, Liewei M; Thompson, E Aubrey; Kalari, Krishna R.

In: Nucleic Acids Research, Vol. 42, No. 22, e172, 16.12.2014.

Tang, Xiaojia; Baheti, Saurabh; Shameer, Khader; Thompson, Kevin J.; Wills, Quin; Niu, Nifang; Holcomb, Ilona N.; Boutet, Stephane C.; Ramakrishnan, Ramesh; Kachergus, Jennifer M.; Kocher, Jean-Pierre; Weinshilboum, Richard M; Wang, Liewei M; Thompson, E Aubrey; Kalari, Krishna R. / The eSNV-detect: A computational system to identify expressed single nucleotide variants from transcriptome sequencing data. In: Nucleic Acids Research. 2014; Vol. 42, No. 22.
@article{61e1b902deff432d946787e73454cf5c,
title = "The {eSNV-detect}: A computational system to identify expressed single nucleotide variants from transcriptome sequencing data",
abstract = "Rapid development of next-generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7{\%} precision and 91.0{\%} sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6-96.8{\%} precision and 91.6-95.7{\%} sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.",
author = "Xiaojia Tang and Saurabh Baheti and Khader Shameer and Thompson, {Kevin J.} and Quin Wills and Nifang Niu and Holcomb, {Ilona N.} and Boutet, {Stephane C.} and Ramesh Ramakrishnan and Kachergus, {Jennifer M.} and Jean-Pierre Kocher and Weinshilboum, {Richard M} and Wang, {Liewei M} and Thompson, {E Aubrey} and Kalari, {Krishna R}",
year = "2014",
month = "12",
day = "16",
doi = "10.1093/nar/gku1005",
language = "English (US)",
volume = "42",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "22",
pages = "e172",
}

TY - JOUR

T1 - The eSNV-detect

T2 - A computational system to identify expressed single nucleotide variants from transcriptome sequencing data

AU - Tang, Xiaojia

AU - Baheti, Saurabh

AU - Shameer, Khader

AU - Thompson, Kevin J.

AU - Wills, Quin

AU - Niu, Nifang

AU - Holcomb, Ilona N.

AU - Boutet, Stephane C.

AU - Ramakrishnan, Ramesh

AU - Kachergus, Jennifer M.

AU - Kocher, Jean-Pierre

AU - Weinshilboum, Richard M

AU - Wang, Liewei M

AU - Thompson, E Aubrey

AU - Kalari, Krishna R

PY - 2014/12/16

Y1 - 2014/12/16

AB - Rapid development of next-generation sequencing technology has enabled the identification of genomic alterations from short sequencing reads. There are a number of software pipelines available for calling single nucleotide variants from genomic DNA, but no comprehensive pipelines to identify, annotate and prioritize expressed SNVs (eSNVs) from non-directional paired-end RNA-Seq data. We have developed the eSNV-Detect, a novel computational system, which utilizes data from multiple aligners to call, even at low read depths, and rank variants from RNA-Seq. Multi-platform comparisons with the eSNV-Detect variant candidates were performed. The method was first applied to RNA-Seq from a lymphoblastoid cell line, achieving 99.7% precision and 91.0% sensitivity in the expressed SNPs for the matching HumanOmni2.5 BeadChip data. Comparison of RNA-Seq eSNV candidates from 25 ER+ breast tumors from The Cancer Genome Atlas (TCGA) project with whole exome coding data showed 90.6-96.8% precision and 91.6-95.7% sensitivity. Contrasting single-cell mRNA-Seq variants with matching traditional multicellular RNA-Seq data for the MDA-MB-231 breast cancer cell line delineated variant heterogeneity among the single cells. Further, Sanger sequencing validation was performed for an ER+ breast tumor with paired normal adjacent tissue, validating 29 out of 31 candidate eSNVs. The source code and user manuals of the eSNV-Detect pipeline for Sun Grid Engine and virtual machine are available at http://bioinformaticstools.mayo.edu/research/esnv-detect/.

UR - http://www.scopus.com/inward/record.url?scp=84924312533&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924312533&partnerID=8YFLogxK

U2 - 10.1093/nar/gku1005

DO - 10.1093/nar/gku1005

M3 - Article

C2 - 25352556

AN - SCOPUS:84924312533

VL - 42

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 22

M1 - e172

ER -