Indel sensitive and comprehensive variant/mutation detection from RNA sequencing data for precision medicine

Naresh Prodduturi, Aditya Bhagwate, Jean-Pierre Kocher, Zhifu D Sun

Research output: Contribution to journal › Article

Abstract

Background: RNA-seq is the most commonly used sequencing application. It not only measures gene expression but is also an excellent medium for detecting important variants such as single nucleotide variants (SNVs), insertions/deletions (indels), and fusion transcripts. However, detecting these variants from RNA-seq is challenging and complex. Here we describe a sensitive and accurate analytical pipeline that detects these mutation types at once for translational precision medicine.

Methods: The pipeline incorporates the most sensitive aligners for indels in RNA-seq, best practices for data preprocessing and variant calling, and STAR-Fusion for chimeric transcripts. Variants/mutations are annotated, and key genes can be extracted for further investigation and clinical action. Three datasets were used to evaluate the performance of the pipeline for SNVs, indels, and fusion transcripts.

Results: For the well-defined variants of NA12878 from the GIAB project, sensitivities of about 95% and 80% were obtained for SNVs and indels, respectively, in matching RNA-seq data. Comparison with other variant-specific tools showed good performance of the pipeline. For the lung cancer dataset with 41 known oncogenic mutations, 39 were detected by the pipeline with the STAR aligner and all 41 with the GSNAP aligner. An actionable EML4-ALK fusion was also detected in one of the tumors, which also showed outlier ALK expression. For 9 fusions spiked into RNA-seq libraries at different concentrations, the pipeline detected all of them in unfiltered results, although some at very low concentrations may be missed when filtering is applied.

Conclusions: The new RNA-seq workflow is an accurate and comprehensive mutation profiler. Key or actionable mutations are reliably detected from RNA-seq, which makes it a practical alternative source for personalized medicine.
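As a worked illustration of the sensitivity figures quoted in the abstract, the short sketch below computes sensitivity as detected variants divided by known variants, using the lung cancer counts reported above (41 known mutations; 39 detected with STAR, 41 with GSNAP). The function and variable names are hypothetical and are not part of the published pipeline.

```python
def sensitivity(detected: int, known: int) -> float:
    """Fraction of known (truth-set) variants that were recovered."""
    if known <= 0:
        raise ValueError("known must be a positive count")
    if not 0 <= detected <= known:
        raise ValueError("detected must lie between 0 and known")
    return detected / known


# Counts taken from the abstract's lung cancer dataset.
LUNG_CANCER_KNOWN = 41
detected_by_aligner = {"STAR": 39, "GSNAP": 41}

# Illustrative names only; the paper does not describe this code.
results = {
    aligner: sensitivity(count, LUNG_CANCER_KNOWN)
    for aligner, count in detected_by_aligner.items()
}

for aligner, value in results.items():
    print(f"{aligner}: {value:.1%}")
# STAR: 95.1%
# GSNAP: 100.0%
```

The same ratio applies to the NA12878 comparison, although the abstract gives only the approximate percentages there, not the underlying counts.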

Original language: English (US)
Article number: 67
Journal: BMC Medical Genomics
Volume: 11
DOI: 10.1186/s12920-018-0391-5
State: Published - Sep 14 2018

Fingerprint

RNA Sequence Analysis
Precision Medicine
Biological Science Disciplines
RNA
Mutation
Nucleotides
Translational Medical Research
Workflow
Practice Guidelines
Libraries
Lung Neoplasms
Gene Expression

Keywords

  • Fusion transcript
  • Gene expression
  • Insertion/deletion
  • Precision medicine
  • RNA sequencing
  • Somatic mutations
  • Targeted therapy

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

@article{afed08e949d941b6b4b8fdc9bdb5ffc4,
title = "Indel sensitive and comprehensive variant/mutation detection from RNA sequencing data for precision medicine",
keywords = "Fusion transcript, Gene expression, Insertion/deletion, Precision medicine, RNA sequencing, Somatic mutations, Targeted therapy",
author = "Naresh Prodduturi and Aditya Bhagwate and Jean-Pierre Kocher and Sun, {Zhifu D}",
year = "2018",
month = "9",
day = "14",
doi = "10.1186/s12920-018-0391-5",
language = "English (US)",
volume = "11",
journal = "BMC Medical Genomics",
issn = "1755-8794",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Indel sensitive and comprehensive variant/mutation detection from RNA sequencing data for precision medicine

AU - Prodduturi, Naresh

AU - Bhagwate, Aditya

AU - Kocher, Jean-Pierre

AU - Sun, Zhifu D

PY - 2018/9/14

Y1 - 2018/9/14

KW - Fusion transcript

KW - Gene expression

KW - Insertion/deletion

KW - Precision medicine

KW - RNA sequencing

KW - Somatic mutations

KW - Targeted therapy

UR - http://www.scopus.com/inward/record.url?scp=85053352579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053352579&partnerID=8YFLogxK

U2 - 10.1186/s12920-018-0391-5

DO - 10.1186/s12920-018-0391-5

M3 - Article

VL - 11

JO - BMC Medical Genomics

JF - BMC Medical Genomics

SN - 1755-8794

M1 - 67

ER -