A novel bioinformatics pipeline for identification and characterization of fusion transcripts in breast cancer and normal cell lines

Yan W. Asmann, Asif Hossain, Brian M. Necela, Sumit Middha, Krishna R. Kalari, Zhifu Sun, High Seng Chai, David W. Williamson, Derek Radisky, Gary P. Schroth, Jean Pierre A. Kocher, Edith A. Perez, E. Aubrey Thompson

Research output: Contribution to journal (Article)

69 Scopus citations

Abstract

SnowShoes-FTD, developed for fusion transcript detection in paired-end mRNA-Seq data, employs multiple steps of false positive filtering to nominate fusion transcripts with near 100% confidence. Unique features include: (i) identification of multiple fusion isoforms from two gene partners; (ii) prediction of genomic rearrangements; (iii) identification of exon fusion boundaries; (iv) generation of a 5′-3′ fusion spanning sequence for PCR validation; and (v) prediction of the protein sequences, including frame shift and amino acid insertions. We applied SnowShoes-FTD to identify 50 fusion candidates in 22 breast cancer and 9 nontransformed cell lines. Five additional fusion candidates with two isoforms were confirmed. In all, 30 of 55 fusion candidates had in-frame protein products. No fusion transcripts were detected in nontransformed cells. Consideration of the possible functions of a subset of predicted fusion proteins suggests several potentially important functions in transformation, including a possible new mechanism for overexpression of ERBB2 in a HER-positive cell line. The source code of SnowShoes-FTD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization, and the other formatted to run on a single LINUX node. Executables in PERL are available for download from our web site: http://mayoresearch.mayo.edu/mayo/research/biostat/stand-alone-packages.cfm.

Original language: English (US)
Pages (from-to): e100
Journal: Nucleic Acids Research
Volume: 39
Issue number: 15
State: Published - Aug 2011

ASJC Scopus subject areas

  • Genetics
