Deep sequencing of transcriptome (RNA-seq) provides unprecedented opportunity to interrogate plausible mRNA splicing patterns by mapping RNA-seq reads to exon junctions (thereafter junction reads). In most previous studies, exon junctions were detected by using the quantitative information of junction reads. The quantitative criterion (e.g. minimum of two junction reads), although is straightforward and widely used, usually results in high false positive and false negative rates, owning to the complexity of transcriptome. Here, we introduced a new metric, namely Minimal Match on Either Side of exon junction (MMES), to measure the quality of each junction read, and subsequently implemented an empirical statistical model to detect exon junctions. When applied to a large dataset (>200M reads) consisting of mouse brain, liver and muscle mRNA sequences, and using independent transcripts databases as positive control, our method was proved to be considerably more accurate than previous ones, especially for detecting junctions originated from low-abundance transcripts. Our results were also confirmed by real time RT-PCR assay. The MMES metric can be used either in this empirical statistical model or in other more sophisticated classifiers, such as logistic regression.
ASJC Scopus subject areas