Characterization and identification of long non-coding RNAs based on feature relationship

Guangyu Wang, Hongyan Yin, Boyang Li, Chunlei Yu, Fan Wang, Xingjian Xu, Jiabao Cao, Yiming Bao, Liguo Wang, Amir A. Abbasi, Vladimir B. Bajic, Lina Ma, Zhang Zhang

Research output: Contribution to journal › Article


Abstract

Motivation: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs in a wide range of species remains challenging: it requires prior knowledge of well-established sequences and annotations or species-specific training data, yet only a limited number of species have high-quality sequences and annotations.

Results: Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship, and find that the relationship between open reading frame (ORF) length and guanine-cytosine (GC) content differs substantially between lncRNAs and protein-coding RNAs, a divergence observed universally across a broad variety of species. Building on this feature relationship, we present LGC, a novel algorithm that accurately distinguishes lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. Validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms, achieving higher accuracy with well-balanced sensitivity and specificity, and that it is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species ranging from plants to mammals. To our knowledge, this study is the first to differentially characterize lncRNAs and protein-coding RNAs based on feature relationship and to apply that relationship to the computational identification of lncRNAs. Taken together, our study represents a significant advance in the characterization and identification of lncRNAs, and LGC thus bears broad potential utility for the computational analysis of lncRNAs in a wide range of species.
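The abstract names the two features LGC relates, ORF length and GC content, but does not give LGC's scoring formula. The sketch below is therefore only an illustration, not LGC itself: it computes the two features for a transcript and adds a textbook null model showing why they are coupled at all. Assumptions (not from the paper): the ORF is the longest ATG-initiated, stop-terminated ORF on the forward strand, and the null model treats a transcript as a random sequence with independent positions where G = C = gc/2 and A = T = (1 - gc)/2. All function names are hypothetical.

```python
"""Hypothetical sketch of the two features the abstract names (ORF length
and GC content) plus a random-sequence null model. NOT the LGC algorithm."""

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G or C among A/C/G/T bases; other symbols are ignored."""
    seq = seq.upper().replace("U", "T")
    acgt = [b for b in seq if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(b in "GC" for b in acgt) / len(acgt)


def longest_orf_length(seq: str) -> int:
    """Length in nt of the longest ATG...stop ORF on the forward strand,
    stop codon included. Returns 0 if no complete ORF exists."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        # Step codon by codon; the range bound guarantees full 3-nt codons.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def stop_codon_probability(gc: float) -> float:
    """P(a random codon is a stop) when G = C = gc/2 and A = T = (1-gc)/2."""
    at = (1.0 - gc) / 2.0
    g = gc / 2.0
    # TAA uses three A/T bases; TAG and TGA each use two A/T bases and one G.
    return at ** 3 + 2 * at * at * g


def expected_random_orf_codons(gc: float) -> float:
    """Mean codons read up to and including the first stop (geometric model)."""
    return 1.0 / stop_codon_probability(gc)


if __name__ == "__main__":
    toy = "CCATGAAAGGGTTTTAGCC"
    print(gc_content(toy), longest_orf_length(toy))
    # Stop codons are A/T-rich, so random ORFs lengthen as GC content rises.
    for gc in (0.3, 0.5, 0.7):
        print(gc, round(expected_random_orf_codons(gc), 1))
```

Because stop codons are A/T-rich, a GC-rich non-coding transcript can contain a long ORF purely by chance. This is one intuitive reason why judging ORF length relative to GC content, rather than ORF length alone, can help separate coding from non-coding RNAs.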

Original language: English (US)
Pages (from-to): 2949-2956
Number of pages: 8
Journal: Bioinformatics
Volume: 35
Issue number: 17
DOI: 10.1093/bioinformatics/btz008
State: Published - Sep 1 2019

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics


Cite this

    Wang, G., Yin, H., Li, B., Yu, C., Wang, F., Xu, X., Cao, J., Bao, Y., Wang, L., Abbasi, A. A., Bajic, V. B., Ma, L., & Zhang, Z. (2019). Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics, 35(17), 2949-2956. https://doi.org/10.1093/bioinformatics/btz008