RNA Coding Potential Prediction Using Alignment-Free Logistic Regression Model

Ying Li, Liguo Wang

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

CPAT (Coding-Potential Assessment Tool) is a classifier based on a logistic regression model that quickly and accurately distinguishes protein-coding from noncoding RNAs using purely linguistic features computed from the RNA sequences. CPAT takes as input the nucleotide sequences or genomic coordinates of RNAs and outputs, for each RNA, a probability p (0 ≤ p ≤ 1) that measures its likelihood of being protein coding. Users can run CPAT online (http://lilab.research.bcm.edu/cpat/) or on local computers after installation. CPAT provides prebuilt logistic models for RNAs originating from the human (Homo sapiens), mouse (Mus musculus), zebrafish (Danio rerio), and fly (Drosophila melanogaster) genomes. Instructions for training models for other genomes are given on the CPAT website (http://rna-cpat.sourceforge.net/) and in this chapter.
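The abstract describes CPAT as a logistic regression over features computed directly from the sequence, returning a probability between 0 and 1. The Python below is a minimal sketch of that idea. It is not CPAT's code or one of its prebuilt models. It computes two ORF-based features, the length of the longest ATG-initiated open reading frame and the fraction of the transcript that this ORF covers, and passes them to a logistic function. CPAT's published feature set also includes the Fickett TESTCODE score and a hexamer usage score, and both are omitted here. The function names, the forward-strand-only ORF scan, and the values of HYPOTHETICAL_WEIGHTS and HYPOTHETICAL_BIAS are illustrative assumptions.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Hypothetical coefficients for illustration only; NOT taken from CPAT's models.
HYPOTHETICAL_WEIGHTS = (0.01, 5.0)  # (per-nt ORF length, ORF coverage)
HYPOTHETICAL_BIAS = -6.0


def longest_orf_length(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG-initiated ORF on the
    forward strand, scanning all three frames. ORFs without a stop are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the currently open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_features(seq: str) -> tuple[float, float]:
    """Return (ORF length, ORF coverage = ORF length / transcript length)."""
    length = longest_orf_length(seq)
    coverage = length / len(seq) if seq else 0.0
    return float(length), coverage


def coding_probability(features, weights, bias) -> float:
    """Logistic model p = 1 / (1 + exp(-(bias + sum(w_i * x_i)))), so 0 <= p <= 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    coding_like = "GCC" + "ATG" + "GCT" * 98 + "TAA" + "GCC"  # long ORF
    noncoding_like = "GCGC" * 5  # no ATG, so no ORF
    for name, s in [("coding-like", coding_like), ("noncoding-like", noncoding_like)]:
        p = coding_probability(orf_features(s), HYPOTHETICAL_WEIGHTS, HYPOTHETICAL_BIAS)
        print(name, round(p, 3))
```

In a real model the weights would be fitted on labeled coding and noncoding transcripts, which is what the training instructions mentioned above address. The probability cutoff used to call a transcript "coding" is likewise a model-specific choice.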

Original language: English (US)
Title of host publication: Methods in Molecular Biology
Publisher: Humana Press Inc.
Pages: 27-39
Number of pages: 13
State: Published - 2021

Publication series

Name: Methods in Molecular Biology
Volume: 2254
ISSN (Print): 1064-3745
ISSN (Electronic): 1940-6029

Keywords

  • LincRNA
  • LncRNA
  • Logistic regression
  • Noncoding RNA
  • Prediction
  • Protein coding

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

