TY - JOUR
T1 - Gene and alternative splicing annotation with AIR
AU - Florea, Liliana
AU - Di Francesco, Valentina
AU - Miller, Jason
AU - Turner, Russell
AU - Yao, Alison
AU - Harris, Michael
AU - Walenz, Brian
AU - Mobarry, Clark
AU - Merkulov, Gennady V.
AU - Charlab, Rosane
AU - Dew, Ian
AU - Deng, Zuoming
AU - Istrail, Sorin
AU - Li, Peter
AU - Sutton, Granger
PY - 2005/1
Y1 - 2005/1
N2 - Designing effective and accurate tools for identifying the functional and structural elements in a genome remains at the frontier of genome annotation owing to incompleteness and inaccuracy of the data, limitations in the computational models, and shifting paradigms in genomics, such as alternative splicing. We present a methodology for the automated annotation of genes and their alternatively spliced mRNA transcripts based on existing cDNA and protein sequence evidence from the same species or projected from a related species using syntenic mapping information. At the core of the method is the splice graph, a compact representation of a gene, its exons, introns, and alternatively spliced isoforms. The putative transcripts are enumerated from the graph and assigned confidence scores based on the strength of sequence evidence, and a subset of the high-scoring candidates are selected and promoted into the annotation. The method is highly selective, eliminating the unlikely candidates while retaining 98% of the high-quality mRNA evidence in well-formed transcripts, and produces annotation that is measurably more accurate than some evidence-based gene sets. The process is fast, accurate, and fully automated, and combines the traditionally distinct gene annotation and alternative splicing detection processes in a comprehensive and systematic way, thus considerably aiding in the ensuing manual curation efforts.
UR - http://www.scopus.com/inward/record.url?scp=19944433052&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19944433052&partnerID=8YFLogxK
U2 - 10.1101/gr.2889405
DO - 10.1101/gr.2889405
M3 - Article
C2 - 15632090
AN - SCOPUS:19944433052
SN - 1088-9051
VL - 15
SP - 54
EP - 66
JO - Genome Research
JF - Genome Research
IS - 1
ER -