Evaluating eukaryotic secreted protein prediction

Eric W Klee, Lynda B M Ellis

Research output: Contribution to journal › Article

56 Citations (Scopus)

Abstract

Background: Improvements in protein sequence annotation and an increase in the number of annotated protein databases have fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3.0, SignalP 2.0, TargetP 1.01, PrediSi, Phobius, and ProtComp 6.0, are evaluated. Results: Prediction accuracies were evaluated using 372 unbiased, eukaryotic, SwissProt protein sequences. TargetP, SignalP 3.0 maximum S-score and SignalP 3.0 D-score were the most accurate single scores (90-91% accurate). The combination of a positive TargetP prediction, SignalP 2.0 maximum Y-score, and SignalP 3.0 maximum S-score increased accuracy by six percent. Conclusions: Single predictive scores could be highly accurate, but almost all accuracies were slightly less than those reported by program authors. Predictive accuracy could be substantially improved by combining scores from multiple methods into a single composite prediction.
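
The idea of a composite prediction can be illustrated with a minimal sketch. The abstract does not state how the three scores were combined or which cutoffs were used, so the all-must-agree rule, the cutoff values, and every name below are assumptions made only for illustration; they are not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical cutoffs, chosen only for illustration (not taken from the paper).
Y_CUTOFF = 0.5
S_CUTOFF = 0.5


@dataclass
class Scores:
    targetp_secreted: bool  # TargetP predicts the secretory pathway
    signalp2_max_y: float   # SignalP 2.0 maximum Y-score
    signalp3_max_s: float   # SignalP 3.0 maximum S-score


def composite_secreted(s: Scores) -> bool:
    """Assumed rule: call a protein secreted only if all three indicators agree."""
    return (
        s.targetp_secreted
        and s.signalp2_max_y >= Y_CUTOFF
        and s.signalp3_max_s >= S_CUTOFF
    )


def accuracy(predictions, labels):
    """Fraction of predictions that match the known labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)
```

Given per-protein scores and curated labels, `accuracy([composite_secreted(s) for s in scores], labels)` would give the composite accuracy to compare against each single score.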

Original language: English (US)
Article number: 256
Journal: BMC Bioinformatics
Volume: 6
DOI: 10.1186/1471-2105-6-256
State: Published - Oct 14 2005
Externally published: Yes

Fingerprint

Protein Databases
Software
Proteins
Protein
Molecular Sequence Annotation
Prediction
Protein Sequence
Throughput
Software Tools
Composite materials
Percent
High Throughput
Annotation
Composite
Predict
Range of data

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

Evaluating eukaryotic secreted protein prediction. / Klee, Eric W; Ellis, Lynda B M.

In: BMC Bioinformatics, Vol. 6, 256, 14.10.2005.

@article{2180600c05c54bc69dad40aef8e17293,
title = "Evaluating eukaryotic secreted protein prediction",
abstract = "Background: Improvements in protein sequence annotation and an increase in the number of annotated protein databases has fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3.0, SignalP 2.0, TargetP 1.01, PrediSi, Phobius, and ProtComp 6.0, are evaluated. Results: Prediction accuracies were evaluated using 372 unbiased, eukaryotic, SwissProt protein sequences. TargetP, SignalP 3.0 maximum S-score and SignalP 3.0 D-score were the most accurate single scores (90-91{\%} accurate). The combination of a positive TargetP prediction, SignalP 2.0 maximum Y-score, and SignalP 3.0 maximum S-score increased accuracy by six percent. Conclusions: Single predictive scores could be highly accurate, but almost all accuracies were slightly less than those reported by program authors. Predictive accuracy could be substantially improved by combining scores from multiple methods into a single composite prediction.",
author = "Klee, {Eric W} and Ellis, {Lynda B M}",
year = "2005",
month = "10",
day = "14",
doi = "10.1186/1471-2105-6-256",
language = "English (US)",
volume = "6",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central"
}

TY - JOUR

T1 - Evaluating eukaryotic secreted protein prediction

AU - Klee, Eric W

AU - Ellis, Lynda B M

PY - 2005/10/14

Y1 - 2005/10/14

N2 - Background: Improvements in protein sequence annotation and an increase in the number of annotated protein databases has fueled development of an increasing number of software tools to predict secreted proteins. Six software programs capable of high throughput and employing a wide range of prediction methods, SignalP 3.0, SignalP 2.0, TargetP 1.01, PrediSi, Phobius, and ProtComp 6.0, are evaluated. Results: Prediction accuracies were evaluated using 372 unbiased, eukaryotic, SwissProt protein sequences. TargetP, SignalP 3.0 maximum S-score and SignalP 3.0 D-score were the most accurate single scores (90-91% accurate). The combination of a positive TargetP prediction, SignalP 2.0 maximum Y-score, and SignalP 3.0 maximum S-score increased accuracy by six percent. Conclusions: Single predictive scores could be highly accurate, but almost all accuracies were slightly less than those reported by program authors. Predictive accuracy could be substantially improved by combining scores from multiple methods into a single composite prediction.

UR - http://www.scopus.com/inward/record.url?scp=27644444255&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27644444255&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-6-256

DO - 10.1186/1471-2105-6-256

M3 - Article

C2 - 16225690

AN - SCOPUS:27644444255

VL - 6

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 256

ER -