PredictSNP: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations

Jaroslav Bendl; Jan Stourac; Ondrej Salanda; Antonin Pavelka; Eric D. Wieben; Jaroslav Zendulka; Jan Brezovsky; Jiri Damborsky

doi:10.1371/journal.pcbi.1003440

Biochemistry and Molecular Biology

Research output: Contribution to journal › Article › peer-review

322 Scopus citations

Abstract

Single nucleotide variants represent a prevalent form of genetic variation. Mutations in the coding regions are frequently associated with the development of various genetic diseases. Computational tools for predicting the effects of mutations on protein function are very important for analyzing single nucleotide variants and prioritizing them for experimental characterization. Many computational tools are already widely employed for this purpose. Unfortunately, their comparison and further improvement are hindered by large overlaps between the training datasets and benchmark datasets, which lead to biased and overly optimistic reported performances. In this study, we have constructed three independent datasets by removing all duplicates, inconsistencies and mutations previously used in the training of the evaluated tools. The benchmark dataset, containing over 43,000 mutations, was employed for the unbiased evaluation of eight established prediction tools: MAPP, nsSNPAnalyzer, PANTHER, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT and SNAP. The six best-performing tools were combined into a consensus classifier, PredictSNP, which significantly improved prediction performance and at the same time returned results for all mutations, confirming that consensus prediction represents an accurate and robust alternative to the predictions delivered by individual tools. A user-friendly web interface enables easy access to all eight prediction tools, the consensus classifier PredictSNP and annotations from the Protein Mutant Database and the UniProt database. The web server and the datasets are freely available to the academic community at http://loschmidt.chemi.muni.cz/predictsnp.
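The abstract describes combining several tools into one consensus that still returns a verdict for every mutation. The sketch below illustrates one common way to build such a consensus, a confidence-weighted majority vote that skips tools which return no prediction. It is an illustration under stated assumptions, not PredictSNP's published scoring scheme: the `ToolPrediction` structure, the `consensus` function, the [0, 1] confidence scale, the tie-breaking rule, and the tool names and values in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToolPrediction:
    """One tool's verdict for a single mutation (hypothetical structure)."""
    tool: str
    deleterious: Optional[bool]  # None means the tool returned no prediction
    confidence: float = 1.0      # assumed weight in [0, 1]


def consensus(predictions: Sequence[ToolPrediction]) -> Tuple[Optional[bool], float]:
    """Confidence-weighted majority vote over the tools that answered.

    Returns (is_deleterious, score) with score in [-1, 1]: positive leans
    deleterious, negative leans neutral. Tools without a prediction are
    skipped, so a verdict exists whenever at least one tool answered.
    Returns (None, 0.0) if no tool answered or all weights are zero.
    """
    answered = [p for p in predictions if p.deleterious is not None]
    for p in answered:
        if not 0.0 <= p.confidence <= 1.0:
            raise ValueError(f"{p.tool}: confidence {p.confidence} outside [0, 1]")
    total = sum(p.confidence for p in answered)
    if total == 0:
        return None, 0.0
    # Deleterious votes add their weight, neutral votes subtract it.
    signed = sum(p.confidence if p.deleterious else -p.confidence for p in answered)
    score = signed / total
    # Ties (score == 0) are called neutral; this is an arbitrary choice.
    return score > 0, score


if __name__ == "__main__":
    # Invented tool names and confidence values, for illustration only.
    example = [
        ToolPrediction("tool_A", True, 0.7),
        ToolPrediction("tool_B", True, 0.8),
        ToolPrediction("tool_C", False, 0.6),
        ToolPrediction("tool_D", None),  # e.g. the tool could not score this mutation
        ToolPrediction("tool_E", False, 0.5),
    ]
    label, score = consensus(example)
    print("deleterious" if label else "neutral", round(score, 3))
```

Weighting by confidence lets confident tools outvote uncertain ones, and skipping missing predictions is what allows a consensus of this kind to cover mutations that some individual tools cannot score.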

Original language	English (US)
Article number	e1003440
Journal	PLoS Computational Biology
Volume	10
Issue number	1
DOIs	https://doi.org/10.1371/journal.pcbi.1003440
State	Published - Jan 2014

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Modeling and Simulation
Ecology
Molecular Biology
Genetics
Cellular and Molecular Neuroscience
Computational Theory and Mathematics

Cite this

@article{0280b345308a4cfca5f4a62b4548dff5,

title = "{PredictSNP}: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations",

author = "Jaroslav Bendl and Jan Stourac and Ondrej Salanda and Antonin Pavelka and Eric D. Wieben and Jaroslav Zendulka and Jan Brezovsky and Jiri Damborsky",

year = "2014",

month = jan,

doi = "10.1371/journal.pcbi.1003440",

language = "English (US)",

volume = "10",

journal = "PLoS Computational Biology",

issn = "1553-734X",

publisher = "Public Library of Science",

number = "1",

pages = "e1003440",

}

TY  - JOUR
T1  - PredictSNP: Robust and Accurate Consensus Classifier for Prediction of Disease-Related Mutations
AU  - Bendl, Jaroslav
AU  - Stourac, Jan
AU  - Salanda, Ondrej
AU  - Pavelka, Antonin
AU  - Wieben, Eric D.
AU  - Zendulka, Jaroslav
AU  - Brezovsky, Jan
AU  - Damborsky, Jiri
PY  - 2014/1
Y1  - 2014/1
UR  - http://www.scopus.com/inward/record.url?scp=84896698938&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84896698938&partnerID=8YFLogxK
U2  - 10.1371/journal.pcbi.1003440
DO  - 10.1371/journal.pcbi.1003440
M3  - Article
C2  - 24453961
AN  - SCOPUS:84896698938
SN  - 1553-734X
VL  - 10
JO  - PLoS Computational Biology
JF  - PLoS Computational Biology
IS  - 1
M1  - e1003440
ER  -
