Comparison of classification methods on protein-protein interaction document classification

Guixian Xu; Zhendong Niu; Peter Uetz; Xu Gao; Hongfang Liu

doi:10.1109/BIBMW.2008.4686213


Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Protein-protein interaction (PPI) networks are essential for understanding the fundamental processes governing cell biology. The mining and curation of experimental PPI knowledge is critical for the analysis of high-throughput genomics and proteomics data. Several PPI knowledge bases have been built through expensive manual curation, but they are far from comprehensive. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. However, it is usually the case that only a small number of positive documents can be obtained, either manually or from PPI knowledge bases with literature-based evidence, while there are a large number of unlabeled documents, most of which are negative. Such data sets are called imbalanced. Learning from imbalanced data sets, where the number of examples of one (majority) class is much larger than that of the others, presents an important challenge to the machine learning community. It is not clear which classification algorithms are suitable for PPI document classification. In this paper, we compared the performance of several document classifiers on two PPI document sets, varying both the number of positives and the ratio of the number of positives to the number of negatives (or unlabeled documents).
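The experimental design described in the abstract, which varies the number of positives and the positive-to-negative ratio, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (make_imbalanced_set, accuracy, precision_recall_f1) and the toy documents are hypothetical, and the abstract does not say which classifiers or evaluation metrics the paper used. The sketch shows why accuracy alone is misleading on imbalanced sets. At a 1:9 positive-to-negative ratio, a baseline that labels every document negative reaches 90% accuracy but has zero recall and zero F1 on the PPI class.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the paper's implementation.
Doc = Tuple[str, int]  # (document text, label) with 1 = PPI-related, 0 = negative/unlabeled


def make_imbalanced_set(positives: Sequence[str], negatives: Sequence[str],
                        n_pos: int, neg_per_pos: int, seed: int = 0) -> List[Doc]:
    """Sample n_pos positives and n_pos * neg_per_pos negatives, then shuffle."""
    rng = random.Random(seed)  # fixed seed so each (n_pos, ratio) setting is reproducible
    n_neg = n_pos * neg_per_pos
    if n_pos > len(positives) or n_neg > len(negatives):
        raise ValueError("not enough documents for the requested sizes")
    data = [(d, 1) for d in rng.sample(list(positives), n_pos)]
    data += [(d, 0) for d in rng.sample(list(negatives), n_neg)]
    rng.shuffle(data)
    return data


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (PPI) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy stand-ins for PPI-positive and negative abstracts (made-up text).
    pos_docs = [f"protein A binds protein B, assay {i}" for i in range(50)]
    neg_docs = [f"gene expression in tissue sample {i}" for i in range(500)]
    for n_pos in (10, 20):            # vary the number of positives
        for ratio in (1, 5, 9):       # vary negatives per positive
            data = make_imbalanced_set(pos_docs, neg_docs, n_pos, ratio)
            y = [label for _, label in data]
            majority = [0] * len(y)   # baseline: always predict "not PPI"
            print(n_pos, ratio, round(accuracy(y, majority), 3),
                  precision_recall_f1(y, majority))
```

In a real comparison, a trained document classifier would replace the all-negative baseline. Reporting positive-class precision, recall and F1 across the grid of (n_pos, ratio) settings, rather than accuracy alone, keeps the majority class from dominating the comparison.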

Original language	English (US)
Title of host publication	Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Pages	83-90
Number of pages	8
DOIs	https://doi.org/10.1109/BIBMW.2008.4686213
State	Published - 2008
Event	2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW - Philadelphia, PA, United States Duration: Nov 3 2008 → Nov 5 2008

Publication series

Name	Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW

Other

Other	2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Country/Territory	United States
City	Philadelphia, PA
Period	11/3/08 → 11/5/08

ASJC Scopus subject areas

Molecular Biology
Information Systems
Biomedical Engineering


Cite this

Xu, G., Niu, Z., Uetz, P., Gao, X., & Liu, H. (2008). Comparison of classification methods on protein-protein interaction document classification. In Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW (pp. 83-90). Article 4686213 (Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW). https://doi.org/10.1109/BIBMW.2008.4686213


Xu, G, Niu, Z, Uetz, P, Gao, X & Liu, H 2008, Comparison of classification methods on protein-protein interaction document classification. in Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW., 4686213, Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW, pp. 83-90, 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW, Philadelphia, PA, United States, 11/3/08. https://doi.org/10.1109/BIBMW.2008.4686213

Xu G, Niu Z, Uetz P, Gao X, Liu H. Comparison of classification methods on protein-protein interaction document classification. In Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW. 2008. p. 83-90. 4686213. (Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW). doi: 10.1109/BIBMW.2008.4686213

@inproceedings{c39464f35ae244cd82289232498fd448,

title = "Comparison of classification methods on protein-protein interaction document classification",

author = "Guixian Xu and Zhendong Niu and Peter Uetz and Xu Gao and Hongfang Liu",

year = "2008",

doi = "10.1109/BIBMW.2008.4686213",

language = "English (US)",

isbn = "9781424428908",

series = "Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW",

pages = "83--90",

booktitle = "Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW",

note = "2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW ; Conference date: 03-11-2008 Through 05-11-2008",

}

TY - GEN

T1 - Comparison of classification methods on protein-protein interaction document classification

AU - Xu, Guixian

AU - Niu, Zhendong

AU - Uetz, Peter

AU - Gao, Xu

AU - Liu, Hongfang

PY - 2008

Y1 - 2008

AB - Protein-protein interaction (PPI) networks are essential for understanding the fundamental processes governing cell biology. The mining and curation of experimental PPI knowledge is critical for the analysis of high-throughput genomics and proteomics data. Several PPI knowledge bases have been built through expensive manual curation, but they are far from comprehensive. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. However, it is usually the case that only a small number of positive documents can be obtained, either manually or from PPI knowledge bases with literature-based evidence, while there are a large number of unlabeled documents, most of which are negative. Such data sets are called imbalanced. Learning from imbalanced data sets, where the number of examples of one (majority) class is much larger than that of the others, presents an important challenge to the machine learning community. It is not clear which classification algorithms are suitable for PPI document classification. In this paper, we compared the performance of several document classifiers on two PPI document sets, varying both the number of positives and the ratio of the number of positives to the number of negatives (or unlabeled documents).

UR - http://www.scopus.com/inward/record.url?scp=58049170453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58049170453&partnerID=8YFLogxK

U2 - 10.1109/BIBMW.2008.4686213

DO - 10.1109/BIBMW.2008.4686213

M3 - Conference contribution

AN - SCOPUS:58049170453

SN - 9781424428908

T3 - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW

SP - 83

EP - 90

BT - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW

T2 - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW

Y2 - 3 November 2008 through 5 November 2008

ER -
