Comparison of classification methods on protein-protein interaction document classification

Guixian Xu, Zhendong Niu, Peter Uetz, Xu Gao, Hongfang Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Protein-protein interaction (PPI) networks are essential to understanding the fundamental processes governing cell biology. The mining and curation of experimental PPI knowledge is critical for the analysis of high-throughput genomics and proteomics data. Several PPI knowledge bases have been built through expensive manual curation, but they are far from comprehensive. Document classification systems have been shown to have the potential to accelerate the curation process by retrieving PPI-related documents. However, it is usually the case that only a small number of positive documents can be obtained, either manually or from PPI knowledge bases with literature-based evidence, while there are a large number of unlabeled documents, most of which are negative. Such data sets are called imbalanced. Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than that of the others, presents an important challenge to the machine learning community. It is not clear what kind of classification algorithm is suitable for PPI document classification. In this paper, we compare the performance of several document classifiers on two PPI document sets, varying the number of positives and the ratio of positives to negatives (or unlabeled documents) in the experiments.

Original language: English (US)
Title of host publication: Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Pages: 83-90
Number of pages: 8
State: Published - 2008
Event: 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW - Philadelphia, PA, United States
Duration: Nov 3 2008 to Nov 5 2008

Publication series

Name: Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW

Other

Other: 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Country/Territory: United States
City: Philadelphia, PA
Period: 11/3/08 to 11/5/08

ASJC Scopus subject areas

  • Molecular Biology
  • Information Systems
  • Biomedical Engineering
