Classifier ensemble for biomedical document retrieval

Manabu Torii, Hongfang Liu

Research output: Contribution to journal, Conference article

2 Scopus citations

Abstract

Background: Because published articles contain rich information, literature review has become an important part of research in the biomedical domain. Machine Learning (ML) techniques have been explored to retrieve relevant articles from a large literature archive (i.e., to classify articles into relevant and irrelevant classes) and thereby to accelerate the literature review process. Meanwhile, an ensemble classifier, a system that assigns classes based on the outputs of multiple classifiers, tends to be more robust and to perform better than each individual classifier. Ensemble classifiers are often composed of classifiers trained on different training sets (e.g., sampled data sets) or of classifiers using different ML algorithms. In this paper, we propose a simple ensemble approach in which the ensemble is composed of classifiers that use different feature sets with a single ML algorithm. We evaluated the approach using Support Vector Machines (SVMs) on two publicly available collections of MEDLINE citations, the Post-translational modification (PTM) data sets and the Immune Epitope Database (IEDB) data sets, both of which resulted from biomedical database curation projects.

Results: The evaluation showed that, given enough training data, ensemble classifiers outperformed their constituent classifiers as measured by both area under the ROC curve (AUC) and precision/recall break-even point (BEP). The performance of the SVM ensembles was competitive with or better than the best results previously reported for the data sets used.

Conclusions: The proposed ensemble approach was found to be effective in improving the performance of SVM classifiers. The approach is also simple and easy to deploy in document classification/retrieval tasks. However, the improvement obtained through the current approach is still modest. We plan to explore different ways to derive and combine constituent classifiers, and to continue our investigation on other data sets.
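To make the two evaluation measures concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation: the abstract does not say how the constituent outputs are combined, so simple score averaging is an assumption here, and the function names (`auc`, `break_even_point`, `average_scores`) and all numbers are hypothetical toy values, not results from the paper. Each score list stands in for the decision values of one classifier trained on one feature set.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve: the fraction of (relevant, irrelevant)
    pairs in which the relevant document scores higher (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def break_even_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision/recall break-even point: precision among the top-k ranked
    documents, where k is the number of relevant documents. At that cutoff
    precision and recall are equal. Tied scores keep their input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one relevant item")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(y for _, y in ranked[:n_pos]) / n_pos


def average_scores(score_lists: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED combination rule: unweighted mean of constituent scores."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


# Toy data (hypothetical): 1 = relevant, 0 = irrelevant.
labels = [1, 1, 0, 1, 0, 0]
scores_a = [0.9, 0.4, 0.5, 0.8, 0.3, 0.2]  # classifier on feature set A
scores_b = [0.6, 0.9, 0.2, 0.3, 0.4, 0.1]  # classifier on feature set B
ensemble = average_scores([scores_a, scores_b])

for name, s in [("A", scores_a), ("B", scores_b), ("ensemble", ensemble)]:
    print(name, round(auc(s, labels), 3), round(break_even_point(s, labels), 3))
```

In this toy case each constituent misranks a different relevant document, so averaging their scores ranks all relevant documents first. Real data will not behave this cleanly, and the abstract itself describes the observed improvement as modest.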

Original language: English (US)
Pages (from-to): 5.1-5.17
Journal: CEUR Workshop Proceedings
Volume: 319
State: Published - Dec 1 2007
Event: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore
Duration: Dec 6 2007 - Dec 7 2007

ASJC Scopus subject areas

  • Computer Science (all)
