Classifier ensemble for biomedical document retrieval

Manabu Torii, Hongfang D Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Background: Owing to the rich information embedded in published articles, literature review has become an important part of research activities in the biomedical domain. Machine Learning (ML) techniques have been explored to retrieve relevant articles from a large literature archive (i.e., to classify articles into relevant and irrelevant classes) and thereby to accelerate the literature review process. Meanwhile, an ensemble classifier, a system that assigns classes based on the outputs of multiple classifiers, tends to be more robust and to perform better than any of its individual constituent classifiers. Ensemble classifiers are often composed of classifiers trained on different training sets (e.g., sampled data sets) or of classifiers using different ML algorithms. In this paper, we propose a simple ensemble approach in which the ensemble is composed of classifiers that use different feature sets with the same ML algorithm. We evaluated the approach using Support Vector Machines (SVMs) on two publicly available collections of MEDLINE citations that resulted from biomedical database curation projects: the Post-translational modification (PTM) data sets and the Immune Epitope Database (IEDB) data sets.

Results: The evaluation showed that, provided that enough training data were available, ensemble classifiers outperformed their constituent classifiers as measured by both area under the ROC curve (AUC) and precision/recall break-even point (BEP). We observed that the performance of the SVM ensembles was competitive with or better than the best results previously reported for the data sets used.

Conclusions: The proposed ensemble approach was found to be effective in improving the performance of SVM classifiers. The approach is also simple and easy to deploy in document classification/retrieval tasks. However, the improvement obtained through the current approach is still modest. We plan to explore different ways to derive and combine constituent classifiers, and to continue our investigation on other data sets.
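The abstract does not say how the constituent classifiers' outputs are combined, so the following is only a minimal sketch under stated assumptions: each constituent classifier (for example, an SVM trained on one feature set) is assumed to have already produced a real-valued score per document, and the ensemble is assumed to average those scores. The function names, the feature-set names, and the toy scores are all hypothetical and are not taken from the paper. The sketch also shows one common way to compute the two reported measures, AUC and BEP.

```python
from statistics import mean


def ensemble_scores(score_lists):
    """Average per-document scores across constituent classifiers.

    score_lists: one list of scores per classifier, all in the same
    document order. Averaging is an assumed combination rule.
    """
    return [mean(doc_scores) for doc_scores in zip(*score_lists)]


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (relevant, irrelevant) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def break_even_point(scores, labels):
    """Precision/recall break-even point.

    When exactly as many documents are retrieved as there are relevant
    documents, precision and recall share the same denominator and
    are therefore equal.
    """
    n_relevant = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = sum(y for _, y in ranked[:n_relevant])
    return hits / n_relevant


# Contrived toy data (not results from the paper): 1 = relevant.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical scores from two classifiers using different feature sets.
word_feature_scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3]
index_term_scores = [0.3, 0.8, 0.7, 0.2, 0.6, 0.1]

combined = ensemble_scores([word_feature_scores, index_term_scores])

for name, s in [
    ("word features", word_feature_scores),
    ("index terms", index_term_scores),
    ("ensemble", combined),
]:
    print(f"{name}: AUC={auc(s, labels):.3f} BEP={break_even_point(s, labels):.3f}")
```

The toy scores are chosen so that each classifier misranks a different relevant document, which is the situation in which averaging can help; real scores from different classifiers may need to be normalized to a common scale before they are averaged.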

Original language: English (US)
Title of host publication: CEUR Workshop Proceedings
Volume: 319
State: Published - 2007
Externally published: Yes
Event: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007 - Singapore, Singapore
Duration: Dec 6 2007 to Dec 7 2007

Other

Other: 2nd International Symposium on Languages in Biology and Medicine, LBM 2007
Country: Singapore
City: Singapore
Period: 12/6/07 to 12/7/07

Fingerprint

Classifiers
Support vector machines
Learning systems
Learning algorithms
Epitopes

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Torii, M., & Liu, H. D. (2007). Classifier ensemble for biomedical document retrieval. In CEUR Workshop Proceedings (Vol. 319).

Classifier ensemble for biomedical document retrieval. / Torii, Manabu; Liu, Hongfang D.

CEUR Workshop Proceedings. Vol. 319 2007.

Torii, M & Liu, HD 2007, Classifier ensemble for biomedical document retrieval. in CEUR Workshop Proceedings. vol. 319, 2nd International Symposium on Languages in Biology and Medicine, LBM 2007, Singapore, Singapore, 12/6/07.
Torii M, Liu HD. Classifier ensemble for biomedical document retrieval. In CEUR Workshop Proceedings. Vol. 319. 2007
Torii, Manabu ; Liu, Hongfang D. / Classifier ensemble for biomedical document retrieval. CEUR Workshop Proceedings. Vol. 319 2007.
@inproceedings{624cfac6344c458191137e0b6152764f,
title = "Classifier ensemble for biomedical document retrieval",
author = "Manabu Torii and Liu, {Hongfang D}",
year = "2007",
language = "English (US)",
volume = "319",
booktitle = "CEUR Workshop Proceedings",

}

TY - GEN

T1 - Classifier ensemble for biomedical document retrieval

AU - Torii, Manabu

AU - Liu, Hongfang D

PY - 2007

Y1 - 2007

UR - http://www.scopus.com/inward/record.url?scp=84879904513&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879904513&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84879904513

VL - 319

BT - CEUR Workshop Proceedings

ER -