Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature

Hongfang Liu; Manabu Torii; Guixian Xu; Zhangzhi Hu; Johannes Goll

doi:10.1007/978-3-642-13131-8_8

Digital Health Sciences

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

With advances in high-throughput genomics and proteomics technologies, it has become critical to mine and curate protein-protein interaction (PPI) networks from the biological research literature. Several PPI knowledge bases have been curated by domain experts, but they are far from comprehensive. Observing that PPI-relevant documents can be obtained from PPI knowledge bases that record literature evidence, and that a large number of unlabeled documents (mostly negative) are freely available, we investigated learning from positive and unlabeled data (LPU) and developed an automated system for retrieving PPI-relevant articles to assist the curation of a bacterial PPI knowledge base, MPIDB. Two approaches to obtaining unlabeled documents were used: one based on a PubMed MeSH term search and the other based on an existing knowledge base, UniProtKB. We found that unlabeled documents obtained from UniProtKB tend to yield better document classifiers for PPI curation purposes. Our study shows that LPU is a feasible approach to developing an automated system for retrieving PPI-relevant articles, with no extra annotation effort required. The selection of machine learning algorithms and of unlabeled documents is critical in constructing an effective LPU-based system.
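
As a rough illustration of the LPU setting described in the abstract, the simplest baseline treats every unlabeled document as a provisional negative and trains an ordinary text classifier on the result. The sketch below does this with a small multinomial naive Bayes model over bag-of-words features. It is not the authors' system: the abstract does not specify their algorithms or features, and the toy documents and all function names here are hypothetical. More elaborate LPU methods, such as two-step approaches that first identify reliable negatives, are not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would use richer features.
    return text.lower().split()


def train_pu_naive_bayes(positive_docs, unlabeled_docs, alpha=1.0):
    """Train naive Bayes, treating unlabeled docs as provisional negatives.

    Both document lists must be non-empty. `alpha` is the Laplace smoothing
    constant (an assumption of this sketch, not a value from the paper).
    """
    counts = {1: Counter(), 0: Counter()}
    for doc in positive_docs:
        counts[1].update(tokenize(doc))
    for doc in unlabeled_docs:
        counts[0].update(tokenize(doc))
    vocab = set(counts[1]) | set(counts[0])
    n_docs = len(positive_docs) + len(unlabeled_docs)
    model = {"vocab": vocab, "log_prior": {}, "log_like": {}}
    for label, docs in ((1, positive_docs), (0, unlabeled_docs)):
        model["log_prior"][label] = math.log(len(docs) / n_docs)
        total = sum(counts[label].values()) + alpha * len(vocab)
        model["log_like"][label] = {
            w: math.log((counts[label][w] + alpha) / total) for w in vocab
        }
    return model


def score(model, text):
    """Log-odds that a document is PPI-relevant; higher ranks earlier."""
    s = model["log_prior"][1] - model["log_prior"][0]
    for w in tokenize(text):
        if w in model["vocab"]:  # words never seen in training are ignored
            s += model["log_like"][1][w] - model["log_like"][0][w]
    return s


# Hypothetical toy data: positives would come from a curated PPI knowledge
# base, unlabeled documents from a broader literature search.
positive = [
    "protein a binds protein b in escherichia coli",
    "yeast two-hybrid interaction between bacterial proteins",
]
unlabeled = [
    "genome sequence of a soil bacterium",
    "antibiotic resistance survey in hospitals",
    "protein complex interaction detected by pull-down",
]
model = train_pu_naive_bayes(positive, unlabeled)

candidates = ["genome sequence survey", "interaction between proteins"]
ranked = sorted(candidates, key=lambda d: score(model, d), reverse=True)
```

Here `ranked` orders candidate articles by estimated relevance, which matches the retrieval framing: a curator would review the top of the list first. Because some unlabeled documents are in fact positive (the third unlabeled example above), this baseline learns from noisy negatives, which is one reason the source of unlabeled documents matters.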

Original language	English (US)
Title of host publication	Linking Literature, Information, and Knowledge for Biology - Workshop of the BioLink Special Interest Group, ISMB/ECCB 2009, Revised Selected Papers
Pages	62-70
Number of pages	9
DOIs	https://doi.org/10.1007/978-3-642-13131-8_8
State	Published - 2010
Event	Workshop of the BioLINK Special Interest Group on Linking Literature, Information and Knowledge for Biology, ISMB/ECCB 2009 - Stockholm, Sweden Duration: Jun 28 2009 → Jun 29 2009

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	6004 LNBI
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Keywords

Document retrieval
Learning from positive and unlabeled
Protein-protein interaction

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Liu, H., Torii, M., Xu, G., Hu, Z., & Goll, J. (2010). Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature. In Linking Literature, Information, and Knowledge for Biology - Workshop of the BioLink Special Interest Group, ISMB/ECCB 2009, Revised Selected Papers (pp. 62-70). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6004 LNBI). https://doi.org/10.1007/978-3-642-13131-8_8

@inproceedings{fc1b05630ad848bbb1fc06e6dd31d6e3,

title = "Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature",

keywords = "Document retrieval, Learning from positive and unlabeled, Protein-protein interaction",

author = "Hongfang Liu and Manabu Torii and Guixian Xu and Zhangzhi Hu and Johannes Goll",

year = "2010",

doi = "10.1007/978-3-642-13131-8_8",

language = "English (US)",

isbn = "3642131301",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

pages = "62--70",

booktitle = "Linking Literature, Information, and Knowledge for Biology - Workshop of the BioLink Special Interest Group, ISMB/ECCB 2009, Revised Selected Papers",

note = "Workshop of the BioLINK Special Interest Group on Linking Literature, Information and Knowledge for Biology, ISMB/ECCB 2009 ; Conference date: 28-06-2009 Through 29-06-2009",

}

TY - GEN

T1 - Learning from positive and unlabeled documents for retrieval of bacterial protein-protein interaction literature

AU - Liu, Hongfang

AU - Torii, Manabu

AU - Xu, Guixian

AU - Hu, Zhangzhi

AU - Goll, Johannes

PY - 2010

Y1 - 2010

KW - Document retrieval

KW - Learning from positive and unlabeled

KW - Protein-protein interaction

UR - http://www.scopus.com/inward/record.url?scp=77953743348&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77953743348&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-13131-8_8

DO - 10.1007/978-3-642-13131-8_8

M3 - Conference contribution

AN - SCOPUS:77953743348

SN - 3642131301

SN - 9783642131301

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 62

EP - 70

BT - Linking Literature, Information, and Knowledge for Biology - Workshop of the BioLink Special Interest Group, ISMB/ECCB 2009, Revised Selected Papers

T2 - Workshop of the BioLINK Special Interest Group on Linking Literature, Information and Knowledge for Biology, ISMB/ECCB 2009

Y2 - 28 June 2009 through 29 June 2009

ER -
