Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents

Alan R. Aronson; Dina Demner-Fushman; Susanne M. Humphrey; Jimmy Lin; Hongfang Liu; Patrick Ruch; Miguel E. Ruiz; Lawrence H. Smith; Lorraine K. Tanabe; W. John Wilbur

Digital Health Sciences

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

This paper represents a continuation of research into the retrieval and annotation of textual genomics documents (both MEDLINE® citations and full text articles) for the purpose of satisfying biologists' real information needs. The overall approach taken here for both the ad hoc retrieval and categorization tasks within the TREC genomics track in 2005 was one combining the results of several NLP, statistical and ML methods, using a fusion method for ad hoc retrieval and ensemble methods for categorization. The results show that fusion approaches can improve the final outcome for the ad hoc and the categorization tasks, but that care must be taken in order to take advantage of the strengths of the constituent methods.

Original language	English (US)
Journal	NIST Special Publication
State	Published - 2005
Event	14th Text REtrieval Conference, TREC 2005 - Gaithersburg, MD, United States
Duration	Nov 15 2005 → Nov 18 2005

Keywords

Genomics
Information retrieval
MEDLINE/PubMed
Machine learning
MeSH
Statistical natural language processing
Thematic analysis
Vector space models

ASJC Scopus subject areas

General Engineering

Cite this

@article{48e70b3ec16149c6a3fcc740c44e4d3a,

title = "Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents",

abstract = "This paper represents a continuation of research into the retrieval and annotation of textual genomics documents (both MEDLINE{\textregistered} citations and full text articles) for the purpose of satisfying biologists' real information needs. The overall approach taken here for both the ad hoc retrieval and categorization tasks within the TREC genomics track in 2005 was one combining the results of several NLP, statistical and ML methods, using a fusion method for ad hoc retrieval and ensemble methods for categorization. The results show that fusion approaches can improve the final outcome for the ad hoc and the categorization tasks, but that care must be taken in order to take advantage of the strengths of the constituent methods.",

keywords = "Genomics, Information retrieval, MEDLINE/PubMed, Machine learning, MeSH, Statistical natural language processing, Thematic analysis, Vector space models",

author = "Aronson, {Alan R.} and Dina Demner-Fushman and Humphrey, {Susanne M.} and Jimmy Lin and Hongfang Liu and Patrick Ruch and Ruiz, {Miguel E.} and Smith, {Lawrence H.} and Tanabe, {Lorraine K.} and Wilbur, {W. John}",

year = "2005",

language = "English (US)",

journal = "NIST Special Publication",

issn = "1048-776X",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

note = "14th Text REtrieval Conference, TREC 2005 ; Conference date: 15-11-2005 Through 18-11-2005",

}

TY - JOUR

T1 - Fusion of knowledge-intensive and statistical approaches for retrieving and annotating textual genomics documents

AU - Aronson, Alan R.

AU - Demner-Fushman, Dina

AU - Humphrey, Susanne M.

AU - Lin, Jimmy

AU - Liu, Hongfang

AU - Ruch, Patrick

AU - Ruiz, Miguel E.

AU - Smith, Lawrence H.

AU - Tanabe, Lorraine K.

AU - Wilbur, W. John

PY - 2005

Y1 - 2005

N2 - This paper represents a continuation of research into the retrieval and annotation of textual genomics documents (both MEDLINE® citations and full text articles) for the purpose of satisfying biologists' real information needs. The overall approach taken here for both the ad hoc retrieval and categorization tasks within the TREC genomics track in 2005 was one combining the results of several NLP, statistical and ML methods, using a fusion method for ad hoc retrieval and ensemble methods for categorization. The results show that fusion approaches can improve the final outcome for the ad hoc and the categorization tasks, but that care must be taken in order to take advantage of the strengths of the constituent methods.

AB - This paper represents a continuation of research into the retrieval and annotation of textual genomics documents (both MEDLINE® citations and full text articles) for the purpose of satisfying biologists' real information needs. The overall approach taken here for both the ad hoc retrieval and categorization tasks within the TREC genomics track in 2005 was one combining the results of several NLP, statistical and ML methods, using a fusion method for ad hoc retrieval and ensemble methods for categorization. The results show that fusion approaches can improve the final outcome for the ad hoc and the categorization tasks, but that care must be taken in order to take advantage of the strengths of the constituent methods.

KW - Genomics

KW - Information retrieval

KW - MEDLINE/PubMed

KW - Machine learning

KW - MeSH

KW - Statistical natural language processing

KW - Thematic analysis

KW - Vector space models

UR - http://www.scopus.com/inward/record.url?scp=84873562502&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873562502&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:84873562502

SN - 1048-776X

JO - NIST Special Publication

JF - NIST Special Publication

T2 - 14th Text REtrieval Conference, TREC 2005

Y2 - 15 November 2005 through 18 November 2005

ER -
