Combining multiple evidence for gene symbol disambiguation

Hua Xu, Jung Wei Fan, Carol Friedman

Research output: Contribution to conference (paper)

6 Scopus citations

Abstract

Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) based method for human gene symbol disambiguation and studied different methods to combine various types of information from available knowledge sources. Results showed that a combination of evidence usually improved performance. The combination method using coefficients obtained from a logistic regression model reached the highest precision of 92.2% on a testing set of ambiguous human gene symbols.
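The idea of combining evidence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate gene has one bag-of-words profile per evidence type, scores the context of an ambiguous symbol against each profile with cosine similarity, and sums the per-type scores with fixed weights. The evidence type names (`words`, `concepts`), the gene identifiers, the toy profiles, and the weight values are all hypothetical placeholders. In the paper the coefficients come from a logistic regression model; here they are simply set by hand.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_score(context, profile, weights):
    """Weighted sum of per-evidence-type cosine similarities."""
    return sum(
        w * cosine(context.get(kind, Counter()), profile.get(kind, Counter()))
        for kind, w in weights.items()
    )


def disambiguate(context, candidates, weights):
    """Return the candidate gene whose profiles best match the context."""
    return max(
        candidates,
        key=lambda gene: combined_score(context, candidates[gene], weights),
    )


# Hypothetical toy data: one ambiguous symbol with two candidate genes.
context = {
    "words": Counter(["kinase", "signaling", "cell"]),
    "concepts": Counter(["C_phosphorylation"]),
}
candidates = {
    "GENE_A": {
        "words": Counter(["kinase", "signaling", "pathway"]),
        "concepts": Counter(["C_phosphorylation", "C_cell_cycle"]),
    },
    "GENE_B": {
        "words": Counter(["cell", "membrane", "transport"]),
        "concepts": Counter(["C_ion_channel"]),
    },
}
# Hand-set placeholder weights, standing in for learned coefficients.
weights = {"words": 0.6, "concepts": 0.4}

best = disambiguate(context, candidates, weights)
print(best)
```

Setting one weight to zero reduces this to a single-evidence ranker, which gives a simple way to compare a combination against each evidence type on its own.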

Original language: English (US)
Pages: 41-48
Number of pages: 8
DOI: https://doi.org/10.3115/1572392.1572400
State: Published - 2007
Event: ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007 - Prague, Czech Republic
Duration: Jun 29 2007 → …

Other

Other: ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007
Country: Czech Republic
City: Prague
Period: 6/29/07 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Information Systems
  • Software
  • Health Informatics
  • Computer Science Applications
  • Biomedical Engineering

Cite this

    Xu, H., Fan, J. W., & Friedman, C. (2007). Combining multiple evidence for gene symbol disambiguation. 41-48. Paper presented at ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007, Prague, Czech Republic. https://doi.org/10.3115/1572392.1572400