Combining multiple evidence for gene symbol disambiguation

Hua Xu, Jung Wei Fan, Carol Friedman

Research output: Contribution to conference (Paper)

6 Citations (Scopus)

Abstract

Gene names and symbols are important biomedical entities, but are highly ambiguous. This ambiguity affects the performance of both information extraction and information retrieval systems in the biomedical domain. Existing knowledge sources contain different types of information about genes and could be used to disambiguate gene symbols. In this paper, we applied an information retrieval (IR) based method for human gene symbol disambiguation and studied different methods to combine various types of information from available knowledge sources. Results showed that a combination of evidence usually improved performance. The combination method using coefficients obtained from a logistic regression model reached the highest precision of 92.2% on a testing set of ambiguous human gene symbols.
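
The idea of combining several types of evidence can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate gene has a profile of term counts per evidence type, that the context of an ambiguous symbol is compared to each profile by cosine similarity, and that the per-type similarities are merged by a weighted sum. The evidence-type names, the weights, the gene identifiers, and all terms below are hypothetical placeholders; in the paper the coefficients come from a logistic regression model, which is not fitted here.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_score(context: dict, profile: dict, weights: dict) -> float:
    """Weighted sum of per-evidence-type similarities."""
    return sum(
        w * cosine(context.get(ev, Counter()), profile.get(ev, Counter()))
        for ev, w in weights.items()
    )


def disambiguate(context: dict, candidates: dict, weights: dict) -> str:
    """Return the candidate gene whose profile best matches the context."""
    return max(
        candidates,
        key=lambda gene: combined_score(context, candidates[gene], weights),
    )


# Hypothetical weights; stand-ins for coefficients a fitted model would supply.
WEIGHTS = {"words": 0.7, "concepts": 0.3}

# Toy context of one ambiguous symbol mention (invented terms).
context = {
    "words": Counter(["kinase", "signaling", "cell"]),
    "concepts": Counter(["C1"]),
}

# Toy profiles for two hypothetical candidate genes sharing the symbol.
candidates = {
    "GENE_A": {
        "words": Counter(["kinase", "signaling", "phosphorylation"]),
        "concepts": Counter(["C1", "C2"]),
    },
    "GENE_B": {
        "words": Counter(["membrane", "transport", "cell"]),
        "concepts": Counter(["C3"]),
    },
}

best = disambiguate(context, candidates, WEIGHTS)
print(best)
```

With these toy inputs the first candidate wins because it overlaps with the context on both evidence types. A weighted sum is only one way to combine evidence; the abstract notes that several combination methods were compared.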

Original language: English (US)
Pages: 41-48
Number of pages: 8
State: Published - Jan 1 2007
Externally published: Yes
Event: ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007 - Prague, Czech Republic
Duration: Jun 29 2007 → …

Fingerprint

Genes
Information Storage and Retrieval
Logistic Models
Information retrieval systems
Information retrieval
Information Systems
Names
Logistics
Disambiguation
Gene
Symbol
Testing

ASJC Scopus subject areas

  • Language and Linguistics
  • Information Systems
  • Software
  • Health Informatics
  • Computer Science Applications
  • Biomedical Engineering

Cite this

Xu, H., Fan, J. W., & Friedman, C. (2007). Combining multiple evidence for gene symbol disambiguation. 41-48. Paper presented at ACL 2007 Workshop on Biological, Translational, and Clinical Language Processing, BioNLP 2007, Prague, Czech Republic.
