Gene symbol disambiguation using knowledge-based profiles

Hua Xu, Jung Wei Fan, George Hripcsak, Eneida A. Mendonça, Marianthi Markatou, Carol Friedman

Research output: Contribution to journal › Article › peer-review

37 Scopus citations


Motivation: The ambiguity of biomedical entities, particularly of gene symbols, is a major challenge for text-mining systems in the biomedical domain. Existing knowledge sources, such as Entrez Gene and the MEDLINE database, contain information about the characteristics of a particular gene that could be used to disambiguate gene symbols.

Results: For each gene, we create a profile containing different types of information automatically extracted from related MEDLINE abstracts and from readily available annotated knowledge sources. We apply the gene profiles to the disambiguation task via an information retrieval method that computes similarity scores between the context in which the ambiguous gene symbol is mentioned and each candidate gene profile, then ranks the candidates by score. The gene whose profile has the highest similarity score is chosen as the correct sense. We evaluated the method on three automatically generated test sets, one each for mouse, fly and yeast. The highest precision achieved was 93.9% for mouse, 77.8% for fly and 89.5% for yeast.
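The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme or similarity measure, so the TF-IDF weighting and cosine similarity used here are assumptions, as are the function names (`tokenize`, `tfidf_vectors`, `cosine`, `disambiguate`) and the toy profiles `GENE_A` and `GENE_B`, which are hypothetical placeholders rather than real gene data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: smoothed TF-IDF; the paper's actual weighting may differ.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context, profiles):
    """Rank candidate gene profiles by similarity to the mention context.

    `profiles` maps a candidate gene identifier to its profile text.
    Returns (gene_id, score) pairs sorted from best to worst.
    """
    gene_ids = list(profiles)
    # Simplification: the context is included when computing IDF.
    docs = [tokenize(profiles[g]) for g in gene_ids] + [tokenize(context)]
    vectors = tfidf_vectors(docs)
    context_vec = vectors[-1]
    scored = [(g, cosine(context_vec, vec))
              for g, vec in zip(gene_ids, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy profiles for two genes sharing one ambiguous symbol.
toy_profiles = {
    "GENE_A": "kinase signaling phosphorylation cell cycle",
    "GENE_B": "membrane transporter ion channel neuron",
}
toy_context = "The protein regulates cell cycle progression through kinase activity"

ranking = disambiguate(toy_context, toy_profiles)
print(ranking[0][0])  # identifier of the top-ranked candidate
```

In this sketch the top-ranked candidate is taken as the chosen sense, mirroring the abstract's "highest similarity score" rule. A fuller implementation would build each profile from MEDLINE abstracts and annotated sources such as Entrez Gene rather than from hand-written strings.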

Original language: English (US)
Pages (from-to): 1015-1022
Number of pages: 8
Issue number: 8
State: Published - Apr 15 2007

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

