Dynamically generating a protein entity dictionary using online resources

Hongfang Liu, Zhangzhi Hu, Cathy Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has recently received much attention as a way to make managing the information recorded in free text more feasible. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and to map these terms to the corresponding database records. This task is called biological named entity tagging. In this paper, we present a system that uses online resources to automatically construct a protein entity dictionary, which contains gene or protein names associated with UniProt identifiers. The system can be run periodically to stay up to date with these online resources. Using the online resources available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary can be accessed at the following website: http://biocreative.ifsm.umbc.edu/biothesaurus/.
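The abstract does not describe how the dictionary is assembled. As a rough illustration only, the core data structure it implies (gene or protein names mapped to database identifiers, where one name may map to several entries) might be sketched as below. The record format, the normalization rule, and the placeholder identifiers are assumptions for illustration, not the authors' method.

```python
from collections import defaultdict
import re


def normalize(term: str) -> str:
    """Hypothetical normalization: lowercase, trim, and collapse runs of
    whitespace, hyphens, and underscores into a single space."""
    return re.sub(r"[\s\-_]+", " ", term.strip().lower())


def build_dictionary(records):
    """Build a term -> set-of-identifiers index.

    `records` is an iterable of (identifier, list_of_names) pairs, e.g. one
    pair per protein entry harvested from an online resource (assumed format).
    """
    index = defaultdict(set)
    for identifier, names in records:
        for name in names:
            key = normalize(name)
            if key:  # skip names that are empty after normalization
                index[key].add(identifier)
    return index


def lookup(index, term):
    """Return the sorted identifiers a term may refer to (empty if unknown)."""
    # .get() does not insert missing keys into the defaultdict
    return sorted(index.get(normalize(term), set()))


if __name__ == "__main__":
    # Hypothetical records; identifiers are placeholders, not real UniProt accessions.
    records = [
        ("ACC0001", ["p53", "Tumor suppressor p53", "TP53"]),
        ("ACC0002", ["p53", "TRP53"]),  # a name shared by two entries is ambiguous
    ]
    index = build_dictionary(records)
    print(len(index))                                # 4 distinct normalized terms
    print(lookup(index, "P53"))                      # ['ACC0001', 'ACC0002']
    print(lookup(index, "tumor-suppressor  p53"))    # ['ACC0001']
```

Storing a set of identifiers per term, rather than a single identifier, keeps ambiguous names visible to a downstream tagger instead of silently picking one entry; rebuilding the index from freshly downloaded records is one simple way to mirror the periodic updates the abstract mentions.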

Original language: English (US)
Title of host publication: ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Pages: 17-20
Number of pages: 4
State: Published - Dec 1 2005
Event: 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05 - Ann Arbor, MI, United States
Duration: Jun 25 2005 to Jun 30 2005

Publication series

Name: ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Other

Other: 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05
Country: United States
City: Ann Arbor, MI
Period: 6/25/05 to 6/30/05


ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Liu, H., Hu, Z., & Wu, C. (2005). Dynamically generating a protein entity dictionary using online resources. In ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (pp. 17-20). (ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference).