Dynamically generating a protein entity dictionary using online resources

Hongfang Liu, Zhangzhi Hu, Cathy Wu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has recently received much attention as a way to make managing the information recorded in free text more feasible. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and to map these terms to the corresponding database records. This task is called biological named entity tagging. In this paper, we present a system that uses online resources to automatically construct a protein entity dictionary, which contains gene or protein names associated with UniProt identifiers. The system can run periodically to stay up to date with these online resources. Using the online resources available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary can be accessed at the following website: http://biocreative.ifsm.umbc.edu/biothesaurus/.
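The core data structure described in the abstract is a mapping from gene or protein names to UniProt identifiers. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' implementation: the harvested records are reduced to hypothetical `(source, accession, name)` tuples, the `normalize` rule (lowercasing and collapsing hyphens and whitespace) is an assumption, and `build_dictionary` and `lookup` are hypothetical helper names. It shows why each term maps to a *set* of accessions: a name such as "p53" can refer to more than one entity (here, the human and mouse proteins).

```python
import re
from collections import defaultdict

# Illustrative (source, UniProt accession, name) records. The real system
# harvests names from several online resources; those are not modeled here.
RECORDS = [
    ("source_a", "P04637", "TP53"),
    ("source_a", "P04637", "Cellular tumor antigen p53"),
    ("source_b", "P04637", "p53"),
    ("source_b", "P02340", "p53"),
    ("source_a", "P38398", "BRCA1"),
    ("source_b", "P38398", "Breast cancer type 1 susceptibility protein"),
]


def normalize(term: str) -> str:
    """Hypothetical normalization: lowercase, collapse hyphens/whitespace."""
    return re.sub(r"[\s\-]+", " ", term.lower()).strip()


def build_dictionary(records):
    """Map each normalized term to the set of accessions it may denote."""
    term_to_ids = defaultdict(set)
    for _source, accession, name in records:
        term_to_ids[normalize(name)].add(accession)
    return dict(term_to_ids)


def lookup(dictionary, term):
    """Return the candidate accessions for a term (empty set if unknown)."""
    return dictionary.get(normalize(term), set())


if __name__ == "__main__":
    d = build_dictionary(RECORDS)
    # An ambiguous term resolves to several candidate entities.
    print(sorted(lookup(d, "P53")))  # ['P02340', 'P04637']
    entities = {acc for ids in d.values() for acc in ids}
    print(len(d), "terms for", len(entities), "entities")  # 5 terms for 3 entities
```

Keeping the build step a pure function of the harvested records means periodic regeneration amounts to re-fetching the records and calling `build_dictionary` again. For scale, the reported totals work out to roughly 4,046,733 / 1,640,082 ≈ 2.47 terms per entity.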

Original language: English (US)
Title of host publication: ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 17-20
Number of pages: 4
ISBN (Print): 1932432515, 9781932432510
State: Published - 2005
Event: 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05 - Ann Arbor, MI, United States
Duration: Jun 25, 2005 to Jun 30, 2005


ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

