TY - GEN
T1 - Dynamically generating a protein entity dictionary using online resources
AU - Liu, Hongfang
AU - Hu, Zhangzhi
AU - Wu, Cathy
PY - 2005
Y1 - 2005
N2 - With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has received much attention recently to make the task of managing information recorded in free text more feasible. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and the ability to map these terms to corresponding records in databases. Such a task is called biological named entity tagging. In this paper, we present a system that automatically constructs a protein entity dictionary, which contains gene or protein names associated with UniProt identifiers using online resources. The system can run periodically to always keep up-to-date with these online resources. Using online resources that were available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary can be accessed from the following website: http://biocreative.ifsm.umbc.edu/biothesaurus/.
AB - With the overwhelming amount of biological knowledge stored in free text, natural language processing (NLP) has received much attention recently to make the task of managing information recorded in free text more feasible. One requirement for most NLP systems is the ability to accurately recognize biological entity terms in free text and the ability to map these terms to corresponding records in databases. Such a task is called biological named entity tagging. In this paper, we present a system that automatically constructs a protein entity dictionary, which contains gene or protein names associated with UniProt identifiers using online resources. The system can run periodically to always keep up-to-date with these online resources. Using online resources that were available on Dec. 25, 2004, we obtained 4,046,733 terms for 1,640,082 entities. The dictionary can be accessed from the following website: http://biocreative.ifsm.umbc.edu/biothesaurus/.
UR - http://www.scopus.com/inward/record.url?scp=84859895200&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859895200&partnerID=8YFLogxK
U2 - 10.3115/1225753.1225758
DO - 10.3115/1225753.1225758
M3 - Conference contribution
AN - SCOPUS:84859895200
SN - 1932432515
SN - 9781932432510
T3 - ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
SP - 17
EP - 20
BT - ACL-05 - 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 43rd Annual Meeting of the Association for Computational Linguistics, ACL-05
Y2 - 25 June 2005 through 30 June 2005
ER -