TY - JOUR
T1 - A vocabulary development and visualization tool based on natural language processing and the mining of textual patient reports
AU - Friedman, Carol
AU - Liu, Hongfang
AU - Shagina, Lyudmila
N1 - Funding Information: This work was supported in part by Grant LM06274 from the NLM and from the Center for Advanced Technology at Columbia University.
PY - 2003/6
Y1 - 2003/6
N2 - Medical terminologies are critical for automated healthcare systems. Some terminologies, such as the UMLS and SNOMED, are comprehensive, whereas others specialize in limited domains (e.g., BIRADS) or are developed for specific applications. Important features of a terminology are comprehensive coverage of relevant clinical terms and ease of use for its users, which include computerized applications. We have developed a method for facilitating vocabulary development and maintenance that uses natural language processing to mine large collections of clinical reports to obtain information on terminology as expressed by physicians. Once the reports are processed and the terms structured and collected into an XML representational schema, it is possible to determine information about terms, such as frequency of occurrence, compositionality, relations to other terms (such as modifiers), and correspondence to a controlled vocabulary. This paper describes the method and discusses how it can be used as a tool to help vocabulary builders navigate through the terms physicians use, visualize their relations to other terms via a flexible viewer, and determine their correspondence to a controlled vocabulary.
AB - Medical terminologies are critical for automated healthcare systems. Some terminologies, such as the UMLS and SNOMED, are comprehensive, whereas others specialize in limited domains (e.g., BIRADS) or are developed for specific applications. Important features of a terminology are comprehensive coverage of relevant clinical terms and ease of use for its users, which include computerized applications. We have developed a method for facilitating vocabulary development and maintenance that uses natural language processing to mine large collections of clinical reports to obtain information on terminology as expressed by physicians. Once the reports are processed and the terms structured and collected into an XML representational schema, it is possible to determine information about terms, such as frequency of occurrence, compositionality, relations to other terms (such as modifiers), and correspondence to a controlled vocabulary. This paper describes the method and discusses how it can be used as a tool to help vocabulary builders navigate through the terms physicians use, visualize their relations to other terms via a flexible viewer, and determine their correspondence to a controlled vocabulary.
KW - Controlled vocabulary
KW - Medical terminology
KW - Natural language processing
KW - Text mining
KW - XML-based graphical user interface
UR - http://www.scopus.com/inward/record.url?scp=0242690445&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0242690445&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2003.08.005
DO - 10.1016/j.jbi.2003.08.005
M3 - Article
C2 - 14615228
AN - SCOPUS:0242690445
SN - 1532-0464
VL - 36
SP - 189
EP - 201
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
IS - 3
ER -