Using distributional analysis to semantically classify UMLS concepts.

Jung Wei Fan, Hua Xu, Carol Friedman

Research output: Contribution to journal, Article

Abstract

The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on the majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.
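The abstract does not give the formula or the implementation details, so the following is only a minimal sketch of how α-skew divergence can rank candidate semantic classes for a concept. It assumes the commonly used definition, the KL divergence of r from the mixture αq + (1 − α)r, with α = 0.99. The dependency feature names, the toy counts, the direction of comparison (class profile as q, concept as r), and the helper names `normalize`, `skew_divergence`, and `rank_classes` are all hypothetical and are not taken from the paper.

```python
import math


def normalize(counts):
    """Turn raw feature counts into a probability distribution."""
    total = sum(counts.values())
    return {feat: n / total for feat, n in counts.items()}


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence: KL(r || alpha*q + (1-alpha)*r).

    Mixing a little of r into q keeps the denominator non-zero
    for every feature that r has, so the sum is always finite.
    """
    total = 0.0
    for feat, r_p in r.items():
        if r_p == 0.0:
            continue
        mix = alpha * q.get(feat, 0.0) + (1.0 - alpha) * r_p
        total += r_p * math.log(r_p / mix)
    return total


def rank_classes(concept, class_profiles, alpha=0.99):
    """Return (class, divergence) pairs, most similar class first."""
    scores = [
        (name, skew_divergence(profile, concept, alpha))
        for name, profile in class_profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical syntactic-dependency feature counts (illustration only).
concept = normalize({"subj_of:worsen": 3, "obj_of:treat": 2})
class_profiles = {
    "disorder": normalize(
        {"obj_of:treat": 5, "subj_of:worsen": 4, "obj_of:diagnose": 1}
    ),
    "function": normalize({"subj_of:regulate": 6, "obj_of:measure": 3}),
}

ranked = rank_classes(concept, class_profiles)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Taking the first one or two entries of the ranked list corresponds to the "top prediction" and "top 2 predictions" settings that the abstract reports, although the paper's actual feature extraction and class profiles are not described here.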

Original language: English (US)
Pages (from-to): 519-523
Number of pages: 5
Journal: Medinfo. MEDINFO
Volume: 12
Issue number: Pt 1
State: Published - Dec 1 2007
Externally published: Yes

Fingerprint

Unified Medical Language System
Semantics
Natural Language Processing

ASJC Scopus subject areas

  • Medicine (all)

@article{1e58ce3118004c9ea21b12987891d76d,
title = "Using distributional analysis to semantically classify UMLS concepts.",
abstract = "The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and $\alpha$-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on the majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.",
author = "Fan, {Jung Wei} and Hua Xu and Carol Friedman",
year = "2007",
month = "12",
day = "1",
language = "English (US)",
volume = "12",
pages = "519--523",
journal = "Medinfo. MEDINFO",
number = "Pt 1",

}

TY - JOUR

T1 - Using distributional analysis to semantically classify UMLS concepts.

AU - Fan, Jung Wei

AU - Xu, Hua

AU - Friedman, Carol

PY - 2007/12/1

Y1 - 2007/12/1

AB - The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and is therefore valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. It is therefore desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on the majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.

UR - http://www.scopus.com/inward/record.url?scp=38449098208&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38449098208&partnerID=8YFLogxK

M3 - Article

VL - 12

SP - 519

EP - 523

JO - Medinfo. MEDINFO

JF - Medinfo. MEDINFO

IS - Pt 1

ER -