Using distributional analysis to semantically classify UMLS concepts

Jung Wei Fan, Hua Xu, Carol Friedman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. Therefore, it is desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.
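As a rough illustration of the kind of comparison the abstract describes, the sketch below computes the α-skew divergence in its standard form, s_α(q, r) = D(r ‖ αq + (1 − α)r), where D is the Kullback-Leibler divergence, and ranks candidate classes by how close their context profile is to a concept's profile. This is not the authors' implementation: the dependency features, the counts, the class names, the value α = 0.99, and the helper names (`normalize`, `skew_divergence`, `rank_classes`) are all assumptions made up for the example.

```python
import math


def normalize(counts):
    """Turn raw context counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def kl_divergence(p, q):
    """KL divergence D(p || q); terms with p[k] == 0 contribute nothing."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def skew_divergence(q, r, alpha=0.99):
    """Alpha-skew divergence s_alpha(q, r) = D(r || alpha*q + (1 - alpha)*r).

    Mixing a little of r into q keeps the result finite even when q
    assigns zero probability to a context that r has seen (alpha < 1).
    """
    keys = set(q) | set(r)
    mix = {k: alpha * q.get(k, 0.0) + (1 - alpha) * r.get(k, 0.0) for k in keys}
    return kl_divergence(r, mix)


def rank_classes(concept, class_profiles, alpha=0.99):
    """Return class names ordered from most to least similar to the concept."""
    scores = {
        name: skew_divergence(profile, concept, alpha)
        for name, profile in class_profiles.items()
    }
    return sorted(scores, key=scores.get)


# Hypothetical syntactic-dependency contexts and counts, for illustration only.
class_profiles = {
    "disorder": normalize(
        {"obj_of:treat": 6, "obj_of:diagnose": 3, "subj_of:increase": 1}
    ),
    "biologic function": normalize(
        {"subj_of:increase": 5, "subj_of:regulate": 4, "obj_of:treat": 1}
    ),
}
concept = normalize({"obj_of:treat": 4, "obj_of:diagnose": 1})

ranking = rank_classes(concept, class_profiles)
print(ranking)
```

Taking the first element of `ranking` corresponds to a "top prediction" setup, and taking the first two to a "top 2 predictions" setup, in the sense that a concept counts as correctly classified if the gold-standard class appears among the returned candidates.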

Original language: English (US)
Title of host publication: MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics
Subtitle of host publication: Building Sustainable Health Systems
Pages: 519-523
Number of pages: 5
Volume: 129
State: Published - Dec 1 2007
Externally published: Yes
Event: 12th World Congress on Medical Informatics, MEDINFO 2007 - Brisbane, QLD, Australia
Duration: Aug 20 2007 to Aug 24 2007

Keywords

  • distributional similarity
  • natural language processing
  • semantic classification
  • UMLS

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics
  • Health Information Management

Cite this

Fan, J. W., Xu, H., & Friedman, C. (2007). Using distributional analysis to semantically classify UMLS concepts. In MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics: Building Sustainable Health Systems (Vol. 129, pp. 519-523)

@inproceedings{766bb26deae546ed8586db91fb8ec7d7,
title = "Using distributional analysis to semantically classify UMLS concepts",
abstract = "The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. Therefore, it is desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and $\alpha$-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.",
keywords = "distributional similarity, natural language processing, semantic classification, UMLS",
author = "Fan, {Jung Wei} and Hua Xu and Carol Friedman",
year = "2007",
month = "12",
day = "1",
language = "English (US)",
isbn = "9781586037741",
volume = "129",
pages = "519--523",
booktitle = "MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics",

}

TY - GEN
T1 - Using distributional analysis to semantically classify UMLS concepts
AU - Fan, Jung Wei
AU - Xu, Hua
AU - Friedman, Carol
PY - 2007/12/1
Y1 - 2007/12/1
AB - The UMLS is a widely used and comprehensive knowledge source in the biomedical domain. It specifies biomedical concepts and their semantic categories, and therefore is valuable for Natural Language Processing (NLP) and other knowledge-based systems. However, the UMLS semantic classification is not always accurate, which adversely affects the performance of these systems. Therefore, it is desirable to automatically validate or, when necessary, semantically reclassify UMLS concepts. We applied a distributional similarity method based on syntactic dependencies and α-skew divergence to classify concepts in the T033 Finding class in order to determine which ones were biologic functions or disorders. A gold standard of 100 randomly sampled concepts was created, based on a majority annotation of three experts. Precision of 0.54 and recall of 0.654 were achieved by the top prediction; precision of 0.64 and recall of 0.769 were achieved by the top 2 predictions. Error analysis revealed problems in the current method and provided insight into future improvements.
KW - distributional similarity
KW - natural language processing
KW - semantic classification
KW - UMLS
UR - http://www.scopus.com/inward/record.url?scp=84887979250&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887979250&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84887979250
SN - 9781586037741
VL - 129
SP - 519
EP - 523
BT - MEDINFO 2007 - Proceedings of the 12th World Congress on Health (Medical) Informatics
ER -